Is Airworthiness Dead? 2/

Where I left the discussion there was a question mark. What does conformity mean when constant change is part of the way an aircraft system works?

It’s reasonable to say – that’s nothing new. Every time I boot up this computer it goes through a series of states that can be different from any it has been through before. Cumulative operating system updates are regularly installed. I depend on the configuration management practices of the Original Equipment Manufacturer (OEM). That’s the way it is in aviation too. The more safety critical the aircraft system, the more rigorous the configuration management processes.

Here comes the – yes, but. Classical complex systems are open to verification and validation. They can be decomposed and reconstructed and shown to be in conformance with a specification.

Now, we are going beyond that, to situations where levels of complexity prohibit deconstruction. Often, we are stuck with viewing a system as a “black box”[1]. This is because the internal workings of a system are opaque or “black.” This abstraction is not new. The treatment of engineered systems as black boxes dates from the 1960s. However, this has not been the approach used for safety critical systems. Conformity to an approved design remains at the core of our current safety processes.

It’s as well to take an example to illustrate where a change in thinking is needed. In many ways the automotive industry is already wrestling with these issues. Hands-free motoring means that a car takes over from a driver and acts as a driver does. A vehicle may be semi or fully autonomous. Vehicles use image processing technologies that take vast amounts of data from multiple sensors and mix it up in a “black box” to arrive at the control outputs needed to drive safely.

Neural networking or heuristic algorithms may be the tools used to make sense of a vast amount of constantly changing real world data. The machine learns as it goes. As technology advances, particularly in machine learning ability, it becomes harder and harder to say that a vehicle system will always conform to an understandable set of rules. Although my example is automotive the same challenges are faced by aviation.

There’s a tendency to see such issues as over the horizon. They are not. Whereas the research, design and development communities are up to speed, there are large parts of the aviation community that are not ready for a step beyond inspection and conformity checking in the time-honoured way.

Yes, Airworthiness is alive and kicking. As a subject, it now must head into unfamiliar territory. Assumptions held and reinforced over decades must be revisited. Checking conformity to an approved design may no longer be sufficient to assure safety.

There are more questions than answers but a lot of smart people seeking answers.

POST 1: Explainability is going to be one of the answers – I’m sure. Explained: How to tell if artificial intelligence is working the way we want it to | MIT News | Massachusetts Institute of Technology

POST 2: Legislation, known as the Artificial Intelligence Act ‘Risks posed by AI are real’: EU moves to beat the algorithms that ruin lives | Artificial intelligence (AI) | The Guardian

POST 3: The world of the smart phone and the cockpit are here How HUE Shaped the Groundbreaking Honeywell Anthem Cockpit


[1] In science, computing, and engineering, a black box is a device, system, or object which produces useful information without revealing information about its internal workings.

Is Airworthiness dead?

Now, there’s a provocative proposition. Is Airworthiness dead? How you answer may depend somewhat on what you take to be the definition of airworthiness.

I think the place to start is the internationally agreed definition in the ICAO Annexes[1] and associated manuals[2]. Here “Airworthy” is defined as: The status of an aircraft, engine, propeller or part when it conforms to its approved design and is in a condition for safe operation.

Right away we start with a two-part definition. There’s a need for conformity and safety. Some might say that they are one and the same. That is, that conformity with an approved design equals safety. That statement always makes me uneasy given that, however hard we work, we know approved designs are not perfect, and can’t be perfect.

The connection between airworthiness and safety seems obvious. An aircraft deemed unsafe is unlikely to be considered airworthy. However, the caveat there centres around the degree of safety. Say, an aircraft may be considered airworthy enough to make a ferry flight but not to carry passengers on that flight. Safety, that freedom from danger, is a particular level of freedom.

At one end is that which is thought to be absolutely safe, and at the other end is a boundary beyond which an aircraft is unsafe. When evaluating what is designated as “unsafe”, a whole set of detailed criteria is called into action[3].

Dictionaries often give a simpler definition of airworthiness as “fit to fly.” This is a common definition that is comforting and explainable. Anyone might ask: is a vehicle fit to make a journey through air or across sea[4] or land[5]? That is “fit” in the sense of providing an acceptable means of travel. Acceptable in terms of risk to the vehicle, to any person or cargo travelling, and to third parties en route. In fact, “worthiness” itself is a question of suitability.

My provocative proposition isn’t aimed at the fundamental need for safety. The part of Airworthiness meaning in a condition for safe operation is universal and indisputable. The part that needs exploring is the part that equates safety and conformity.

A great deal of my engineering career has been spent accepting the importance of configuration management[6]. Always ensuring that the intended configuration of systems, equipment or components is exactly what is needed for a given activity or situation. Significant resources can be expended ensuring that the given configuration meets a defined specification.

The assumption has always been that once a marker has been set down and proven, then repeating a process will produce a good (safe) outcome. Reproducibility becomes fundamental. When dealing with physical products this works well. It’s the foundation of approved designs.

But what happens when the function and characteristics of a product change as it is used? For example, an expert system learns from experience. On day one, a given set of inputs may produce predictable outputs. On day one hundred, when subject to the same stimulus, those outputs may have changed significantly. No longer do we experience steadfast repeatability.

So, what does conformity mean in such situations? There’s the crux of the matter.


[1] ICAO Annex 8, Airworthiness of Aircraft. ISBN 978-92-9231-518-4

[2] ICAO Doc 9760, Airworthiness Manual. ISBN 978-92-9265-135-0

[3] https://www.ecfr.gov/current/title-14/chapter-I/subchapter-C/part-39

[4] Seaworthiness: the fact that a ship is in a good enough condition to travel safely on the sea.

[5] Roadworthy: (of a vehicle) in good enough condition to be driven without danger.

[6] https://www.apm.org.uk/resources/what-is-project-management/what-is-configuration-management/

Safety Research

I’ve always found Patrick Hudson’s[1] graphic, which maps safety improvements to factors like technology, systems, and culture, an engaging summary. Unfortunately, it’s wrong, or at least that’s my experience. I mean not wholly wrong, but the reality of achieving safety performance improvement doesn’t look like this graph (Figure 1[2]).

Yes, aviation safety improvement has been a story of continuous improvement, at least if the numbers are aggregated. Yes, a great number of the earlier improvements (1950s-70s) were made by what might be called hard technology improvements. Technical requirements mandated systems and equipment that had to meet higher performance specifications.

For the last two decades, the growth in support for safety management, and the use of risk assessment has made a considerable contribution to aviation safety. Now, safety culture is seen as part of a safety management system. It’s undeniably important[3].

My argument is that aviation’s complex mix of technology, systems, and culture is not a case of one superseding the other. This is particularly relevant in respect of safety research. Looking at Figure 1, it could be concluded that there’s not much to be gained by spending on technological solutions to problems because most of the issues rest with the human actors in the system. Again, without diminishing the contribution human error makes to accidents and incidents, the physical context within which errors occur is changing dramatically.

Let’s imagine the role of a sponsor of safety related research who has funds to distribute. For one, there are few such entities because most of the available funds go into making something happen in the first place. New products, aircraft, components, propulsion, or control systems always get the lion’s share of funds. Safety related research is way down the order.

The big aviation safety risks haven’t changed much in recent years, namely: controlled flight into terrain (CFIT), loss of control in-flight (LOC-I), mid-air collision (MAC), runway excursion (RE) and runway incursion (RI)[4]. What’s worth noting is that the potential for reducing each one of them is changing as the setting within which aviation operates is changing. Rapid technological innovation is shaping flight and ground operations. The balance between reliance on human activities and automation is changing. Integrated systems are getting more integrated.

As the contribution of human activities reduces, so an appeal to culture has less impact. Future errors may be machine errors more than human errors.

It’s best to get back to designing in hard safety from day one. Safety related research should focus more on questions like: what does hard safety look like for high levels of automation, including the use of artificial intelligence? What does hard safety look like for autonomous flight? What does hard safety look like for dense airspace at low level?

Just a thought.


[1] https://nl.linkedin.com/in/patrick-hudson-7221aa6

[2] Achieving a Safety Culture in Aviation (1999).

[3] https://www.flightsafetyaustralia.com/2017/08/safety-in-mind-hudsons-culture-ladder/

[4] https://www.icao.int/Meetings/a41/Documents/10004_en.pdf

Objects falling from the sky

In so far as I know, no person on the ground has been killed by an object falling from a commercial aircraft in flight. I’m happy to be corrected if that situation has changed. Strangely, in contrast there are plenty of reports of people falling from aircraft and being killed as a result[1]. Additionally, there are cases of parts shed by aircraft that subsequently contribute to an aircraft accident[2].

The most frequent reports of falling objects in and around airports concern not parts of an aircraft but that which is in the atmosphere all the time: namely, ice. When it hits the ground in the form of a hailstorm it can be damaging. In flight, it can be seriously damaging to an aircraft.

What I’m writing about here are the third-party risks. That’s when an innocent individual finds themselves the target of an improbable event, some might call an act of God. Ice falls are rare. However, given the volume of worldwide air traffic there are enough of them to be alert to the problem. As soon as ice accretes into lumps bigger than a kilo there’s a real danger.

Can ice falls be prevented? Here again there’s no doubt some are because of poor maintenance or other preventable factors, but others are just nature doing its thing. Regulators are always keen to collect data on the phenomena[3]. It’s something that goes on in the background and where the resources allow there can even be follow-up investigations.

Near misses do make the newspaper headlines. The dramatic nature of the events, however rare, can be like a line from a horror movie[4]. Other cases are more a human-interest story than representing a great risk to those on the ground[5].

It’s worth noting that falling objects can be quite different from what they are first reported to be. That can be said about rare events in general.

I remember being told of one case where a sharp metal object fell into a homeowner’s garden. Not nice at all. The immediate reaction was to conclude it came from an aircraft flying overhead. Speculation then started a new story, and the fear of objects falling from aircraft was intensified.

Subsequently, an investigation found that this metal object had more humble terrestrial origins. In a nearby industrial estate a grinding wheel had shattered at high speed, sending debris flying into the air, parts of which landed in the garden of the unfortunate nearby resident.

One lesson from this tale is that things may not always be as they first seem. Certainly, with falling objects, it’s as well to do an investigation before blaming an aircraft.  

POST 1: There’s a threat outside the atmosphere too. The space industries are ever busier. That old saying about “what goes up, must come down” is true of rockets and space junk. More a hazard to those on the ground, there is still the extremely unlikely chance of an in-flight aircraft getting hit. Unnecessary risks created by uncontrolled rocket reentries | Nature Astronomy

POST 2: EASA Safety Information Bulletin Operations SIB No.: 2022-07 Issued: 28 July 2022, Subject: Re-Entry into Earth’s Atmosphere of Space Debris of Rocket Long March 5B (CZ-5B). This SIB is issued to raise awareness on the expected re-entry into Earth’s atmosphere of the large space object.


[1] https://nypost.com/2019/07/03/man-nearly-killed-by-frozen-body-that-fell-from-plane-is-too-traumatized-to-go-home/

[2] http://concordesst.com/accident/englishreport/12.html

[3] https://www.caa.co.uk/Our-work/Make-a-report-or-complaint/Ice-falls/

[4] https://metro.co.uk/2017/02/16/10kg-block-of-ice-falls-from-plane-and-smashes-through-mans-garage-roof-6453658/

[5] https://www.portsmouth.co.uk/news/national-ice-block-falls-aircraft-and-smashes-familys-garden-1078494

Social media and aviation safety. Part 2.

Reports of aviation accidents, incidents, and occurrences vary greatly in quantity and quality. Improvements have been made, as legislation has demanded that basic data be recorded and retained.

Nevertheless, the one-line narrative is still with us. These reports are frustrating for safety analysts. If a bland statement about an aviation occurrence is received a couple of weeks after an event it can be almost impossible to classify. The good that social media can do is to supplement official information.

In most cases, mobile phone video taken by a passenger or onlooker can be checked for veracity. It needs to have the characteristics that confirm that it was taken at the time and place of the event it depicts. Photographs often have location, picture size, resolution, and device information.

It’s as well to recognise that this work can’t be taken for granted. There is work for aviation safety analysts to do verifying information. Images can be edited with effects that create an exaggerated sense of drama.

Image copyright does have to be considered. Professional photographers make it clear that their work is protected. This is often stamped on the material in some manner.

Impromptu video of an aviation incident, which may involve the person taking it, changes its status once it’s launched on social media. At least that is my understanding of the legal paperwork that few people ever read, namely the common clauses of End-User License Agreements.

So, the advice might be: to try to avoid copyright infringement, it’s always a good idea to credit the source of the material used. Using copied material in good faith is no defence for ignoring ownership.

The pursuit of aviation safety can be argued to be the pursuit of the greater public good. Unfortunately, the lawyers of some newsgathering organisations will not give the time of day to anyone who argues that they are in pursuit of the greater good.

Surprisingly, the subject of who counts as a press reporter or newsgathering organisation is vague in a lot of national legal frameworks. Protecting free speech is a strong case for not drawing too many boundaries, but a complete free-for-all has a downside as “truth” goes out the window.

On another subject, privacy is a sticky one. Where people are identifiable in randomly taken pictures or video of accidents and incidents there is currently no protection.

Again, there are questions to be answered in relation to use of social media derived safety information.

NOTE:

Example: Dramatic footage shows firefighters tackling fire on British Airways passenger plane at Copenhagen airport. [Dailymotion embedded video].

An Online Safety Bill in the UK will shake up the regulation of material online, even if it’s not designed to address the issue raised in my blog. Online Safety Bill: factsheet – GOV.UK (www.gov.uk)

Social media is changing aviation safety

You may ask, how do I sustain that statement? Well, it’s not so difficult. My perspective is that of one who spent years, decades in fact, digging through accident, incident, and occurrence reports, following them up and trying to make sense of the direction aviation safety was taking.

In the 1990s, the growth of digital technology was seen as a huge boon that would help safety professionals in every way. It was difficult to see a downside. Really comprehensive databases, search capabilities and computational tools made generating safety analysis reports much faster and simpler. Getting better information to key decision-makers surely contributed to an improvement in global aviation safety. It started the ball rolling on a move to a more performance-based form of safety regulation. That ball continues to roll slowly forward but the subject has proved to be not without difficulties.

Digging through paper-based reports that overfilled in-trays no longer stresses out technical specialists quite the same as it did. Answers are more accessible and can reflect the real world of daily aircraft operations. Well, that is the theory, at least. As is often the case with an expansion of a technical capability, this can lead to more questions and higher demands for accuracy, coverage, and veracity. It’s a dynamic situation.

Where data becomes public, media attention is always drawn to passenger aircraft accidents and incidents. The first questions are always about what and where it happened. A descriptive narrative. Not long after those questions comes: how and why it happened. The speed at which questions arise often depends on the severity of the event. Unlike road traffic accidents, fatal aviation accidents always command newsprint column inches, airtime, and internet flurries.

Anyone trying to answer such urgent public questions will look for context. Even in the heat of the hottest moments, perspective matters. This is because, thankfully, fatal aviation accidents remain rare. When rare events occur, there can be a reasonable unfamiliarity with their characteristics and implications. We know that knee-jerk reactions can create havoc and often not address real causes.

In the past, access to the safety data needed to construct a context was not immediately available to all comers. Yes, the media often had its “go-to” people who could provide a quick but reliable analysis, but they were few and far between.

This puts the finger on one of the biggest changes in aviation safety in the 2020s. Now, everyone is an expert. The immediacy and speed at which information flows is entirely new. That can be photography and video content from a live event. Because of the compelling nature of pictures, this fuels speculation and theorising. A lot of this is purely ephemeral but it does catch the eye of news makers, politicians, and decision-makers.

So, has anyone studied the impact of social media on developments in aviation safety? Now, there’s a good topic for a thesis.

Safety in numbers. Part 4

In the last 3 parts, we have covered just 2 basic types of failures that can be encountered in any flight. Now, those are the ones that affect single systems, and their subsystems, and those that impact a whole aircraft as a common effect.

The single failure cases were considered assuming that failures were independent. That is, something fails but the effects are contained within one system.

There’s a whole range of other failures where dependencies exist between different systems as they fail. We did mention the relationship between a fuel system and a propulsion system. Their coexistence is obvious. What we need to do is to go beyond the obvious and look for relationships that can be characterised and studied.

At the top of my list is a condition where a cascade of failures ripple through aviation systems. This is when a trigger event starts a set of interconnected responses. Videos of falling dominoes pepper social media and there’s something satisfying about watching them fall one by one.

Aircraft systems cascade failures can start with a relatively minor event. When one failure has the potential to precipitate another it’s important to understand the nature of the dependency that can be hardwired into systems, procedures, or training.

It’s as well to note that a cascade, or avalanche breakdown, may not be as straightforward as it is with a line of carefully arranged dominos. The classical linear way of representing causal chains is useful. The limitation is that dominant, or hidden, interdependencies can exist with multiple potential paths and different sequences of activation.

The next category of failure is a variation on the common-mode theme. This has more to do with the physical positions of systems and equipment on an aircraft. For example, a localised fire, flood, or explosion can defeat built-in redundancies or hardened components.

Earlier we mentioned particular risks. Now, we need to add to the list: bird strike, rotor burst, tyre burst and battery fires. The physical segregation of sub-systems can help address this problem.

Yes, probabilistic methods can be used to calculate the likelihood of these failure conditions occurring.

The next category of failure is more a feature of failure rather than a type of failure. Everything we have talked about, so far, may be evident at the moment of occurrence. There can then be opportunities to take mitigating actions to overcome the impact of failure.

What about those aircraft systems failures that are dormant? That is, they remain passive and undetected until a moment when system activation is needed or there’s demand for a back-up. One example could be just that: an emergency back-up battery that has discharged. It’s then unavailable when it’s needed the most. Design strategies like pre-flight checks, built-in test and continuous monitoring can overcome some of these conditions.

Safety in numbers. Part 3

The wind blows, the sun shines, a storm brews, and rain falls. Weather is the ultimate everyday talking point. Stand at a bus stop, start a conversation and it’ll likely be about the weather. Snow, sleet, ice or hail: the atmosphere can be hostile to our best laid plans. It’s important to us because it affects us all. It has a common effect.

We started a discussion of common-mode failures in earlier paragraphs. We’ll follow it up here. Aircraft systems employ an array of strategies to address combinations and permutations of failure conditions. That said, we should not forget that these can be swamped by common-mode effects.

Environmental effects are at the top of the list of effects to consider. It’s a basic part of flying that the atmosphere changes with altitude. So, aircraft systems and equipment that work well on the ground may have vulnerabilities when exposed to large variations in temperatures, atmospheric pressure, and humidity.

Then there’s a series of effects that are inherent with rotating machinery and moving components. Vibration, shock impacts and heat all need to be addressed in design and testing.

It is possible to apply statistical methods to calculate levels of typical exposure to environmental effects, but it is more often the case that conservative limits are set as design targets.

Then there are particular risks. These are threats that maybe don’t happen every day but have the potential to be destructive and overcome design safety strategies. Electromagnetic interference and atmospheric disturbances, like lightning and electrostatic discharge, can be dramatic. The defences against these phenomena can be to protect systems and limit impacts. Additionally, the separation or segregation of parts of systems can take advantage of any built-in redundancies.

Some common-mode effects can occur due to operational failures. The classic case is that of running out of fuel or electrical power. This is where there’s a role for dedicated back-up systems. It could be a hydraulic accumulator, a back-up battery, or a drop-out ram air turbine, for example.

Some common-mode effects are reversible and tolerable in that they don’t destroy systems and equipment but do produce forms of performance degradation. We get into the habit of talking about failures as if they are absolute, almost digital, but it’s an analogue world. There’s a range of cases where adjustments to operations can mitigate effects on aircraft performance. In fact, an aircraft’s operational envelope can be adjusted to ensure that it remains in a zone where safe flight and landing are possible, however much systems are degraded.

Probabilities can play a role in such considerations. Getting reliable data on which to base sound conclusions is often the biggest challenge. Focusing on maintaining a controllable aircraft with a minimum of propulsion, in the face of multiple hazards takes a lot of clear thought.

Safety in numbers. Part 2

Previously, we walked a path through some simple statistics as they relate to aircraft systems. Not wishing to sound like the next episode of a popular drama, the only recap needed is that, by making a few assumptions, we showed that, where P is the probability of failure and n is the number of similar concurrently operating systems:

A total failure occurs with probability P^n

A single failure occurs with probability n × P

It’s as well to distinguish between the total system and the sub-systems it comprises. For example, we can have one aircraft normally operating with four engines. Here we can call each individual engine a sub-system. The word “simple” can best be applied to highly reliable sub-systems where there are only a few and n is a low number.
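A rough sketch in Python illustrates the two formulas, assuming independent and identically reliable sub-systems. The numbers here are purely illustrative assumptions, not figures from any real design:

```python
# Hypothetical numbers: each sub-system fails with probability P per hour.
P = 1e-3   # assumed per-hour failure probability of one sub-system
n = 4      # e.g. four engines treated as four similar sub-systems

p_total_failure = P ** n   # all n sub-systems fail: P^n
p_single_failure = n * P   # roughly, any one of n fails: n x P

print(p_total_failure)   # on the order of 1e-12
print(p_single_failure)  # 0.004
```

Even with a modest n of four, the gap between losing one sub-system and losing them all is nine orders of magnitude.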

Aviation is going through a period of great change. A big part of that change is electrification. Today, there are numerous Quadcopter designs. The name gives it away. Here we are dealing with 4 electric motors connected to rotors. Some new aircraft designs go much further with as many as 18 electric motors. That’s 18 similar sub-systems all contributing to the safe flight and landing of an aircraft.

Superficially, it would be easy to say that if n equals 18 then the chances of the failure of all propulsion simultaneously is astronomically low. That’s true but only if considering the reliability of the electric motors providing propulsion in isolation. Each electric motor makes a partial contribution to the safe performance of the aircraft.

Just as we have with fuel systems in conventional aircraft, in an electric aircraft each of these sub-systems is dependent upon a source of power being provided. If the source of that power disappears, the aircraft’s motor count becomes irrelevant. This is referred to as the consideration of common-mode failures. The electric motors may be independent in operation, but they are all dependent upon the reliable supply of electrical power.
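A toy Python calculation shows why the common-mode term dominates. The failure probabilities here are assumed for illustration only:

```python
# Purely illustrative, assumed per-hour failure probabilities.
p_motor = 1e-4   # one electric motor failing independently
p_power = 1e-6   # the shared electrical power supply failing
n = 18

# All 18 independent motors failing together: astronomically unlikely.
p_all_motors_fail = p_motor ** n

# Total loss of propulsion is dominated by the common power supply term.
p_total_loss = p_power + p_all_motors_fail

print(p_all_motors_fail)  # around 1e-72
print(p_total_loss)       # effectively just p_power
```

However many motors are added, the shared power source sets the floor on the probability of total propulsion loss.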

Before a discussion of common-mode failures, let’s go back to the earlier maths. We can see that the loss of one electric motor amongst 18 occurs with a probability of 18 × P. Unfortunately, in these cases the number of possible combinations of multiple failures increases.

Given that this subject is so much easier to discuss when dealing with small numbers, let’s consider the Quadcopter. Here there are 4 electric motors and 4 groups of distinct failure conditions: 1 motor failed, 2 motors failed, 3 motors failed, and 4 motors failed. For the sake of argument let’s say they perform the same function and call them motors A, B, C and D.

Except for the case where all 4 motors fail, 3 cases produce an outcome with a reduced aircraft capability. We already have a way of calculating the probability of a total failure and of a single failure, so it’s the double failure and triple failure cases that are of interest.

Let’s step through the combinations of double failures that can occur. Here they are: A and B, B and C, C and D, D and A, A and C, B and D. There are 6 unique combinations that make up double failures.

Let’s step through the combinations of triple failures that can occur. Here they are: A and B and C, B and C and D, C and D and A, D and A and B. There are 4 unique combinations that make up triple failures. We can tabulate these findings for our Quadcopter motor failures thus:

Single   Double   Triple   Total
4P       6P^2     4P^3     P^4

There’s a nice pattern in this table of probabilities. The number of possible combinations of multiple failures grows as n grows.  
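The doubles and triples above can be enumerated mechanically. A short Python sketch, using only the standard library, reproduces the counts in the table (each group of k failed motors carries a leading-order probability term of P to the power k):

```python
from itertools import combinations
from math import comb

motors = ["A", "B", "C", "D"]

# Enumerate the distinct groups of k failed motors, for k = 1..4.
for k in range(1, len(motors) + 1):
    groups = list(combinations(motors, k))
    # comb(4, k) gives the same count without listing the groups.
    print(k, len(groups), comb(len(motors), k))
```

The counts come out as 4, 6, 4 and 1, matching the 4P, 6P^2, 4P^3 and P^4 entries in the table.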

Now, we get more into the subject of combinations and permutations. The word “combination” is the one more often in common usage. When we use that word, it really doesn’t matter in what order any failures occur. Often combinations are like other combinations, and so each may not be entirely unique in its impact on the flight of an aircraft. Hence the doubles and triples above.

With 4 electric motors there are 24 possible orderings in which failures could occur. This is calculated thus:

n! = n × (n – 1) × (n – 2) × … × 1

This is pronounced “n factorial”. So, for n = 18 this gets big. In fact, it’s 6,402,373,705,728,000. 
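That figure is easy to check with Python’s standard library:

```python
import math

# 4 motors: the possible orderings of four motor failures.
print(math.factorial(4))   # 24

# 18 motors: the number of orderings explodes.
print(math.factorial(18))  # 6402373705728000
```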

However, as we have seen from the Quadcopter discussion, it’s the grouping of failure conditions that we are often most interested in. After all, for safe flight and landing of an aircraft we need to manage those failure conditions that can be managed, while at the same time reducing the probability of occurrence of the failure conditions that can’t be managed.

That’s a lot of work. It may explain the drive to develop autonomous aircraft systems. The case could be made that managing flight is impossible when subject to the vast array of potential combinations and permutations of failure conditions that can exist within a multi-rotor system, where n is large.

[Do you agree?]

Safety in numbers. Part 1

It’s a common misconception that the more you have of something the better it is. Well, I say misconception, but in simple cases it’s not a misconception. For safety’s sake, it’s common to have more than one of something. In a classic everyday aircraft that might be two engines, two flight controls, two electrical generators, two pilots, and so on.

It seems the most common-sense of common-sense conclusions: that if one thing fails or doesn’t do what it should, we have another one to replace it. It’s not always the case that both things work together all the time, and that when one goes the other does the whole job. That’s because, like two aircraft engines, the normal situation is both working together in parallel. There are other situations where one system carries the full load and another sits there keeping an eye on what’s happening, ready to take over if needed.

This week, as with many weeks, thinkers and politicians have been saying we need more people with a STEM education (Science, Technology, Engineering, and Math). Often this seems common-sense and little questioned. However, it’s not always clear that people mean the same things when talking about STEM. Most particularly it’s not always clear what they consider to be Math.

To misquote the famous author H. G. Wells: statistical thinking may, one day, be as necessary as the ability to read and write. His full quote was a bit more impenetrable, but the overall meaning is captured in my shortened version.

To understand how a combination of things works together, or not, some statistical thinking is certainly needed. The maths associated with probabilities can scare people off, so ways of keeping our reasoning simple do help.

The sums for dual aircraft systems are not so difficult, provided we know that the something we are talking about is reliable in the first place. If it’s not reliable, the story is a different one. For the sake of argument, and considering practical reality, let’s say that the thing we are talking about fails only once every 1,000 hours.

What’s that in human terms? It’s a lot less than a year’s worth of daylight hours. Daylight is roughly half of 24 hours x 7 days x 52 weeks = 8,736 hours, so about 4,368 hours (putting aside location and leap years). In a year, in good health, our bodies operate continuously for the full time. For the engineered systems under discussion that may not be the case. We switch them on, and we switch them off, possibly many times in a year.

That’s why we need to consider the amount of time something is exposed to the possibility of failure. We can now use the word “probability” instead of possibility. Chance and likelihood work too. When numerically expressed, probabilities range from 0 to 1: zero when something will never happen, and one when something will always happen.

So, let’s think about any one hour of operation of an engineered system, and use the reliability number from our simple argument. We can liken that, making an assumption, to a probability of P = 1/1000, or 1 x 10^-3 per hour. That gives us a round number representing the likelihood of failure in any one hour of operation of one system.
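Exposure time matters because that per-hour figure compounds. Here is a minimal sketch of the idea, assuming the illustrative rate of 1 in 1,000 hours and that each hour of operation is independent of the last — neither of which is guaranteed for a real system:

```python
# Illustrative per-hour failure probability from the argument above.
p = 1e-3

def prob_at_least_one_failure(hours: int) -> float:
    """Probability of at least one failure over `hours` of exposure,
    assuming independent failures at rate p per hour."""
    return 1 - (1 - p) ** hours

for t in (1, 10, 100, 1000):
    print(f"{t:>5} h exposure: {prob_at_least_one_failure(t):.4f}")

# Over 1,000 hours the chance of at least one failure is roughly 63%,
# which is why exposure time can't be ignored.
```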

Now, back to the start. We have two systems. Maybe two engines. That is two systems that can work independently of each other. It’s true that there are some cases where they may not work independently of each other but let’s park those cases for the moment.

As soon as we have more than one thing we need to talk of combinations. Here the simple question is how many combinations exist for two working systems?

Let’s give them the names A and B. In our simplified world either A or B can work, or not work, when needed to work. That’s failed or not failed, said another way. There are four combinations that can exist. Displayed in a table this looks like:

A ok | B ok
A fails | B ok
A ok | B fails
A fails | B fails
Table 1
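The four rows of the table can be generated mechanically. A small sketch, assuming nothing more than two binary states per system:

```python
from itertools import product

# Enumerate every on/off state of two systems, A and B,
# each of which is either "ok" or "fails".
states = list(product(["ok", "fails"], repeat=2))
for a, b in states:
    print(f"A {a:<5} | B {b}")

print(len(states), "combinations")  # 2 states ** 2 systems = 4 rows
```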

This is all binary. We are not considering any near failure, or other anomalous behaviour that can happen in the real world. We are not considering any operator intervention that switches on or switches off our system. We are looking at the probability of a failure happening in a period of operation of both systems together.

Now, let’s say that the systems A and B each have a known probability of failure.

Thus, the last line of the table becomes: P4 = PA and PB

That is, in any given hour of operation, the chances of both A and B failing together are the product of their probabilities, assuming the failures to be independent and random.

Calculating the last line of the table becomes: P4 = PA x PB

In the first line of the table, we have the case of perfection. Simultaneous operation is not interrupted, even though we know both A and B have a likelihood of failure in any one hour of operation.

Thus, the first line becomes: P1 = (1 – PA) x (1 – PB)

This nicely approximates to P1 ≈ 1, given that 1/1000 is tiny by comparison with 1.

The cases where either A or B fails are in the middle of the table.

P2 = PA x (1 – PB) together with P3 = (1 – PA) x PB

Thus, using the same logic as above, the probability of either A or B failing is approximately PA + PB
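The four lines of the table can be checked numerically. A minimal sketch, assuming the illustrative figure PA = PB = 1/1000 and independent, random failures:

```python
# Probabilities for the four rows of Table 1.
PA = PB = 1e-3  # illustrative per-hour failure probability

P1 = (1 - PA) * (1 - PB)   # both ok
P2 = PA * (1 - PB)         # A fails, B ok
P3 = (1 - PA) * PB         # A ok, B fails
P4 = PA * PB               # both fail

print(f"P1      = {P1:.6f}   (close to 1)")
print(f"P2 + P3 = {P2 + P3:.6f}   (close to PA + PB = {PA + PB})")
print(f"P4      = {P4:.1e}     (one in a million per hour)")

# The four rows are exhaustive, so their probabilities sum to 1.
print(f"sum     = {P1 + P2 + P3 + P4:.6f}")
```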

It gets even simpler if we consider the two systems to be identical; namely, that the probabilities PA and PB are equal, call them P.

A double failure occurs with probability P x P = P^2

A single failure occurs with probability of approximately 2P

So, with two systems operating in parallel, there’s a decreased likelihood of a double failure but an increased likelihood of a single failure. This can be taken beyond an arrangement of two systems. With four systems, there’s a massively decreased likelihood of a total failure but four times the likelihood of a single failure. Hence my remark at the beginning.
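The same arithmetic extends to n identical, independent systems: a total failure needs all n to fail (probability P to the power n), while the chance of at least one single failure grows roughly as n x P. A sketch under those assumptions:

```python
# Generalising the two-system sums to n identical, independent systems,
# each with per-hour failure probability p.
p = 1e-3

for n in (2, 4):
    total_failure = p ** n               # all n fail together
    any_failure = 1 - (1 - p) ** n       # at least one fails; ~ n*p for small p
    print(f"n={n}: total failure ~ {total_failure:.0e}, "
          f"any single failure ~ {any_failure:.4f} (~ {n}*p = {n * p})")
```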

[Please let me know if this is in error or there’s a better way of saying it]