Incident: Boeing 737 Max MCAS Sensor Failure Incident Analysis

Published Date: 2019-03-29

Postmortem Analysis
Timeline
1. The software failure incident involving the Boeing 737 Max occurred in October [81854].
2. The article was published on 2019-03-29.
3. Estimate: given the March 2019 publication date, the incident occurred in October 2018.
System
1. Angle-of-attack sensor on the Boeing 737 Max [81854]
2. Maneuvering Characteristics Augmentation System (MCAS) on the Boeing 737 Max [81854]
Responsible Organization
1. Boeing: the faulty sensor and the automated system on the Boeing 737 Max, known as MCAS, were identified as the primary causes of the software failure incident [81854].
Impacted Organization
1. Passengers aboard the Ethiopian Airlines flight that crashed [81854]
2. Passengers aboard the Indonesian flight involving the same jet model [81854]
Software Causes
1. The failure incident was caused by a faulty angle-of-attack sensor that erroneously activated the automated MCAS on the Boeing 737 Max [81854].
2. MCAS was originally designed to activate based on data from a single angle-of-attack sensor, which created a single point of failure [81854].
3. The software design allowed MCAS to push the front of the plane down, resulting in an irrecoverable nose-dive that led to the crash [81854].
4. Boeing's software update, unveiled after the incident, addresses concerns about MCAS and the sensors by making the system rely on two sensors instead of one, with the aim of preventing similar incidents [81854].
Non-software Causes
1. The failure incident was caused by a faulty angle-of-attack sensor that erroneously activated MCAS on the Boeing 737 Max [81854].
2. The reliance on data from a single angle-of-attack sensor, which measures the angle of the jet's nose relative to oncoming air, was identified as a design flaw that contributed to the incident [81854].
3. Angle-of-attack sensor malfunctions due to bird strikes, jetway interactions, and water pooling and freezing were highlighted as potential causes of the failure incident [81854].
Impacts
1. The software failure incident led to the crash of an Ethiopian Airlines flight, killing all 157 people on board [81854].
2. The incident raised concerns about the safety and design of the Boeing 737 Max, leading to a global grounding of the planes and significant financial implications for Boeing and the airlines operating the aircraft [81854].
3. The incident prompted investigations by several authorities, including the Justice Department and the Transportation Department's inspector general, into the development and certification of the Boeing 737 Max [81854].
4. The incident pointed to a potential systemic problem with the aircraft, putting pressure on Boeing to address the flaws in the automated system and sensors [81854].
5. In response, Boeing unveiled a software update to address the concerns about the Maneuvering Characteristics Augmentation System (MCAS) and the sensors, aiming to make the system more robust and reliable [81854].
Preventions
1. Implementing a design that does not rely on a single point of failure, such as a single angle-of-attack sensor, for critical systems like MCAS [81854].
2. Conducting more rigorous testing and analysis of the software system, including potential failure scenarios and their implications for flight safety [81854].
3. Enhancing the software system to rely on data from multiple sensors to increase redundancy and reliability [81854].
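The redundancy described in the preventions above can be sketched as follows. This is a minimal illustration, not Boeing's implementation: the function names and both threshold values are hypothetical and do not come from the article.

```python
import math
from typing import Optional

# Hypothetical values for illustration only; not taken from the article.
MAX_DISAGREEMENT_DEG = 5.0   # assumed largest tolerated gap between sensors
ACTIVATION_AOA_DEG = 14.0    # assumed angle of attack that would trigger the system


def agreed_angle_of_attack(left_deg: float, right_deg: float) -> Optional[float]:
    """Return the mean of two sensor readings, or None if they cannot be trusted."""
    # Reject non-finite readings (NaN or infinity) outright.
    if not (math.isfinite(left_deg) and math.isfinite(right_deg)):
        return None
    # Reject the pair when the two sensors disagree by more than the tolerance.
    if abs(left_deg - right_deg) > MAX_DISAGREEMENT_DEG:
        return None
    return (left_deg + right_deg) / 2.0


def should_activate(left_deg: float, right_deg: float) -> bool:
    """Activate only when both sensors agree and the agreed value is high."""
    aoa = agreed_angle_of_attack(left_deg, right_deg)
    return aoa is not None and aoa >= ACTIVATION_AOA_DEG
```

With a check like this, one sensor reporting an implausible value while the other reads normally yields no activation, so a single faulty sensor is no longer enough to trigger the system.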
Fixes
1. Implementing a software update that specifically addresses concerns about the Maneuvering Characteristics Augmentation System (MCAS) and the sensors [81854].
2. Making the system rely on data from two sensors instead of one to prevent a single point of failure [81854].
3. Limiting MCAS, in most cases, from engaging more than once [81854].
4. Preventing the system from pushing the plane's nose down more than a pilot could counteract by pulling up on the controls [81854].
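Fixes 3 and 4 above amount to two limits on an automated command: a cap on how often it may engage and a cap on its total authority. The sketch below illustrates that idea under stated assumptions. The class name, the 2.5-degree cap, and the choice to keep the cumulative total across events are all hypothetical; pilot trim inputs are not modeled, and nothing here reflects the actual update.

```python
from dataclasses import dataclass


@dataclass
class TrimLimiter:
    """Hypothetical guard around an automated nose-down trim command."""

    max_total_trim_deg: float = 2.5    # assumed cap a pilot could still counteract
    total_trim_deg: float = 0.0        # nose-down trim granted so far
    engaged_this_event: bool = False   # True once the system has engaged

    def request_trim(self, requested_deg: float) -> float:
        """Return the trim actually granted for this request (0.0 if refused)."""
        # Refuse repeat engagements within the same event and non-positive requests.
        if self.engaged_this_event or requested_deg <= 0.0:
            return 0.0
        # Never grant more than what remains under the total authority cap.
        remaining = self.max_total_trim_deg - self.total_trim_deg
        granted = max(0.0, min(requested_deg, remaining))
        if granted > 0.0:
            self.total_trim_deg += granted
            self.engaged_this_event = True
        return granted

    def reset_event(self) -> None:
        """Call once the angle of attack has returned to normal."""
        self.engaged_this_event = False
```

A second request during the same event is refused, and even after a reset the limiter only grants whatever remains under the cap.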
References
1. Black box data from the doomed Ethiopian Airlines flight [81854]
2. Several people who have been briefed on the contents of the black box in Ethiopia [81854]
3. Former employees at Boeing and the supplier that made the sensor [81854]
4. Air-safety experts [81854]
5. Former Boeing and Rosemount engineers [81854]
6. Investigators of the Indonesia crash [81854]
7. Mel McIntyre, a retired Boeing engineer who worked with such sensors for years [81854]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident related to the Boeing 737 Max MCAS system activating erroneously due to a faulty sensor causing a fatal crash has happened again within the same organization. The incident occurred in both the Ethiopian Airlines flight and an Indonesian disaster involving the same jet [81854]. (b) The software failure incident related to the Boeing 737 Max MCAS system activating erroneously due to a faulty sensor causing a fatal crash has also happened at multiple organizations. The incident in Ethiopia was similar to the one in Indonesia, indicating a potential systemic problem with the aircraft [81854].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase is evident in the case of the Boeing 737 Max crashes. The incidents were caused by a faulty sensor that erroneously activated the MCAS (Maneuvering Characteristics Augmentation System) due to data from a single angle-of-attack sensor, leading to an irrecoverable nose-dive [81854]. This design flaw allowed a single sensor to activate a crucial system that pushed the aircraft toward the ground, highlighting a systemic problem with the aircraft's design [81854]. (b) The software failure incident related to the operation phase is seen in the pilots' attempts to override the MCAS system during the Indonesian flight. The pilots tried repeatedly to counteract the system, but after about 12 minutes, they lost their battle, ultimately leading to the crash [81854]. Additionally, in the Ethiopian Airlines crash, the pilots experienced trouble controlling the aircraft as it exhibited a bouncing, bobbing trajectory before crashing, indicating operational challenges in managing the system [81854].
Boundary (Internal/External) within_system, outside_system (a) within_system: The software failure incident related to the Boeing 737 Max crashes was primarily within the system. The incident was caused by a faulty sensor, specifically the angle-of-attack sensor, that erroneously activated the MCAS (Maneuvering Characteristics Augmentation System) on the aircraft, leading to the fatal crashes [81854]. (b) outside_system: While the primary cause of the software failure incident was within the system, there were contributing factors from outside the system as well. For example, the article mentions that the angle-of-attack sensors can fail due to various external factors such as bird strikes, jetway impacts, or freezing at high altitudes [81854]. Additionally, there were concerns raised about the design of the system and the reliance on data from a single sensor, indicating potential oversight in the certification process and regulatory oversight by external entities like the Federal Aviation Administration [81854].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the Boeing 737 Max crashes was primarily due to non-human actions. The incidents were caused by a faulty sensor, specifically the angle-of-attack sensor, which erroneously activated the MCAS automated system on the aircraft, leading to a series of events that resulted in the crashes [81854]. (b) However, human actions also played a role in the software failure incident. Boeing faced scrutiny for its role in the design and certification of the plane, with concerns raised about the initial design flaws in the MCAS system and the reliance on data from a single sensor. Boeing later unveiled a software update to address these concerns, indicating a recognition of the need for human intervention to rectify the issues [81854].
Dimension (Hardware/Software) hardware, software (a) The software failure incident occurring due to hardware: - The article mentions that the crash of the Ethiopian Airlines flight was caused by a faulty sensor, specifically the angle-of-attack sensor, which erroneously activated the MCAS automated system on the Boeing 737 Max [81854]. - The angle-of-attack sensor, a hardware component, incorrectly activated the computer-controlled system, leading to an irrecoverable nose-dive that resulted in the crash [81854]. - The article also discusses how the angle-of-attack sensors, which are hardware components, can fail due to various reasons such as bird strikes, jetway impacts, freezing at high altitudes, or malfunctions [81854]. (b) The software failure incident occurring due to software: - The article mentions that Boeing unveiled a software update to address concerns about the MCAS system and the sensors, indicating that there were software issues contributing to the failure incident [81854]. - The software update specifically aims to address the suspected problems that may have led to the two deadly crashes involving the Boeing 737 Max jets [81854]. - The update will make the system rely on two sensors instead of one and limit the engagement of MCAS in most cases, showing that software modifications are being implemented to prevent similar incidents in the future [81854].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident related to the Boeing 737 Max crashes was non-malicious. The incidents were caused by a faulty sensor (angle-of-attack sensor) that erroneously activated the MCAS (Maneuvering Characteristics Augmentation System) on the aircraft, leading to a series of events that resulted in the crashes. The system was designed to activate based on data from a single angle-of-attack sensor, which was identified as a flaw in the system design. Boeing has since acknowledged the design flaw and unveiled a software update to address the concerns about MCAS and the sensors [81854]. (b) The software failure incident was non-malicious as it was attributed to a design flaw in the system rather than any intentional actions to harm the system. The reliance on data from a single sensor, the angle-of-attack sensor, was identified as a critical flaw in the system design, leading to the activation of the MCAS system and subsequent crashes. Boeing has taken steps to address the design flaw through a software update that aims to make the system more robust and prevent similar incidents in the future [81854].
Intent (Poor/Accidental Decisions) poor_decisions, accidental_decisions The software failure incident related to the Boeing 737 Max crashes can be attributed to both poor decisions and accidental decisions: (a) poor_decisions: The incident involved poor decisions such as the initial design flaw of the MCAS system that relied on data from a single angle-of-attack sensor, which was considered a systemic problem with the aircraft [81854]. Boeing faced scrutiny for its role in the design and certification of the plane, with concerns raised by air-safety experts and former employees about the single point of failure in the system [81854]. (b) accidental_decisions: The incident also involved accidental decisions or unintended consequences, such as the erroneous activation of the MCAS system by a faulty sensor, leading to an irrecoverable nose-dive that caused the crashes [81854]. The activation of MCAS based on bad data from a sensor in the Indonesia crash was an unintended consequence that led to the tragic outcome [81854].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident related to development incompetence is evident in the case of the Boeing 737 Max crashes. The incidents were caused by a faulty sensor that erroneously activated an automated system known as MCAS, which pushed the front of the plane down, leading to irrecoverable nose-dives and resulting in the loss of all passengers on board [81854]. The system was originally designed to activate based on data from a single angle-of-attack sensor, which experts criticized as a flawed engineering system with a single point of failure [81854]. Boeing has faced scrutiny for its role in the design and certification of the plane, with investigations ongoing to determine what went wrong, indicating potential systemic problems with the aircraft [81854]. (b) The software failure incident related to accidental factors includes the sensor malfunctions that occurred due to various reasons such as bird strikes, jetway interactions, water pooling and freezing, and sensor failures in the past [81854]. These accidental factors contributed to the erroneous activation of the MCAS system, leading to the tragic crashes of the Boeing 737 Max planes [81854].
Duration temporary The software failure incident related to the Boeing 737 Max crashes can be categorized as a temporary failure. The incident was caused by a faulty sensor (angle-of-attack sensor) that erroneously activated the MCAS system, leading to the crashes in Ethiopia and Indonesia. Boeing has acknowledged the initial design flaw and has unveiled a software update to address the concerns about MCAS and the sensors. The update will make the system rely on two sensors instead of one and limit MCAS from engaging more than once in most cases. This indicates that the failure was due to contributing factors introduced by certain circumstances (faulty sensor design and activation) but not all circumstances, as Boeing is taking specific steps to rectify the issue and prevent similar incidents in the future [81854].
Behaviour crash, omission, value, other (a) crash: The software failure incident in the Boeing 737 Max crashes was due to the system erroneously activating an automated system (MCAS) that pushed the front of the plane down, leading to an irrecoverable nose-dive that resulted in the crashes in both Ethiopia and Indonesia [81854]. (b) omission: The system failed to perform its intended function of stabilizing the aircraft by activating the MCAS system based on erroneous data from a single angle-of-attack sensor, which led to the fatal crashes [81854]. (c) timing: There is no specific mention of the software failure incident being related to timing issues in the articles. (d) value: The software failure incident falls under this category as the system performed its intended functions incorrectly by activating the MCAS system based on faulty sensor data, causing the crashes [81854]. (e) byzantine: The software failure incident does not exhibit characteristics of a byzantine failure in the articles. (f) other: The software failure incident could also be categorized as a flaw in the system design, as experts and former employees expressed concerns about the system having a single point of failure with the angle-of-attack sensor, which is considered a flaw in aviation engineering [81854].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence death, harm (a) death: The consequence of the software failure incident was that all 157 people aboard the Ethiopian Airlines flight lost their lives due to the faulty sensor activating an automated system that led to an irrecoverable nose-dive [81854].
Domain transportation The software failure incident discussed in the article is related to the transportation industry. Specifically, the failed system was part of the Boeing 737 Max aircraft, which is used for transporting people. The faulty angle-of-attack sensor erroneously activated MCAS on the aircraft, leading to the fatal crashes of Ethiopian Airlines Flight 302 and Lion Air Flight 610 [81854]. The incident has raised concerns about the design and certification of the aircraft, leading to investigations by regulatory authorities and scrutiny of Boeing's role in the development of the plane.

