Incident: Boeing 737 MAX MCAS Software Failure Incident.

Published Date: 2018-10-30

Postmortem Analysis
Timeline 1. The software failure incident happened in October 2018 [Article 107214]. 2. The software failure incident happened in March 2019 [Article 104849]. 3. The software failure incident happened in November 2018 [Article 84583]. 4. The software failure incident happened in 2017 [Article 85174]. 5. The software failure incident happened in 2016 [Article 118499]. 6. The software failure incident happened in 2015 [Article 120520]. 7. The software failure incident happened in 2012 [Article 85618]. 8. The software failure incident happened in 2009 [Article 94250].
System 1. Manoeuvring Characteristics Augmentation System (MCAS) [83293, 84583, 89311, 90396, 90485, 90523, 90525, 90958, 94871, 115439] 2. Flaps and other flight-control hardware software [83843] 3. A330’s computer system [78176]
Responsible Organization 1. Boeing [81957, 83129, 85618, 90523, 90525, 90958, 91415, 94176, 94250, 94871, 104849, 107214] 2. Federal Aviation Administration (FAA) [81957, 83129, 83383, 85174, 94871] 3. Design flaws in the MCAS system [89311, 90396, 90525, 90958, 109768]
Impacted Organization 1. Boeing [81957, 82345, 82641, 83383, 83843, 84470, 85618, 85679, 89311, 90525, 90958, 94176, 94206, 94250, 94871, 104849, 108578] 2. Federal Aviation Administration (FAA) [81957, 83383, 83843, 94206, 115439] 3. Pilots and passengers [83374, 90525, 90958] 4. Airlines [82345, 85618, 85679] 5. Maintenance crews [90396] 6. Regulators [90396] 7. US lawmakers [94871]
Software Causes 1. Malfunctioning sensor and automated response from the aircraft's software [Article 78184] 2. Software limitation of the A330's computer system [Article 78176] 3. Software affecting flaps and other flight-control hardware [Article 83843] 4. Erroneous activation of the aircraft's MCAS function due to a software error [Article 83939] 5. Software defects blamed for Boeing 737-MAX 8 crashes [Article 85618] 6. Software system called MCAS contributing to pilots not being able to control the aircraft [Article 89311] 7. Software error resulting in a warning light not working and lack of guidance on MCAS in flight manuals [Article 90485] 8. Software system MCAS designed to make the aircraft easier to fly contributing to the crashes [Article 90525] 9. Software problems with the flight simulator [Article 90657] 10. Software system MCAS relying on a single sensor and receiving erroneous data [Article 94871] 11. Software forcing down the noses of the planes in a way pilots could not overcome [Article 104849] 12. Flight control software issues contributing to the crashes [Article 108578] 13. MCAS function not being a fail-safe design and lacking redundancy [Article 115439]
Non-software Causes 1. Inadequate flying skills and poor communication among the flight crew [Article 90640] 2. Deficiencies in the flight crew's communication and manual control of the aircraft [Article 90640] 3. Mechanical and design problems with the flight control system [Article 90958] 4. Improper maintenance procedures and lack of a cockpit warning light [Article 115439]
Impacts 1. The software limitation of the A330's computer system caused an accident, leading to the implementation of procedures to prevent similar outcomes [Article 78176]. 2. Boeing faced multiple software problems, including a second software problem affecting flaps and flight-control hardware, which was classified as critical to flight safety [Article 83843]. 3. Boeing's software issue, related to the lack of an alert function, was known internally for a significant period before action was taken, raising concerns about the impact on airplane safety [Article 84470]. 4. The software system MCAS contributed to pilots' inability to control the aircraft in the crashes, leading to revisions of the plane's software by Boeing [Article 89311, Article 90525]. 5. The software system developed for the Boeing 737-MAX 8 was found to have played a role in both crashes, resulting in the grounding of the aircraft worldwide for months [Article 85618, Article 94206]. 6. The software system's faulty technical assumptions, lack of transparency, and insufficient oversight by Boeing and the FAA were identified as contributing factors to the crashes, leading to stinging charges against Boeing [Article 104826]. 7. The software forced down the noses of the planes in a way that pilots could not overcome, causing crashes that killed 346 people [Article 104849]. 8. Boeing executives and engineers were criticized for not taking warning signs seriously, opting against additional precautions, and making decisions for cost-cutting, leading to the crashes [Article 107214]. 9. Boeing implemented modifications post-crashes, including updating flight control software, revising crew procedures, and rerouting internal wiring [Article 108578]. 10. Production defects, intermittent flight control system problems, electrical anomalies, and sensor failures were identified as impacts of the software failure incident, contributing to the accidents [Article 109768].
Preventions 1. Proper testing and identification of faults in the A330's computer system before putting the plane into service [78176]. 2. Timely communication and disclosure of software issues by Boeing to regulators and operators [83843, 84583, 85174, 85249]. 3. Comprehensive training for pilots on new software systems like MCAS to ensure they can respond effectively to emergencies [83374, 90377]. 4. Implementation of fail-safe designs and redundancy in critical software systems like MCAS [109768, 115439]. 5. Thorough certification processes by regulatory bodies like the FAA to detect and address software errors [91415, 94871]. 6. Proper maintenance procedures and cockpit warning lights to alert pilots of system malfunctions [115439].
Fixes 1. Software fixes developed by Boeing for the 737 Max planes [82434, 82641] 2. Revision of the plane's software to improve safeguards by Boeing [89311, 90525] 3. Modifications implemented by Boeing including updating flight control software, revising crew procedures, and rerouting internal wiring [108578]
References 1. US Federal Aviation Administration (FAA) [81957, 83170, 83843, 85174, 120520] 2. Boeing [81957, 83129, 83843, 83939, 84470, 90525, 90540, 94176, 109768, 115439] 3. Daily Telegraph [81957] 4. CNN [83170] 5. The Washington Post [83843] 6. CEO Dennis Muilenburg [83939] 7. The Wall Street Journal [85174] 8. Lion Air [90485, 90540, 109768] 9. Ethiopian Airlines [83383, 83939, 90525, 115439] 10. Ars Technica [94176] 11. The New York Times [94250] 12. US lawmakers [94871] 13. Dutch Safety Board [94250] 14. Seattle Times [115439, 120520] 15. Aircraft Accident Investigation Bureau [115439] 16. Federal Aviation Administration (FAA) [120911]
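The single-sensor dependence listed under Software Causes, and the fail-safe design and redundancy recommended under Preventions, can be illustrated with a short sketch. This is not Boeing's implementation and is not drawn from the cited articles. The names `AoaReading`, `validated_aoa`, and `automation_may_act`, and both numeric thresholds, are hypothetical and chosen only for illustration. The idea is that an automated function acts only when two independent sensors both report valid data and agree, and otherwise stays inhibited.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: largest tolerated difference between the two sensors.
MAX_DISAGREEMENT_DEG = 5.0


@dataclass(frozen=True)
class AoaReading:
    """One sample from two independent angle-of-attack sensors (degrees).

    A value of None means the sensor reported no valid data.
    """
    left_deg: Optional[float]
    right_deg: Optional[float]


def validated_aoa(reading: AoaReading) -> Optional[float]:
    """Return the averaged angle only if both sensors are valid and agree."""
    if reading.left_deg is None or reading.right_deg is None:
        return None  # a missing sensor is treated as "no trustworthy value"
    if abs(reading.left_deg - reading.right_deg) > MAX_DISAGREEMENT_DEG:
        return None  # disagreement: neither sensor can be trusted alone
    return (reading.left_deg + reading.right_deg) / 2.0


def automation_may_act(reading: AoaReading, trigger_deg: float = 14.0) -> bool:
    """Fail-safe gate: stay inhibited unless a validated value exceeds the trigger.

    trigger_deg is a hypothetical activation threshold.
    """
    aoa = validated_aoa(reading)
    return aoa is not None and aoa > trigger_deg


if __name__ == "__main__":
    print(automation_may_act(AoaReading(15.0, 15.5)))  # sensors agree, above trigger
    print(automation_may_act(AoaReading(74.5, 15.0)))  # one sensor is erroneous
    print(automation_may_act(AoaReading(None, 20.0)))  # one sensor is missing
```

With a single sensor, the erroneous 74.5 degree reading in the second call would drive the automation directly. With the cross-check, the disagreement leaves the function inhibited, which is the safe default in this sketch.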

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident having happened again at one_organization: - The two Boeing 737-MAX 8 crashes, in October and March, killed 346 people and were attributed to a software system called MCAS (Manoeuvring Characteristics Augmentation System) [Article 85618]. - Boeing executives acknowledged that erroneous angle-of-attack sensor information triggered the MCAS system in both crashes [Article 83170]. - Boeing confirmed a second software problem, separate from the anti-stall system under investigation, affecting flaps and other flight-control hardware; it was classified as critical to flight safety, and the Federal Aviation Administration ordered it fixed [Article 83843]. - Boeing was aware of a software issue that did not adversely impact airplane safety or operation, but it is not known whether this played a role in the crashes of the Lion Air and Ethiopian Airlines planes [Article 84470]. - Boeing engineers identified that alerts weren't operating as intended due to a software error, but senior Boeing leaders only learned about the issue after the Lion Air crash [Article 85174]. - Boeing is working on a software fix for the Boeing 737-MAX aircraft, with no timetable for when regulators will allow the aircraft to return to service [Article 90376]. (b) The software failure incident having happened again at multiple_organization: - The Boeing 737-MAX 8 crashes highlighted the serious downsides of software defects in the aviation industry [Article 85618].
- The software failure incident was not limited to Boeing: similar incidents were reported at other organizations or with their products and services, such as automotive recalls linked to electronic and software failures [Article 85618].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase: - Investigators highlighted deficiencies in the design of the MCAS system, which relied on information from a single external sensor, making it vulnerable to erroneous input [Article 90396]. - Boeing made faulty design assumptions about how pilots would respond if a crucial system malfunctioned; on the fatal flights, the MCAS system activated because of a faulty sensor [Article 104794]. (b) The software failure incident related to the operation phase: - The accident was caused by a complex chain of events: poor communication among the flight crew, deficiencies in their manual control of the aircraft, inadequate flying skills, and alerts and distractions in the cockpit, alongside bad design [Article 90640]. - The crew's actions were also considered a factor in the crashes, along with the vulnerability of the MCAS system due to relying on data from a single sensor [Article 90958].
Boundary (Internal/External) within_system, outside_system (a) within_system: - The software failure incident was related to a system called MCAS (Manoeuvring Characteristics Augmentation System) designed to make the aircraft easier to fly, which contributed to pilots not being able to control the aircraft [Article 89311, Article 90525]. - Investigators found that the MCAS system relied on information from a single external sensor, making it vulnerable to erroneous input from that sensor, which was a contributing factor to the incident [Article 90396]. - Boeing had not provided pilots with information that could have helped them react to the malfunction of the MCAS system, which was a contributing factor to the incident [Article 94203]. - The MCAS system had a single point of failure due to relying on only one sensor, which caused the software to go haywire in both crashes [Article 107214]. (b) outside_system: - The incident was also influenced by faulty sensor data triggering the anti-stall system, revealing a single point of failure on the plane, which was a factor originating from outside the system [Article 83129]. - The incident was caused by a complex chain of events, including deficiencies in the flight crew's communication and manual control of the aircraft, alerts, and distractions in the cockpit, which were factors originating from outside the system [Article 90640]. - Investigators believed that both accidents were triggered by the failure of a single sensor sending inaccurate data to the MCAS system, which was a factor originating from outside the system [Article 109768].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident occurring due to non-human actions: - The incidents were linked to a software system called MCAS, which contributed to pilots not being able to control the aircraft [Article 89311, Article 90525]. - Boeing blamed the crashes on erroneous data fed into the system and mentioned revising the plane's software to improve safeguards [Article 90525]. - The failures were attributed to a sensor failure with no redundancy and a problem with MCAS, the new software controlling the handling of the aircraft that the air crews had not been trained to overcome [Article 94176]. (b) The software failure incident occurring due to human actions: - Deficiencies in the flight crew's communication and manual control of the aircraft were mentioned as contributing factors to the crash [Article 90640]. - The report highlighted poor communication, bad design, and inadequate flying skills as factors leading to the deaths of 189 people in the incident [Article 90523]. - Investigators pointed out that the actions of the pilots were also a factor in the crashes, along with the vulnerability of the MCAS system that relied on data from a single sensor [Article 90958].
Dimension (Hardware/Software) hardware, software (a) The software failure incident occurring due to hardware: - Investigators believe both accidents involving the Boeing 737 MAX were triggered by the failure of a single sensor that sent inaccurate data to the MCAS flight control software, indicating hardware-related issues [109768]. - The report also mentioned that intermittent flight control system problems and electrical anomalies occurred in the days and weeks before the accidents, suggesting hardware-related issues [109768]. (b) The software failure incident occurring due to software: - The crashes of the Boeing 737 MAX were partially caused by a sensor failure and a problem with the MCAS software, which the air crews had not been trained to overcome, indicating software-related issues [94176]. - Boeing executives acknowledged that the software controlling the handling of the aircraft, specifically the MCAS system, was a factor in the crashes [94176]. - Boeing has stated that the MCAS software system received erroneous data, leading it to override pilot commands and push the aircraft downwards, highlighting software-related issues [94871]. - The House investigators concluded that the crashes were caused in part by the software, specifically the MCAS system, which automatically pushed the nose of the planes down [104683].
Objective (Malicious/Non-malicious) non-malicious (a) The articles provide information related to non-malicious software failure incidents: 1. The incidents involving the Boeing 737 MAX crashes were attributed to a software system called MCAS (Manoeuvring Characteristics Augmentation System) [89311, 90525]. 2. Investigators found that the MCAS system relied on information from a single external sensor, making it vulnerable to erroneous input from that sensor [90396]. 3. Boeing executives acknowledged that the crashes were caused by a chain of events, with a common chain link being the erroneous activation of the aircraft's MCAS function [83939]. 4. The software failure incidents were linked to faulty sensor data triggering the anti-stall system, revealing a single point of failure on the plane [83129]. 5. The preliminary report into the Ethiopian disaster showed a key sensor was wrecked, possibly by a bird strike, leading to faulty data being fed into the MCAS system [83383]. These incidents highlight non-malicious software failures caused by design flaws, sensor failures, and inadequate training rather than intentional harm to the system.
Intent (Poor/Accidental Decisions) poor_decisions, accidental_decisions (a) The software failure incident related to poor decisions: - The incidents were attributed to a series of faulty technical assumptions by Boeing's engineers, lack of transparency by Boeing's management, and insufficient oversight by the FAA, which were described as a disturbing pattern of technical miscalculations and troubling management misjudgments [Article 104849]. - Boeing made faulty design assumptions, particularly regarding pilots' response if a crucial system malfunctioned, leading to the activation of the system causing the crashes [Article 104794]. - The crashes were described as the horrific culmination of a series of faulty technical assumptions by Boeing's engineers, a lack of transparency on the part of Boeing's management, and grossly insufficient oversight by the FAA, indicating a culture of concealment and troubling management misjudgments [Article 104826]. (b) The software failure incident related to accidental decisions: - The incidents were also linked to conditions at Boeing's factory in Renton, near Seattle. Investigators believed that both accidents were triggered by the failure of a single sensor, and the reported production defects and intermittent flight control system problems suggest accidental decisions or mistakes in the production process [Article 109768]. - The crashes were attributed to a complex chain of events with multiple contributing factors; if any one of the nine contributing factors had not occurred, the accidents might have been avoided, which suggests a combination of accidental decisions and errors [Article 90640].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident occurring due to development incompetence: - The incidents were attributed to faulty technical assumptions by Boeing's engineers, lack of transparency by Boeing's management, and insufficient oversight by the FAA, indicating a failure in professional competence [Article 104826, Article 104849]. - The report highlighted production defects and flaws in the aircraft's wiring systems that may have contributed to the erroneous deployment of the MCAS system, suggesting issues related to development incompetence [Article 109768]. (b) The software failure incident occurring accidentally: - The software failure incidents resulted from a complex chain of events involving multiple contributing factors, indicating that the failures were accidental rather than intentional [Article 90640]. - The accidents involved a malfunctioning sensor and automated responses from the aircraft's software, suggesting that the failures were accidental in nature [Article 78184].
Duration permanent, temporary (a) permanent: The software failure incident related to the Boeing 737-MAX crashes was closer to a permanent failure, because its contributing factors were present under all circumstances. The crashes were attributed to faulty technical assumptions by Boeing's engineers, lack of transparency by Boeing's management, insufficient oversight by the FAA, reliance on a single sensor leading to a single point of failure, and flaws in the aircraft's highly complex wiring systems [Article 104849, Article 107214, Article 109768]. (b) temporary: The software failure incident related to the A330's computer system was closer to a temporary failure, because its contributing factors arose only under certain circumstances. The incident was caused by a software limitation in the A330's computer system that was discovered only after the plane was put into service, highlighting how hard such faults are to find in testing and how unpredictable they are [Article 78176].
Behaviour crash, omission, value, other (a) crash: The software failure incident resulted in crashes of Boeing 737-MAX 8 aircraft, leading to the deaths of 346 people. The crashes were attributed to the MCAS (Manoeuvring Characteristics Augmentation System) software system, which was designed to make the aircraft easier to fly but contributed to pilots not being able to control the aircraft [85618, 90525]. (b) omission: In some instances the system failed to perform its intended functions: the failure of a single sensor caused systems to misfire with catastrophic results, and Boeing did not provide pilots with information that could have helped them react to the malfunction [94203]. (c) timing: The articles did not specifically mention failures due to timing issues. (d) value: The system performed its intended functions incorrectly: MCAS deployed at the wrong time due to sensor failures, and pilots faced a variety of alerts and warnings that had not occurred in the simulator [89311, 90958]. (e) byzantine: The articles did not specifically mention failures due to byzantine behaviors. (f) other: The software failure incident also involved design flaws, faulty technical assumptions, lack of transparency, insufficient oversight, cost-cutting measures, and a culture of concealment at Boeing, which all contributed to the crashes [104794, 104826, 107214].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence death, harm (a) death: People lost their lives due to the software failure. The Boeing 737-MAX crashes in October and March, caused by software issues in the aircraft, resulted in the deaths of 346 people [Article 85618, Article 104849].
Domain transportation, manufacturing (a) transportation: The failed system was flight-control software on the Boeing 737 Max, a passenger aircraft operated by airlines. (b) manufacturing: The Boeing 737 Max aircraft is a product of the manufacturing industry, specifically the production of aircraft [84433, 90523, 109768]. (c) The failed system was not related to the other domains considered: extraction of natural resources, sales, construction, utilities, finance, knowledge, health, entertainment, or government.

Sources
