Incident: Boeing 737 MAX Software Design Failure Leading to Fatal Crash

Published Date: 2018-11-13

Postmortem Analysis
Timeline 1. The software failure incident involving the Boeing 737 MAX jets occurred in October 2018 [77758].
System 1. Anti-stall system on the Boeing 737 MAX jets [Article 77758]
Responsible Organization 1. Boeing Co [Article 77758]
Impacted Organization 1. Pilots flying the Boeing 737 MAX jets were impacted by the software failure incident as they were not aware of the new anti-stall system and faced challenges in controlling the aircraft during certain failure modes [Article 77758].
Software Causes 1. A system designed to deal with the accident scenario was not described in the flight manual, a factor cited in the crash investigation [77758]. 2. A faulty 'angle of attack' sensor fed inaccurate data into the airplane's systems; the angle of attack is a vital piece of data needed to help the aircraft fly at the right angle to the currents of air and prevent a stall [77758]. 3. Information recovered from the jet's data recorder led to an emergency directive warning pilots that a computer on the 737 MAX could force the plane to descend sharply for up to 10 seconds even in manual flight, making it difficult for a pilot to control the aircraft [77758].
Non-software Causes 1. Lack of description of a system designed to deal with the accident scenario in the flight manual [77758]. 2. Insufficient training for 737 MAX pilots regarding the new anti-stall system [77758]. 3. Potential maintenance problems, including a faulty sensor for the 'angle of attack' [77758]. 4. Concerns about the clarity of U.S.-approved procedures to help pilots prevent the 737 MAX from over-reacting to data loss [77758]. 5. Questions about pilot preparedness for automatic responses and response time in case of unexpected system behavior [77758].
Impacts 1. The software failure incident led to the deadly crash of a Lion Air 737 MAX jet in Indonesia, killing all 189 people on board [Article 77758]. 2. Boeing's shares fell 2.1 percent as the FAA and Boeing evaluated the need for software or design changes to the 737 MAX jets [Article 77758]. 3. Indonesian investigators found that a system designed to handle the accident scenario was not described in the flight manual and called for more training for 737 MAX pilots [Article 77758]. 4. The incident prompted the FAA to issue an emergency directive warning pilots about a computer on the 737 MAX that could force the plane to descend sharply for up to 10 seconds, potentially making it difficult for pilots to control the aircraft [Article 77758]. 5. Questions were raised about how well pilots were prepared for automatic responses from the software system and how much time they had to react to unexpected behavior [Article 77758].
Preventions 1. Improved training and awareness for pilots regarding the new anti-stall system in the 737 MAX jets could have prevented the software failure incident [77758]. 2. Ensuring that the system designed to deal with the accident scenario is properly documented in the flight manual could have prevented the incident [77758]. 3. Conducting thorough testing and evaluation of the software to identify and address any potential failure modes, such as inaccurate sensor data feeding into the airplane's systems, could have prevented the incident [77758].
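Prevention 3 above mentions testing for inaccurate sensor data feeding into the airplane's systems. The following is a minimal sketch of that idea only, not Boeing's implementation or anything described in the article. It assumes two redundant angle-of-attack readings are available and refuses to permit an automatic nose-down response when either reading is implausible or the two disagree. The function names, the plausible range, the disagreement tolerance, and the stall threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration; none come from the article.
MAX_DISAGREEMENT_DEG = 5.0        # assumed tolerance between redundant sensors
VALID_RANGE_DEG = (-20.0, 40.0)   # assumed physically plausible range


@dataclass
class AoaCheck:
    usable: bool                  # True only if both readings passed the checks
    value_deg: Optional[float]    # averaged reading, or None when unusable
    reason: str                   # short explanation of the outcome


def check_angle_of_attack(left_deg: float, right_deg: float) -> AoaCheck:
    """Cross-check two redundant angle-of-attack readings (hypothetical)."""
    low, high = VALID_RANGE_DEG
    for name, reading in (("left", left_deg), ("right", right_deg)):
        # A NaN reading fails this comparison and is rejected as well.
        if not (low <= reading <= high):
            return AoaCheck(False, None, f"{name} reading out of range")
    if abs(left_deg - right_deg) > MAX_DISAGREEMENT_DEG:
        return AoaCheck(False, None, "sensors disagree")
    return AoaCheck(True, (left_deg + right_deg) / 2.0, "ok")


def automatic_nose_down_allowed(left_deg: float, right_deg: float,
                                stall_threshold_deg: float = 14.0) -> bool:
    """Permit an automatic response only when the sensor data is trustworthy."""
    check = check_angle_of_attack(left_deg, right_deg)
    return check.usable and check.value_deg > stall_threshold_deg
```

A test suite built around such a check would feed it deliberately faulty inputs, for example one sensor stuck at a high value while the other reads normally, and assert that the automatic response stays inhibited.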
Fixes 1. Evaluating the need for software or design changes to 737 MAX jets [Article 77758] 2. Providing more training for 737 MAX pilots on the system designed to deal with the accident scenario [Article 77758] 3. Updating operating procedures and training for the 737 MAX based on investigation findings [Article 77758] 4. Issuing an emergency directive to warn pilots about potential software issues and how to respond [Article 77758]
References 1. Indonesian investigators 2. U.S. pilot unions 3. FAA 4. Boeing Chief Executive Dennis Muilenburg 5. Boeing 6. FAA denial of a new probe 7. Boeing executive mentioned supplier problems [77758]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident related to the Boeing 737 MAX jets is specific to Boeing as an organization. The incident involved a system designed to deal with an accident scenario not being described in the flight manual, leading to concerns about the clarity of procedures and training for pilots [77758]. Boeing is evaluating the need for software or design changes to the 737 MAX jets following the deadly Lion Air crash in Indonesia, indicating an internal software-related issue within the organization. (b) There is no specific mention in the provided article about a similar software failure incident happening at other organizations or with their products and services.
Phase (Design/Operation) design, operation (a) The software failure incident in the Boeing 737 MAX jets was related to design factors introduced during system development. The U.S. Federal Aviation Administration and Boeing were evaluating the need for software or design changes to the jets following the deadly Lion Air crash in Indonesia. Indonesian investigators found that a system designed to deal with the accident scenario was not described in the flight manual, indicating a design flaw. They also called for more training for 737 MAX pilots, suggesting that the design might not have adequately considered pilot training needs [77758]. (b) The software failure incident also had implications for the operation of the 737 MAX jets. The focus of the investigation was expanding to the clarity of U.S.-approved procedures to help pilots prevent the 737 MAX from over-reacting to data loss. The FAA issued an emergency directive warning pilots about a computer on the 737 MAX that could force the plane to descend sharply for up to 10 seconds even in manual flight, making it difficult for a pilot to control the aircraft. This highlights an operational issue where pilots needed to be aware of how to handle unexpected automated responses [77758].
Boundary (Internal/External) within_system, outside_system (a) within_system: The software failure incident related to the Boeing 737 MAX jets involved a system designed to deal with the accident scenario that was not described in the flight manual, alongside concerns about potential maintenance problems including a faulty sensor for the 'angle of attack' [77758]. (b) outside_system: The investigation into the software failure incident is also focusing on the clarity of U.S.-approved procedures to help pilots prevent the 737 MAX from over-reacting to data loss and on the methods for training them, indicating that contributing factors may originate from outside the system [77758].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the Boeing 737 MAX crash was related to non-human actions. The incident was attributed to a system designed to deal with the accident scenario not being described in the flight manual, leading to concerns about the clarity of procedures to prevent the aircraft from over-reacting to data loss [77758]. Additionally, the faulty sensor for the 'angle of attack' was a key factor in the investigation, indicating a technical issue rather than a human error. (b) Human actions were also a contributing factor in the software failure incident. Indonesian investigators called for more training for 737 MAX pilots, indicating a potential lack of awareness or training on the new anti-stall system. U.S. pilot unions mentioned they were not aware of this system, suggesting a gap in human knowledge and training [77758].
Dimension (Hardware/Software) hardware, software (a) The software failure incident related to hardware: - The article mentions that there were concerns about a faulty sensor for the 'angle of attack,' which is a vital piece of data needed for the aircraft to fly correctly [77758]. (b) The software failure incident related to software: - The article discusses a system designed to deal with the accident scenario not being described in the flight manual, indicating a potential software issue [77758]. - It is mentioned that information recovered from the jet's data recorder led to an emergency directive warning pilots about a computer on the 737 MAX that could force the plane to descend sharply, highlighting a software-related issue [77758].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident in the article does not indicate any malicious intent by humans to harm the system. The focus is on potential design and training issues related to the 737 MAX jets following the deadly Lion Air crash in Indonesia. Investigators are looking into system design, pilot training, and procedures to prevent the aircraft from over-reacting to data loss, particularly related to the angle of attack sensor. Boeing CEO emphasized providing necessary information for safe operation of their airplanes and handling failure modes [77758]. (b) The software failure incident is more aligned with non-malicious factors such as potential design flaws, lack of clarity in procedures, and training gaps rather than intentional harm to the system. The incident highlights the importance of addressing these contributing factors to enhance the safety and reliability of the aircraft software [77758].
Intent (Poor/Accidental Decisions) accidental_decisions The article does not clearly attribute the Boeing 737 MAX crash in Indonesia to either poor decisions or accidental decisions. However, it highlights potential contributing factors introduced by mistakes or unintended decisions, such as the lack of clarity in the flight manual regarding the anti-stall system, the need for more training for pilots, and questions about pilot preparedness for automated responses [77758].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) Development incompetence: Indonesian investigators found that a system designed to deal with the accident scenario was not described in the flight manual for the Boeing 737 MAX jets [77758]. This lack of proper documentation and training for pilots indicates a failure in professional competence during the development process. (b) Accidental: The FAA issued an emergency directive warning pilots that a computer on the 737 MAX could force the plane to descend sharply for up to 10 seconds even in manual flight, making it difficult for a pilot to control the aircraft [77758]. This unexpected behavior of the software system could be considered an accidental factor leading to the failure incident.
Duration temporary The software failure incident related to the Boeing 737 MAX jets appears to be more of a temporary failure rather than a permanent one. The incident was triggered by specific circumstances, such as the inaccurate angle of attack sensor feeding incorrect information to the airplane, leading to the anti-stall system forcing the plane to descend sharply. This specific scenario caused concerns and led to evaluations for potential software or design changes to address the issue [Article 77758].
Behaviour crash, omission, value, other (a) crash: The software failure incident is related to a crash in which the system lost state and did not perform its intended functions. The article mentions that the Lion Air crash in Indonesia involved the Boeing 737 MAX jet diving into the sea, resulting in the death of all 189 people on board [77758]. (b) omission: The software failure incident also involved an omission: according to Indonesian investigators, a system designed to deal with the accident scenario was not described in the flight manual. As a result, U.S. pilot unions were not aware of the new anti-stall system [77758]. (c) timing: The incident is not classified as a timing failure. The system performed its intended functions incorrectly, but not necessarily too late or too early. The article does mention that the system could force the plane to descend sharply for up to 10 seconds even in manual flight, making it difficult for a pilot to control the aircraft [77758]. (d) value: The software failure incident also involves the system performing its intended functions incorrectly. Specifically, there were concerns about potential maintenance problems, including a faulty sensor for the 'angle of attack,' which is a vital piece of data needed for the aircraft to fly at the right angle to the currents of air and prevent a stall [77758]. (e) byzantine: The software failure incident does not exhibit byzantine behavior, in which the system behaves erroneously with inconsistent responses and interactions. (f) other: The software failure incident also involves behavior not described in options (a) to (e). This includes issues related to the clarity of U.S.-approved procedures to help pilots prevent the 737 MAX from over-reacting to data loss and the methods for training them. Additionally, questions were raised about how well pilots are prepared for automatic responses from the system and how much time they have to react [77758].

IoT System Layer

Layer Option Rationale
Perception sensor, processing_unit, embedded_software (a) sensor: The article mentions a faulty sensor for the 'angle of attack' as a potential maintenance problem that could have contributed to the crash of the 737 MAX jet [Article 77758]. (c) processing_unit: The article discusses a system designed to deal with the accident scenario that was not described in the flight manual, indicating a potential issue with the processing unit or software logic [Article 77758]. (e) embedded_software: The article highlights that information recovered from the jet's data recorder led the FAA to issue an emergency directive warning pilots about a computer on the 737 MAX that could force the plane to descend sharply for up to 10 seconds even in manual flight, suggesting a potential issue with the embedded software controlling the aircraft's behavior [Article 77758].
Communication unknown Unknown
Application TRUE The failure mentioned in the article is related to the application layer of the cyber physical system. The article discusses the need for software or design changes to the 737 MAX jets following the deadly Lion Air crash in Indonesia. It mentions that Indonesian investigators found that a system designed to deal with the accident scenario was not described in the flight manual, and U.S. pilot unions were not aware of the new anti-stall system. Additionally, the FAA issued an emergency directive warning pilots about a computer on the 737 MAX that could force the plane to descend sharply for up to 10 seconds even in manual flight, raising questions about pilot preparedness for such automated responses [77758].

Other Details

Category Option Rationale
Consequence death, harm, property (a) death: The consequence of the software failure incident in this case was the loss of all 189 people on board the Lion Air jet that crashed into the sea [77758].
Domain transportation (a) The failed system in the incident was related to the transportation industry, specifically the aviation sector. The software or design changes being evaluated by the U.S. Federal Aviation Administration and Boeing Co were in response to the deadly Lion Air crash involving the 737 MAX jets [Article 77758]. The incident highlighted issues with the aircraft's systems and the need for more training for pilots operating the 737 MAX jets.
