Incident: Boeing 787 Software Glitch Requires Regular Reboots to Prevent Catastrophic Failures

Published Date: 2020-04-03

Postmortem Analysis
Timeline 1. According to the article, the software failure incident affecting Boeing's 787 airliners occurred in January 2020 [98291].
System 1. Boeing 787 common core system (CCS): the plane's operating system, specifically the CCS, failed to filter out stale data from key flight control displays, creating potentially catastrophic failure scenarios [98291].
Responsible Organization 1. Boeing: the incident was caused by a flaw in the 787's internal computer system, specifically the common core system (CCS), which let data clog the systems and feed false information to pilots for several key metrics [98291].
Impacted Organization 1. Pilots: The software failure incident impacted pilots by providing incorrect data on airspeed, altitude, attitude, and engine operation on primary displays, making it harder for pilots to maintain safe flight and landing of the plane [98291].
Software Causes 1. A flaw in the common core system (CCS) software let data clog the systems and feed false information to pilots for key metrics such as airspeed, altitude, and attitude [98291]. 2. The flaw prevented the CCS from filtering out stale data, which affected the common data network (CDN) that handles flight-critical data and created potentially catastrophic failure scenarios [98291]. 3. The issue also affected the primary flight displays (PFDs) in the cockpit, which could show misleading attitude, altitude, airspeed, and engine operating indications to both pilots without any annunciation of failure [98291].
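To make "filtering out stale data" concrete, here is a minimal sketch of an age-based freshness check in Python. It is an illustration only, not Boeing's implementation: the article does not describe how the CCS filter works, and the names `Sample`, `fresh_value`, `render`, and the `MAX_AGE_S` limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical freshness limit in seconds; the source gives no real value.
MAX_AGE_S = 0.5


@dataclass(frozen=True)
class Sample:
    """One reading published on a data network (illustrative structure)."""
    name: str           # e.g. "airspeed_kt"
    value: float
    timestamp_s: float  # time at which the reading was produced


def fresh_value(sample: Sample, now_s: float,
                max_age_s: float = MAX_AGE_S) -> Optional[float]:
    """Return the reading if it is recent enough, otherwise None."""
    age_s = now_s - sample.timestamp_s
    # A negative age means the timestamp is implausible; treat it as stale.
    if age_s < 0 or age_s > max_age_s:
        return None
    return sample.value


def render(sample: Sample, now_s: float) -> str:
    """Show the value, or an explicit failure flag instead of old data."""
    value = fresh_value(sample, now_s)
    if value is None:
        return f"{sample.name}: FAIL"
    return f"{sample.name}: {value:.0f}"


sample = Sample("airspeed_kt", 250.0, timestamp_s=100.0)
print(render(sample, now_s=100.2))  # recent reading is displayed
print(render(sample, now_s=105.0))  # old reading is flagged, not displayed
```

The point of the sketch is the second case: when the check works, old data produces a visible failure flag. The incident described above is the opposite situation, where stale data could reach the displays without any annunciation of failure.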
Non-software Causes 1. The planes were often left powered on for weeks at a time, which is why a US government directive requires Boeing 787 airliners to be turned off and on again every 51 days to prevent potentially catastrophic failure scenarios [98291]. 2. With continuous power, data accumulated in the systems and fed false information to pilots for key metrics such as airspeed, attitude, and altitude [98291]. 3. The flaw affected the pilots' ability to monitor the condition of the engine and could prevent the stall warning horn and overspeed horn from working properly [98291]. 4. The common core system (CCS) did not filter out stale data, which affected the common data network (CDN) that feeds data directly to the primary flight displays (PFDs) in the cockpit [98291]. 5. The FAA Immediate Adoptive Rule made the action Boeing had recommended mandatory for 787 operators, underscoring the severity of the flaw [98291]. 6. Boeing had a similar software issue in 2015, when 787s had to be powered down every 287 days because of a glitch identified during laboratory testing [98291]. 7. Another Boeing aircraft, the 737 MAX, was also facing technical issues: a flaw in an indicator light designed to warn of a malfunction in a system that helps raise and lower the plane's nose [98291].
Impacts 1. Data accumulated in the software, so incorrect airspeed, altitude, and attitude data could be displayed to pilots, making it harder for them to maintain safe flight and landing [98291]. 2. If the 787s were not turned off and on again every 51 days as mandated, the potential consequences included misleading primary attitude, altitude, airspeed, and engine operating indications for both pilots, without annunciation of failure, coupled with the loss of stall warning or overspeed warning [98291]. 3. Data clogging the systems fed false information to pilots for several key metrics, affecting their ability to monitor the condition of the engine and potentially preventing the stall warning horn and overspeed horn from working properly [98291].
Preventions 1. Power cycling the Boeing 787 airliners at least every 51 days, which prevents the data accumulation behind the glitch, could have prevented the incident [98291]. 2. A more comprehensive fix for the common core system (CCS) software issue could have prevented the incident [98291]. 3. Thorough testing and simulation to identify and address software issues before they create potentially catastrophic failure scenarios could have prevented the incident [98291].
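The 51-day power-cycle interval lends itself to a simple date calculation. The sketch below shows how an operator might track it, assuming the interval is counted in calendar days from the last full power cycle. The article does not say how compliance is measured, so that assumption and the function names are hypothetical.

```python
from datetime import date, timedelta

# Interval stated in the source; counting it in calendar days is an assumption.
MAX_CONTINUOUS_POWER_DAYS = 51


def power_cycle_deadline(last_power_cycle: date) -> date:
    """Latest date by which the aircraft should be power cycled again."""
    return last_power_cycle + timedelta(days=MAX_CONTINUOUS_POWER_DAYS)


def days_remaining(last_power_cycle: date, today: date) -> int:
    """Days left before the deadline; negative once it has passed."""
    return (power_cycle_deadline(last_power_cycle) - today).days


def is_overdue(last_power_cycle: date, today: date) -> bool:
    """True if the aircraft has been powered on longer than the limit."""
    return days_remaining(last_power_cycle, today) < 0


last_cycle = date(2020, 1, 1)
print(power_cycle_deadline(last_cycle))
print(days_remaining(last_cycle, date(2020, 2, 1)))
print(is_overdue(last_cycle, date(2020, 2, 22)))
```

A tracker like this only supports the interim mitigation. It does not address the underlying software flaw.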
Fixes 1. Boeing recommended mitigating action to 787 customers to correct the issue of delayed data transmission after many days of continuous airplane power on. A permanent fix will be issued in the coming months [98291]. 2. The FAA Immediate Adoptive Rule mandates the action to 787 operators that Boeing recommended, indicating a regulatory intervention to address the software failure incident [98291]. 3. The FAA directive calls on Boeing to find a more comprehensive fix for the software issue affecting the Boeing 787 airliners, suggesting a need for a long-term solution beyond the interim measure of regular power cycling [98291].
References 1. Federal Aviation Administration (FAA) [98291] 2. Boeing spokesperson [98291]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) In the provided articles, a similar software failure incident had happened again at Boeing with their 787 airliners. In 2015, Boeing had a software issue with the 787s where they were mandated to power down every 287 days to prevent potential failures. This incident was related to data transmission delays after many days of continuous airplane power on, similar to the recent directive requiring the planes to be turned off and on again every 51 days to prevent catastrophic failure scenarios [98291]. (b) The articles also mention another plane in the Boeing fleet, the 737 MAX, which had technical issues related to a malfunction indicator light. This incident involved a flaw in the indicator light system that helps raise and lower the plane's nose, which was turning on when it shouldn't. Boeing reported this issue to the FAA in January, indicating a software-related problem with the indicator system [98291].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase is evident in the case of Boeing's 787 airliners. The issue stemmed from a flaw in the software that caused data to accumulate in the systems if the planes were left turned on for extended periods. This led to false information being fed to pilots for key metrics such as airspeed, attitude, and altitude, affecting the primary flight displays in the cockpit [98291]. (b) The software failure incident related to the operation phase is highlighted by the fact that the planes were often left turned on for weeks at a time during crew changes at airports or when plugged in overnight for cleaning and maintenance. This practice contributed to the accumulation of data in the software systems, leading to the feeding of false information to pilots for critical metrics [98291].
Boundary (Internal/External) within_system (a) The failure, which requires Boeing's 787 airliners to be turned off and on again every 51 days to prevent catastrophic failure scenarios, originates within the system. The common core system (CCS) of the plane stops filtering out stale data from key flight control displays if the plane is left on for too long without being powered down. Data then clogs the systems and feeds false information to pilots, affecting critical flight data such as airspeed, altitude, attitude, and engine operation [98291].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the Boeing 787 airliners was due to non-human actions. The issue stemmed from the software not filtering out stale data, leading to data accumulation in the systems and feeding false information to pilots for key metrics like airspeed, altitude, and attitude [98291]. (b) The software failure incident was also influenced by human actions. The directive from the US government mandated that the planes be shut down regularly to prevent glitches in the on-board computer systems. Failure to follow this directive could lead to several potentially catastrophic failure scenarios, affecting critical flight data displayed to pilots [98291].
Dimension (Hardware/Software) hardware, software (a) The software failure incident related to hardware: - The article mentions a directive from the US government that will force Boeing's 787 airliners to be shut down regularly to prevent glitches in the on-board computer systems, which could lead to incorrect data on airspeed, altitude, and attitude being shown on primary displays, impacting the safe flight and landing of the plane [98291]. - The issue with the software sees data clogging up the systems and feeding false information to pilots for various key metrics such as airspeed, attitude, and altitude, which could all be wrong [98291]. (b) The software failure incident related to software: - The software glitch in the common core system (CCS) of the Boeing 787 causes data to accumulate in the systems and feed false information to pilots, affecting key flight control displays and the common data network (CDN) on the plane [98291]. - The flaw with the software means that the CCS doesn't filter out stale data, leading to misleading primary attitude data, altitude, airspeed data, and engine operating indications on the pilots' primary flight displays without annunciation of failure [98291].
Objective (Malicious/Non-malicious) non-malicious (a) The failure, which requires Boeing's 787 airliners to be turned off and on again every 51 days to prevent catastrophic failure scenarios, is non-malicious. It stemmed from a software flaw that caused data to accumulate in the systems and feed false information to pilots for key metrics such as airspeed, altitude, and attitude. This was not an intentional act to harm the system but a technical glitch that had to be addressed to keep the aircraft safe [98291].
Intent (Poor/Accidental Decisions) poor_decisions (a) The failure, which requires Boeing's 787 airliners to be turned off and on again every 51 days to prevent catastrophic failure scenarios, can be attributed to poor decisions. The US government directive mandates regular shutdown of the planes to avoid glitches in the on-board computer systems. The software flaw causes data to accumulate in the systems, so false information is displayed to pilots for critical metrics such as airspeed, altitude, and attitude. Boeing identified this as a potential scenario, and the FAA issued an Immediate Adoptive Rule to address the unsafe condition, emphasizing the significance of the risk [98291].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The failure, which requires Boeing 787 airliners to be turned off and on again every 51 days to prevent catastrophic failure scenarios, can be attributed to development incompetence. It stemmed from a software flaw in which data clogged the systems and fed false information to pilots for key metrics such as airspeed, attitude, and altitude [98291]. This indicates a lack of professional competence in developing the software system, which made regular power cycling necessary to prevent critical failures. (b) The failure can also be considered accidental: it was not a deliberate action but a consequence of the software flaw. The data accumulation and the incorrect information shown to pilots were unintentional results of the glitch that affected the common core system (CCS) and the common data network (CDN) on the Boeing 787 airliners [98291]. These accidentally introduced factors led to the directive to power down the planes every 51 days to avoid potentially catastrophic failure scenarios.
Duration temporary (a) The failure described in the article is temporary. The requirement to turn Boeing 787 airliners off and on again every 51 days addresses a specific software glitch that causes data to accumulate in the systems and feed false information to pilots for key metrics such as airspeed, altitude, and attitude. Boeing identified this potential scenario and recommended mitigating action, with a permanent fix planned for the future [98291].
Behaviour value, other (a) crash: The software failure incident described in the article is not related to a crash where the system loses state and does not perform any of its intended functions. Instead, the issue is related to the accumulation of data in the software systems of Boeing's 787 airliners, leading to incorrect information being displayed to pilots [98291]. (b) omission: The software failure incident is not due to the system omitting to perform its intended functions at an instance(s). The issue is more about the system providing misleading data to pilots, affecting critical flight parameters like airspeed, altitude, attitude, and engine operation [98291]. (c) timing: The failure is not due to the system performing its intended functions correctly but too late or too early. The issue is more about the software glitch causing incorrect data to be displayed to pilots, potentially leading to dangerous situations during flight [98291]. (d) value: The software failure incident is related to the system performing its intended functions incorrectly. The flaw in the software of Boeing's 787 airliners leads to the display of misleading primary attitude data, altitude, airspeed, and engine operating indications to pilots, which can have severe consequences during flight [98291]. (e) byzantine: The failure is not related to the system behaving erroneously with inconsistent responses and interactions. The issue with the software of Boeing's 787 airliners is more about the accumulation of data in the systems, feeding false information to pilots for critical flight metrics, without proper indication of failure [98291]. (f) other: The behavior of the software failure incident can be described as a gradual degradation of system performance due to the accumulation of data in the software systems of Boeing's 787 airliners. This leads to the display of incorrect information to pilots, impacting their ability to safely operate the aircraft [98291].

IoT System Layer

Layer Option Rationale
Perception processing_unit, network_communication, embedded_software (a) sensor: The software failure incident related to Boeing's 787 airliners was not directly linked to sensor errors but rather to the processing unit and network communication errors. The failure was due to data clogging up the systems and feeding false information to pilots for various key metrics such as airspeed, attitude, and altitude [98291]. (b) actuator: The articles did not mention any failure related to actuator errors in the software incident involving Boeing's 787 airliners. The focus was on the processing unit, network communication, and embedded software errors [98291]. (c) processing_unit: The software failure incident with Boeing's 787 airliners was primarily attributed to issues with the processing unit. The common core system (CCS) on the plane was not filtering out stale data, leading to incorrect information being displayed to pilots on primary flight displays (PFDs) [98291]. (d) network_communication: The failure in the software incident involving Boeing's 787 airliners was also related to network communication errors. The common data network (CDN) on the plane, which handles flight-critical data, was affected by the accumulation of data in the systems, leading to potential catastrophic failure scenarios [98291]. (e) embedded_software: The software failure incident with Boeing's 787 airliners can be linked to embedded software errors. The flaw in the common core system (CCS) and common data network (CDN) resulted in false information being fed to pilots due to data clogging up the systems, impacting critical metrics like airspeed, altitude, and attitude [98291].
Communication link_level The software failure incident reported in Article 98291 was related to the communication layer of the cyber physical system that failed. The failure was specifically related to the common data network (CDN) on the Boeing 787 airliners. The flaw in the software caused data to accumulate in the systems, feeding false information to pilots for various key metrics such as airspeed, attitude, and altitude [98291]. This issue affected the primary flight displays (PFDs) in the cockpit, which receive data directly from the CDN. The backup monitors, which receive data from a different source, were not affected by this software glitch.
Application TRUE The software failure incident described in the article [98291] is related to the application layer of the cyber physical system. The failure was caused by a flaw in the software that led to data accumulation in the systems, feeding false information to pilots for key metrics such as airspeed, attitude, and altitude. This issue was attributed to the common core system (CCS) not filtering out stale data, affecting the common data network (CDN) that feeds data directly to the primary flight displays (PFDs) in the cockpit. The flaw with the primary screens occurred without any annunciation of failure, indicating a failure at the application layer of the system.

Other Details

Category Option Rationale
Consequence death, harm (a) death: People lost their lives due to the software failure. The article mentions two fatal crashes involving the Boeing 737 MAX, a different aircraft from the 787, in which 346 people died [98291].
Domain transportation (a) The failed system was intended to support the transportation industry, specifically the aviation sector. The software failure incident involved Boeing's 787 airliners, which are used for air transportation [98291].
