Incident: Starliner Software Glitch Causes Orbit Failure and Mission Abort

Published Date: 2019-12-20

Postmortem Analysis
Timeline 1. The software failure incident with the Starliner capsule occurred in December 2019 [93048, 93145, 93227, 94455, 95390, 96862, 117444, 117500, 118256, 127763].
System 1. Starliner spacecraft's software system [93048, 93145, 93227, 117444, 117500, 118256, 128026]
Responsible Organization 1. Boeing [93048, 93145, 93227, 96862, 117444, 117500, 118256] 2. NASA [93048, 93145, 94455, 117444, 117500, 118256] 3. Atlas 5 rocket [117444] 4. ULA [128026]
Impacted Organization 1. Boeing [93048, 93145, 93227, 96862, 117444, 117500, 118256, 128026] 2. NASA [93145, 94455, 117444, 117500, 118256]
Software Causes 1. The software glitch caused the spacecraft to launch into the wrong orbit, leading to the mission's abort [Article 93048]. 2. The spacecraft's computer failed to fire the engine to push it into the correct orbit due to a software issue [Article 93145]. 3. A timer aboard the spacecraft mistakenly thought the orbital insertion burn had taken place, leading to other thrusters firing incorrectly due to a software issue [Article 93145]. 4. A software problem caused the capsule to fail to attain the orbit needed to rendezvous with the space station [Article 93227]. 5. A timer glitch and other software issues were being investigated by NASA to determine the root cause of the failure [Article 94455]. 6. Lapses in software testing allowed software errors to slip through undetected and unfixed before the spacecraft launched [Article 96862]. 7. A software error caused the spacecraft to set its clock to the wrong time, depleting its propellant and leading to a shortened mission [Article 117444]. 8. Software errors led to the spacecraft's thrusters over-firing, burning excessive fuel and preventing the intended docking with the space station [Article 128026].
Non-software Causes 1. The Starliner capsule's main engine did not fire as scheduled, potentially due to a timer misfiring [93145]. 2. The spacecraft's clock was set to the wrong time, causing it to deplete its propellant and leading to the cancellation of the planned docking at the space station [117444, 127533]. 3. Two thrusters failed during a maneuver to put Starliner in a stable orbit, prompting the spacecraft to automatically adjust with its remaining thrusters [127533]. 4. The onboard computer system over-fired Starliner's thrusters, burning excessive fuel and making it impossible to reach the intended destination of docking with the space station [128026].
Impacts 1. The software glitch caused the Starliner spacecraft to launch into the wrong orbit, leading to the mission being aborted and the spacecraft being routed back to Earth instead of continuing to the ISS [93048]. 2. The failure of the spacecraft's computer to fire the engine to push it into the correct orbit, along with other miscues, resulted in the spacecraft burning through a significant amount of fuel, leading to the decision to abort the mission to the space station [93145]. 3. The software problem caused the capsule to fail to attain the orbit needed to rendezvous with the space station, marking another engineering black eye for Boeing [93227]. 4. The software errors that slipped through undetected and unfixed before the spacecraft launched in December could have led to a catastrophic failure during the uncrewed test flight of the Starliner spacecraft [96862]. 5. The software error that could have resulted in the loss of the spacecraft was discovered and fixed while the capsule was in orbit, narrowly avoiding a catastrophic failure [117444]. 6. The software glitch caused the Starliner to fail to dock at the ISS and return to Earth prematurely during the first uncrewed test flight, highlighting the need for corrective actions and improvements [117500]. 7. The software errors led to the spacecraft burning excessive fuel, making it impossible to reach its intended destination for docking with the space station, and also posed a risk of collision during re-entry, which was fortunately prevented [128026].
Preventions 1. Improved software verification process before launch [Article 99015] 2. Closer oversight of software development by NASA [Article 118256] 3. Implementing recommendations for fixes and improvements from review teams [Article 96862, Article 118256]
Fixes 1. Conduct an independent investigation to determine the root cause of the software glitch and any other software issues [Article 94455, Article 95390]. 2. Implement corrective actions based on the findings of the investigation team before flying a crew of astronauts for the first time [Article 94455]. 3. Make fixes and improvements based on the 61 recommendations provided by the review team [Article 96862]. 4. Oversee software development more closely at both SpaceX and Boeing [Article 118256].
References 1. Boeing spokesperson Gordon Johndroe [Article 93145] 2. Todd Curtis, aviation safety analyst for AirSafe.com and former Boeing engineer [Article 93145] 3. NASA Administrator Jim Bridenstine [Article 93145] 4. Jim Chilton, Boeing's senior vice president for space and launch [Article 93145] 5. NASA [Article 94455, Article 95390] 6. NASA safety review panel [Article 99015] 7. NASA Aerospace Safety Advisory Panel [Article 117444] 8. NASA report [Article 117500] 9. ULA [Article 128026]
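The timer failure listed under Software Causes can be illustrated with a small sketch. The articles report only that a timer aboard the spacecraft mistakenly concluded the orbital insertion burn had taken place and that the spacecraft's clock was set to the wrong time; they do not describe Starliner's flight software. The Python below is therefore hypothetical: it assumes a scheduler that picks the active flight phase from mission elapsed time (MET), computed as the current clock reading minus an epoch captured at liftoff. `MissionClock`, `phase_for`, `epoch_is_consistent`, and the burn-window timings are illustrative assumptions, not Boeing's design.

```python
from dataclasses import dataclass

# Hypothetical burn window in seconds of mission elapsed time (MET).
# These values are illustrative assumptions, not Starliner flight data.
INSERTION_BURN_START = 1800.0
INSERTION_BURN_END = 1860.0


@dataclass
class MissionClock:
    """Derives mission elapsed time from a clock reading and a launch epoch."""

    epoch: float  # clock reading captured at liftoff, in seconds (assumed)

    def met(self, clock_now: float) -> float:
        return clock_now - self.epoch


def phase_for(met: float) -> str:
    """Return the flight phase that the (hypothetical) scheduler believes is active."""
    if met < INSERTION_BURN_START:
        return "coast_to_insertion"
    if met < INSERTION_BURN_END:
        return "orbit_insertion_burn"
    # Past the burn window, this scheduler assumes the burn is complete and
    # switches to firing thrusters only to hold attitude and trajectory.
    return "post_insertion_attitude_hold"


def epoch_is_consistent(clock: MissionClock, launch_reference: float,
                        tolerance: float = 1.0) -> bool:
    """Cross-check the epoch against an independent liftoff reference."""
    return abs(clock.epoch - launch_reference) <= tolerance


liftoff = 1_000_000.0
now = liftoff + 1820.0  # 20 s into the hypothetical burn window

good = MissionClock(epoch=liftoff)
bad = MissionClock(epoch=liftoff - 5 * 3600)  # assumed error: epoch 5 h too early

print(phase_for(good.met(now)))  # orbit_insertion_burn
print(phase_for(bad.met(now)))   # post_insertion_attitude_hold
print(epoch_is_consistent(bad, liftoff))  # False
```

In this sketch, a correct epoch places the scheduler inside the burn window, while an epoch captured hours too early makes MET run ahead, so the same instant is read as "after the burn": the burn is skipped and attitude-hold firings begin instead. `epoch_is_consistent` shows one assumed style of mitigation, validating a clock source against an independent reference before trusting it; it is not the corrective action described in the articles.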

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident having happened again at one_organization: - Boeing faced another software failure incident with its Starliner spacecraft, where a software glitch caused the spacecraft to launch into the wrong orbit, leading to the cancellation of its mission to the International Space Station [93048, 93145]. - Boeing's Starliner spacecraft experienced software errors that were not detected and fixed before launch, leading to issues such as the spacecraft squandering its propellant and a failed docking at the ISS [117444, 118256]. (b) The software failure incident having happened again at multiple_organization: - The article mentions that Boeing's software problems with the Starliner spacecraft could raise questions about shared approaches to design, testing, and evaluation across different divisions of the company, hinting at potential shared problems with software at Boeing and possibly other organizations [93145].
Phase (Design/Operation) design, operation (a) In the software failure incident related to the Boeing Starliner spacecraft, there were issues identified in both the design and operation phases: Design: - The incident highlighted lapses that allowed software errors to slip through undetected and unfixed before the spacecraft launched [Article 96862]. - NASA officials admitted that decades of working with Boeing gave them a level of trust, but they did not fully understand how Boeing was designing the Starliner's software and the testing process for verifying its functionality [Article 118256]. Operation: - A software problem caused the capsule to fail to attain the orbit needed to rendezvous with the space station [Article 93227]. - The spacecraft experienced a timing anomaly shortly before it was due to dock with the ISS, which required ground intervention to prevent the loss of the vehicle [Article 117500]. - Engineers are investigating what went wrong after two thrusters failed during a maneuver to put Starliner in a stable orbit, but the spacecraft was able to automatically adjust with its remaining thrusters [Article 127533].
Boundary (Internal/External) within_system (a) within_system: The software failure incident related to the Starliner spacecraft's incorrect orbit and subsequent issues during its flight was primarily attributed to software errors within the system. The incident involved a software glitch causing the spacecraft to launch into the wrong orbit, a timer error leading to thrusters firing incorrectly, and a valve mapping software issue that could have caused problems during re-entry [93048, 93145, 117444, 117500, 118256, 127533]. (b) outside_system: The articles do not provide specific information indicating that the software failure incident was primarily due to contributing factors originating from outside the system.
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident occurring due to non-human actions: - The software glitch caused the Starliner spacecraft to launch into the wrong orbit, prompting officials to route it back to Earth instead of continuing to the ISS [Article 93048]. - A software problem caused the capsule to fail to attain the orbit needed to rendezvous with the space station [Article 93227]. - A software error that could have resulted in the loss of the spacecraft was discovered and fixed while the Starliner capsule was in orbit, just before it returned to Earth [Article 117444]. - An error and another software problem could have caused the spacecraft to be lost if not detected in time, with ground intervention preventing the loss of the vehicle [Article 117500]. - The onboard computer system over-fired Starliner's thrusters due to a software issue, burning too much fuel and making it impossible to reach the intended destination of docking with the space station [Article 128026]. (b) The software failure incident occurring due to human actions: - Boeing engineers discovered a second flaw that would have caused the wrong thrusters to fire as the capsule prepared for re-entry, potentially leading to the destruction of the spacecraft [Article 127533]. - NASA and Boeing admitted that decades of working together gave them a level of trust, but a lack of knowledge about how Boeing was designing the Starliner's software and testing processes led to software errors slipping through undetected before launch [Article 118256].
Dimension (Hardware/Software) hardware, software (a) The software failure incident occurring due to hardware: - The software glitch that caused the Starliner spacecraft to launch into the wrong orbit was attributed to a timer aboard the spacecraft mistakenly thinking that the "orbital insertion burn" had taken place, leading to other thrusters firing to keep the spacecraft on course. This hardware-related issue contributed to the software failure incident [Article 93145]. - Another hardware-related issue was the spacecraft's clock being set to the wrong time, causing the onboard computer to try to move the spacecraft to where it thought it should be, which fired thrusters that used up much of its propellant and led to the cancellation of the docking at the space station [Article 127533]. (b) The software failure incident occurring due to software: - The capsule's failure to attain the orbit required for rendezvous with the space station was attributed to a software problem [Article 93227]. - The review by NASA and Boeing identified lapses that allowed software errors to slip through undetected and unfixed before the spacecraft launched, indicating a software-related issue contributing to the failure incident [Article 96862]. - The software errors that slipped through undetected and were not fixed before the spacecraft was launched in December were cited as contributing to the troubled test flight of the Boeing spacecraft [Article 118256].
Objective (Malicious/Non-malicious) non-malicious (a) The articles do not provide any information indicating that the software failure incident was malicious in nature [93048, 93145, 93227, 96862, 99015, 117444, 117500, 118256, 127533, 128026]. (b) The software failure incidents reported in the articles were non-malicious in nature. The failures were attributed to software glitches, errors, and flaws that occurred unintentionally during the spacecraft's operations. These incidents were a result of technical issues and lapses in the software development and testing processes [93048, 93145, 93227, 96862, 99015, 117444, 117500, 118256, 127533, 128026].
Intent (Poor/Accidental Decisions) poor_decisions, accidental_decisions (a) poor_decisions: The Boeing Starliner incident resulted from lapses that allowed software errors to slip through undetected and unfixed before the spacecraft launched, indicating a failure due to contributing factors introduced by poor decisions [Article 96862]. (b) accidental_decisions: The software glitch that caused the Starliner capsule to fail to attain the orbit needed to rendezvous with the space station, described as an unwelcome engineering black eye for Boeing, reflects contributing factors introduced by mistakes or unintended decisions; the articles give no indication that it stemmed from a deliberate choice [Article 93227].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The articles provide information related to the software failure incident occurring due to development incompetence. For example, in Article 117444, it is mentioned that a software error that could have resulted in the loss of the spacecraft was discovered and fixed while the capsule was in orbit, indicating a lapse in the software testing procedures that allowed errors to slip through undetected [117444]. Additionally, Article 118256 highlights that NASA officials admitted that decades of working with Boeing gave them a level of trust, but a lack of understanding of how Boeing was designing the software and testing processes led to the software errors not being fixed before launch [118256]. (b) The articles also mention the software failure incident occurring accidentally. For instance, in Article 117500, it is stated that a software glitch and a possible lapse in human attention were to blame for throwing the International Space Station out of control, indicating an accidental introduction of contributing factors that led to the software failure [117500]. Additionally, Article 127533 describes how the onboard computer tried to move the spacecraft to where it thought it should be due to the clock being set to the wrong time, leading to the firing of thrusters and the depletion of propellant, which was an accidental introduction of contributing factors [127533].
Duration temporary (a) The software failure incident related to the Starliner spacecraft was temporary: it arose under a specific circumstance, a timing error early in the flight that left the spacecraft in the wrong orbit, after which officials routed it back to Earth instead of continuing on to the ISS. Boeing officials said they were still trying to determine how the timing error occurred [93048, 93145]. (b) The triggering condition was likewise specific: a timer aboard the spacecraft mistakenly concluded that the "orbital insertion burn" had taken place and ordered other thrusters to fire to keep the spacecraft on the correct trajectory. Because the failure was introduced by this particular circumstance rather than by all circumstances, it is classified as temporary rather than permanent [93145].
Behaviour crash, omission, timing, value, other (a) crash: The software failure incident in the articles can be categorized as a crash due to the system losing state and not performing any of its intended functions. This is evident from the incident where the spacecraft failed to achieve the correct orbit, leading to the cancellation of the mission to the International Space Station [93145]. (b) omission: The software failure incident can also be categorized as an omission because the system omitted to perform its intended functions on one or more occasions. For example, the scheduled orbital insertion burn was not performed because a timer aboard the spacecraft mistakenly thought that the burn had already taken place and instead ordered other thrusters to fire to keep the spacecraft on the correct trajectory [93145]. (c) timing: The software failure incident can be categorized as a timing issue due to the system performing its intended functions correctly, but either too late or too early. This is evident from the incident where the spacecraft set its clock to the wrong time, causing it to deplete its propellant and leading to the mission being cut short [117444]. (d) value: The software failure incident can be categorized as a value issue due to the system performing its intended functions incorrectly. An example of this is when a software problem caused the capsule to fail to attain the orbit needed to rendezvous with the space station [93227]. (e) byzantine: The software failure incident does not exhibit characteristics of a byzantine failure, which involves the system behaving erroneously with inconsistent responses and interactions. (f) other: The software failure incident can be categorized as other due to the system behaving in a way not described in the options (a to e). An example of this is when a software glitch caused the spacecraft to launch into the wrong orbit, prompting officials to route it back to Earth instead of continuing on to the ISS [93048].

IoT System Layer

Layer Option Rationale
Perception sensor, actuator, processing_unit, embedded_software (a) sensor: Failure due to contributing factors introduced by sensor error - Article 117444 mentions a timing anomaly that occurred shortly before the planned docking with the ISS during a test flight, which could have caused the spacecraft to be lost if not detected in time. Ground intervention prevented the loss of the vehicle, indicating a sensor error [117444]. - Article 127533 discusses a serious clock error that caused the onboard computer to try to move the spacecraft to where it thought it should be, leading to the firing of thrusters and the depletion of propellant. This points to a sensor error related to the clock setting [127533]. (b) actuator: Failure due to contributing factors introduced by actuator error - Article 128026 describes how poorly designed software could have resulted in the capsule colliding with its aft service section during separation just prior to re-entry. This indicates a potential actuator error in the separation mechanism [128026]. (c) processing_unit: Failure due to contributing factors introduced by processing error - Article 93145 mentions a timer aboard the spacecraft mistakenly thinking that an orbital insertion burn had taken place, leading to the firing of other thrusters. This points to a processing error related to the timer function [93145]. - Article 117444 discusses a valve mapping software issue that could have caused the wrong thrusters to fire during re-entry preparations, indicating a processing error in the software [117444]. (d) network_communication: Failure due to contributing factors introduced by network communication error - No specific mention of network communication errors was found in the provided articles. (e) embedded_software: Failure due to contributing factors introduced by embedded software error - Article 117444 discusses a valve mapping software issue that could have caused problems during re-entry preparations, indicating a potential embedded software error [117444]. - Article 128026 mentions poorly designed software that could have resulted in a capsule collision during separation, pointing to a potential embedded software error [128026].
Communication link_level, connectivity_level From the provided articles, there is information related to both link_level and connectivity_level failures in the cyber-physical system: - link_level: The failure of the Starliner capsule's main engine to fire as scheduled was attributed to a timer aboard the spacecraft mistakenly thinking that the "orbital insertion burn" had taken place, leading to other thrusters firing incorrectly. This issue was related to a timer misfiring, indicating a failure at the link_level of the cyber-physical system [93145, 117444]. - connectivity_level: Another software problem caused the capsule to fail to attain the orbit needed to rendezvous with the space station. This issue could be related to a failure at the network or transport layer of the cyber-physical system, affecting communication and coordination between the spacecraft and the ground controllers [93227, 128026].
Application TRUE The software failure at the application layer of the cyber-physical system is described in the following articles: 1. Article 117444 reports that a software error that could have resulted in the loss of the spacecraft was discovered and fixed while the capsule was in orbit, just before it returned to Earth. This error, along with another software problem, could have caused the spacecraft to be lost if not detected and corrected in time, indicating issues at the application layer [117444]. 2. Article 128026 discusses how the onboard computer system over-fired Starliner's thrusters due to a software issue, leading to excessive fuel consumption and the inability to reach the intended destination of docking with the space station. Additionally, poorly designed software could have resulted in a collision between the capsule and its aft service section during separation just prior to re-entry, highlighting application layer issues [128026].
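The valve mapping issue cited for the perception layer [117444] is the kind of defect that a table-driven verification check can catch before flight, in line with the improved software verification process listed under Preventions. The articles do not describe how Starliner maps thruster commands to valves, so the Python below is a hypothetical sketch: it assumes a flight-software lookup table and an independently maintained reference table, and reports every entry where they disagree. The thruster names, valve IDs, and the `mapping_mismatches` helper are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical reference mapping from logical thruster names to physical
# valve IDs, as an independently maintained interface specification might
# define it. Names and IDs are illustrative assumptions.
REFERENCE_MAP: Dict[str, int] = {
    "sep_push_1": 11,
    "sep_push_2": 12,
    "sep_push_3": 13,
    "sep_push_4": 14,
}

# Mapping as coded in the (hypothetical) flight software, with two entries
# swapped: commanding "sep_push_2" would open valve 13 instead of 12.
FLIGHT_MAP: Dict[str, int] = {
    "sep_push_1": 11,
    "sep_push_2": 13,
    "sep_push_3": 12,
    "sep_push_4": 14,
}


def mapping_mismatches(flight: Dict[str, int], reference: Dict[str, int]) -> List[str]:
    """Describe every thruster whose coded valve differs from the reference."""
    problems = []
    # The union of keys also catches thrusters missing from either table.
    for name in sorted(reference.keys() | flight.keys()):
        expected = reference.get(name)
        actual = flight.get(name)
        if expected != actual:
            problems.append(f"{name}: expected valve {expected}, coded valve {actual}")
    return problems


for line in mapping_mismatches(FLIGHT_MAP, REFERENCE_MAP):
    print(line)
# sep_push_2: expected valve 12, coded valve 13
# sep_push_3: expected valve 13, coded valve 12
```

Comparing against an independent reference, rather than re-reading the flight table itself, is what lets a check like this catch a swap that looks internally consistent.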

Other Details

Category Option Rationale
Consequence delay, non-human, theoretical_consequence, other (a) death: There were no reports of people losing their lives due to the software failure incident in the articles. (b) harm: The software failure incident did not result in physical harm to individuals, as there were no crew members on board the spacecraft during the failed mission [93145]. (c) basic: The software failure did not impact people's access to food or shelter as per the articles. (d) property: The software failure incident did not directly impact people's material goods, money, or data as per the articles. (e) delay: The software failure incident did lead to a delay in the mission to the International Space Station as the spacecraft had to be brought back to Earth instead of continuing to the ISS [93048, 93145]. (f) non-human: The software failure incident impacted the spacecraft and its operations, leading to the need for corrective actions and redesigns [117500, 128026]. (g) no_consequence: There were observed consequences of the software failure incident, including the mission being aborted and the spacecraft not reaching the correct orbit [93048, 93145, 93227, 117444, 117500, 118256, 127533]. (h) theoretical_consequence: There were potential consequences discussed, such as the possibility of catastrophic failure and loss of the spacecraft if the software errors had not been detected and corrected in time [99015, 117444, 127533]. (i) other: The software failure incident cost the spacecraft its mission to the International Space Station, raised questions about Boeing's procedures, and highlighted the need for improvements in software testing and development processes [93145, 96862, 117444, 118256].
Domain transportation, knowledge, other (a) The failed system was related to the transportation industry, specifically in the context of space travel and NASA missions. The articles mention Boeing's spacecraft designed to fly NASA astronauts to space failing to achieve the correct orbit, leading to the cancellation of its mission to the International Space Station [93145]. There were issues with the spacecraft's computer failing to fire the engine to push it into the correct orbit, as well as problems with the timer mistakenly ordering other thrusters to fire, causing complications in the mission [93145]. Additionally, there were software problems that could have caused the spacecraft to be lost if not detected in time, highlighting the critical role of software in space transportation [117500]. (i) The failed system was also related to the knowledge industry, particularly in the field of space exploration. NASA's collaboration with Boeing and SpaceX to provide transportation to and from the International Space Station reflects the involvement of these companies in advancing human spaceflight programs [96862]. The articles discuss NASA's efforts to reduce bureaucratic overhead and the need for a better understanding of how Boeing was designing the software for the Starliner spacecraft, emphasizing the importance of software development in space exploration [118256]. (m) The failed system could be categorized under the "other" industry as it pertains to the aerospace industry, which involves the design, development, and production of aircraft and spacecraft. Boeing's involvement in building space taxis to ferry astronauts to the space station under NASA's program highlights its role in the aerospace sector [99016]. The articles discuss technical problems, setbacks, and the financial impact on Boeing in the context of space transportation, further emphasizing the aerospace industry's connection to the software failure incident [127533].
