Incident: Starliner's Valve Issue Causes Delay in NASA Mission.

Published Date: 2021-08-13

Postmortem Analysis
Timeline 1. The earlier software failure with Boeing's Starliner spacecraft occurred during its first uncrewed flight in December 2019 [119549]. 2. The valve issue that called off the uncrewed repeat test flight occurred in August 2021 [117456].
System 1. Starliner's propellant valves, supplied by Aerojet Rocketdyne [119549] 2. Valves controlling the flow of nitrogen tetroxide in Starliner's propulsion system [117456]
Responsible Organization 1. Boeing was responsible for causing the software failure incident with its Starliner spacecraft [119549, 117456].
Impacted Organization 1. Boeing [119549, 117456]
Software Causes 1. Boeing's first attempt to launch an uncrewed Starliner in December 2019 failed to reach the space station because of dozens of software glitches [119549]. 2. The Starliner launch in August 2021 was called off after 13 valves in Starliner's propulsion system failed to open during the countdown [117456]. The articles ultimately trace this valve failure to corrosion rather than to a software fault.
Non-software Causes 1. Moisture accumulation near Teflon seals of the valves causing corrosion and stiction [117456, 119549] 2. Interaction of nitrogen tetroxide with moisture to produce nitric acid, leading to corrosion of the valves [117456]
Impacts 1. The incident added financial losses to Boeing's balance sheet [117456]. 2. Boeing had to bear the cost of the resulting delays [119549]. 3. The incident postponed Boeing's first flight with astronauts aboard [117456].
Preventions 1. Improved testing during the software development phase might have prevented the software failure [119549]. 2. Better monitoring and validation of software functionality before launch could have identified and fixed the software glitches before they caused a failure [119549]. 3. More rigorous software quality assurance, including thorough code reviews and testing, might have reduced the risk of software issues leading to a failure [117456].
Fixes 1. Conducting a forensic CT scan on two of the valves to identify the root cause of the issue [119549]. 2. Making corrections to the spacecraft and the internal safety culture of the Starliner team as mandated by NASA [119549]. 3. Retooling how the spacecraft's readiness is examined for future flights, potentially by loading propellant closer to launch or finding new ways to mitigate moisture [119549]. 4. Figuring out what needs to be taken apart to fix the valves and addressing any design or manufacturing differences that may have led to the issue [117456].
References 1. John Vollmer, the manager of Boeing’s commercial crew operations [Article 119549] 2. George Nield, a panel member and the former head of the Federal Aviation Administration’s commercial space transportation office [Article 119549] 3. Kathy Lueders, NASA’s associate administrator for human exploration and operations [Article 117456]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident related to Boeing's Starliner spacecraft has happened again within the same organization. The article mentions that Boeing's first attempt to launch an uncrewed Starliner in December 2019 failed to reach the space station because of dozens of software glitches [119549]. This incident led to Boeing making roughly 80 corrections to both the spacecraft and the Starliner team's internal safety culture. Additionally, the article highlights that the hardware, including the valves, operated nearly flawlessly during the abbreviated 2019 trip, indicating that the software issues were a significant factor in the failure. (b) There is no specific mention in the articles of a similar software failure incident happening at other organizations or with their products and services.
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase is evident in the articles. In Article 117456, it is mentioned that during the first launch of Starliner in December 2019, the spacecraft was bedeviled by major software flaws that prevented it from reaching the space station. This led Boeing and NASA to decide that a do-over was necessary before certifying that Starliner is ready to carry astronauts. The major software flaws encountered during the initial launch indicate a failure related to the design phase of the software [117456]. (b) The software failure incident related to the operation phase is also highlighted in the articles. In Article 117456, it is described that during the countdown for the Starliner launch, 13 valves used in the spacecraft's propulsion system failed to open, leading to the launch being called off. Engineers were able to get nine of the 13 valves working, but four remained stuck, ultimately resulting in the decision to return the spacecraft to the factory. This operational failure, where valves failed to open during the countdown, points to issues introduced during the operation phase of the system [117456].
Boundary (Internal/External) within_system (a) The software failure incident related to the Boeing Starliner spacecraft can be categorized as within_system. The articles [119549, 117456] mention that the software glitches and flaws that led to the failure were internal to the spacecraft's systems. The articles highlight that the software issues, including major software flaws that prevented the spacecraft from reaching the space station in the past, were among the contributing factors that originated from within the system itself. Additionally, the articles discuss how engineers had to make corrections to the software and internal safety culture of the Starliner team to address the software-related issues before attempting another launch.
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident occurring due to non-human actions: - In the incident involving Boeing's Starliner spacecraft, the software failure was not directly attributed to human actions. The failure was primarily related to sticky valves in the spacecraft's propulsion system, which caused issues with the flow of nitrogen tetroxide propellant used by Starliner's thrusters [117456]. - The valves in question were part of the spacecraft's propulsion system and were designed to control the flow of propellant. The issue arose from a chemical reaction between nitrogen tetroxide and moisture, leading to corrosion and causing the valves to become stuck [117456]. - Engineers were investigating the root cause of the valve problem, considering factors such as weather conditions during the launch attempt and potential manufacturing variations in the valves. The issue was not directly linked to human actions or errors [117456]. (b) The software failure incident occurring due to human actions: - The software failure incident in the past involving Boeing's Starliner spacecraft was attributed to major software flaws that prevented the spacecraft from reaching the International Space Station during its first uncrewed flight in December 2019 [119549]. - The 2019 software glitches led to a situation where the spacecraft would have faced a catastrophic failure if engineers had not been able to quickly correct some of the software issues while the spacecraft was in orbit [119549]. - Following the 2019 software failure incident, Boeing made approximately 80 corrections to both the spacecraft and the internal safety culture of the Starliner team as mandated by NASA. This indicates that human actions, such as software development and testing processes, played a significant role in the software failure incident during the 2019 launch attempt [119549].
Dimension (Hardware/Software) hardware, software (a) The software failure incident occurring due to hardware: - The articles mention that the Starliner spacecraft faced issues with sticky valves that control the flow of nitrogen tetroxide, a propellant used by Starliner's thrusters. Some of the nitrogen tetroxide permeated through Teflon seals and interacted with moisture on the other side, producing nitric acid that corroded the valves and caused them to stick [117456]. - Engineers were investigating the hardware issue with the valves, considering factors such as weather conditions, including a thunderstorm, and potential manufacturing differences that may have contributed to the problem [117456]. (b) The software failure incident occurring due to software: - In a previous launch attempt in December 2019, the Starliner spacecraft faced major software flaws that prevented it from reaching the space station. This led to the decision for a do-over before certifying Starliner to carry astronauts [117456]. - The 2019 flight was bedeviled by dozens of software glitches that kept the spacecraft from reaching the space station. Boeing made roughly 80 corrections to both the spacecraft and the internal safety culture of the Starliner team to address these issues [119549].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident related to the Boeing Starliner spacecraft does not appear to be malicious. The incidents mentioned in the articles point to technical issues and failures in the spacecraft's systems, such as sticky valves in the propulsion system [117456], [119549]. These issues led to delays in the launch schedule and the need for the spacecraft to be returned to the factory for further investigation and repairs. There is no indication in the articles that the failures were caused by malicious intent. (b) The software failure incident related to the Boeing Starliner spacecraft is non-malicious. The failures and delays in the launch of the Starliner spacecraft were attributed to technical issues with the spacecraft's systems, specifically the sticky valves in the propulsion system [117456], [119549]. These technical issues were not caused by any malicious intent but rather by factors related to the design, manufacturing, or environmental conditions affecting the spacecraft.
Intent (Poor/Accidental Decisions) poor_decisions, accidental_decisions (a) The intent of the software failure incident related to poor decisions can be inferred from the articles. In Article 119549, it is mentioned that Boeing's first attempt to launch an uncrewed Starliner in December 2019 failed to reach the space station because of dozens of software glitches. The article further states that Starliner would have suffered a catastrophic failure if engineers had not been able to quickly correct some of the software issues. This indicates that the failure was partly due to poor decisions, or to factors introduced by poor decisions, during the software development process [119549]. (b) The intent of the software failure incident related to accidental decisions can be inferred from the articles. In Article 117456, it is mentioned that, after the launch had already been delayed by a mishap at the space station, 13 valves used in Starliner's propulsion system failed to open during the countdown, and the launch was called off. During troubleshooting, engineers were able to get nine of the 13 valves working, but four remained stuck. The article also notes that the problem occurred among 24 valves that control the flow of nitrogen tetroxide: some of the nitrogen tetroxide appears to have permeated through Teflon seals and interacted with moisture on the other side to produce nitric acid, resulting in corrosion and stiction of the valves. This suggests that the failure was partly due to accidental decisions or mistakes made during the operation or maintenance of the system [117456].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident related to development incompetence is evident in the articles. In Article 119549, it is mentioned that Boeing's first attempt to launch an uncrewed Starliner in December 2019 failed to reach the space station because of dozens of software glitches. The article further states that Starliner would have suffered a catastrophic failure if engineers had not been able to quickly correct some of the software issues. This incident led to Boeing making roughly 80 corrections to both the spacecraft and the Starliner team's internal safety culture as mandated by NASA [119549]. (b) The software failure incident related to accidental factors is also highlighted in the articles. In Article 117456, it is mentioned that during the troubleshooting that followed the failed launch attempt, engineers were able to get nine out of 13 valves working, but four remained stuck. The issue with the valves was attributed to some of the nitrogen tetroxide permeating through Teflon seals and interacting with moisture on the other side to produce nitric acid, leading to corrosion and the valves getting stuck. This was described as a problem that hadn't been seen in the past and needed further investigation to determine the cause, which could potentially include accidental factors like weather conditions or manufacturing variations [117456].
Duration temporary (a) The software failure incident related to the Boeing Starliner spacecraft can be considered a temporary failure. The articles mention that the software glitches experienced during the first uncrewed Starliner launch in December 2019 were major and required corrections while the spacecraft was in orbit [119549]. This incident led Boeing and NASA to decide to conduct another uncrewed test before certifying the spacecraft for crewed missions. Additionally, the articles highlight that the hardware, including the valves that caused the recent delay, operated nearly flawlessly during the abbreviated 2019 trip [117456]. This indicates that the software failure incident was not permanent but rather a temporary issue that needed to be addressed through further testing and corrections.
Behaviour other (a) crash: The software failure incident in the articles did not involve a crash where the system lost state and did not perform any of its intended functions. The incidents were related to hardware issues with sticky valves causing delays in the launch of Boeing's Starliner spacecraft [119549, 117456]. (b) omission: The software failure incident did not involve the system omitting to perform its intended functions in one or more instances. The issues were related to hardware problems with the valves in the spacecraft's propulsion system [119549, 117456]. (c) timing: The software failure incident did not involve the system performing its intended functions correctly but too late or too early. The delays in the launch were due to hardware issues with the valves, not timing-related software failures [119549, 117456]. (d) value: The software failure incident did not involve the system performing its intended functions incorrectly due to software issues. The problems were related to hardware malfunctions with the valves in the spacecraft's propulsion system [119549, 117456]. (e) byzantine: The software failure incident did not involve the system behaving erroneously with inconsistent responses and interactions. The issues were related to hardware problems with the valves in the spacecraft's propulsion system, not byzantine behavior of the software [119549, 117456]. (f) other: The behavior of the software failure incident was related to hardware issues with sticky valves in the spacecraft's propulsion system, leading to delays in the launch of Boeing's Starliner spacecraft. The software itself did not exhibit any of the failure behaviors described in options (a) to (e) [119549, 117456].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, delay, non-human, theoretical_consequence (a) death: There were no reports of people losing their lives due to the software failure incident described in the articles [119549, 117456]. (b) harm: There were no reports of people being physically harmed due to the software failure incident described in the articles [119549, 117456]. (c) basic: There were no reports of people's access to food or shelter being impacted because of the software failure incident described in the articles [119549, 117456]. (d) property: The delays and issues with the Starliner spacecraft caused financial losses that were added to Boeing's balance sheet [119549, 117456]. (e) delay: The software failure incident delayed the Starliner spacecraft's scheduled crewless demonstration flight, further postponing Boeing's first flight with astronauts aboard [119549, 117456]. (f) non-human: The software failure incident affected the Starliner spacecraft and its components, leading to delays and the spacecraft's return to the factory for investigation and repairs [119549, 117456]. (g) no_consequence: There were observed consequences of the software failure incident, including delays, financial losses, and impacts on the spacecraft, as described in the articles [119549, 117456]. (h) theoretical_consequence: There were discussions about potential consequences of the software failure incident, such as the need to retool how the spacecraft's readiness is examined for future flights, as suggested by a NASA safety panel [119549]. (i) other: There were no other consequences of the software failure incident described in the articles [119549, 117456].
Domain information, knowledge (a) The failed system was intended to support the information industry as it was related to the production and distribution of information. The software failure incident involved Boeing's Starliner spacecraft, which is designed to take NASA astronauts to and from the International Space Station [Article 117456]. (i) Additionally, the failed system was also related to the knowledge industry as it was part of NASA's Commercial Crew program aimed at stimulating private development of space capsules for ferrying astronauts to and from the International Space Station [Article 119549]. (m) The system failure incident was not related to any other industry mentioned in the options provided.

Sources