Incident: International Space Station Out-of-Control Incident Due to Software Glitch

Published Date: 2021-08-01

Postmortem Analysis
Timeline 1. The software failure incident happened on a Thursday, a few hours after the Nauka module docked, as reported in the article published on August 1, 2021 [117271]. August 1, 2021 was a Sunday, so the preceding Thursday, and therefore the date of the incident, was July 29, 2021.
System The software failure incident on the International Space Station was attributed to a software glitch and a possible lapse in human attention. The system that failed in this incident was: 1. Software system on the Nauka (Science) Multipurpose Laboratory Module [117271].
Responsible Organization 1. Roscosmos, the Russian space agency, whose officials attributed the mishap to a software glitch and a possible lapse in human attention [117271]
Impacted Organization 1. International Space Station (ISS) crew members [117271]
Software Causes 1. A software glitch was identified as one of the causes of the failure incident on the International Space Station [117271].
Non-software Causes 1. Possible lapse in human attention [117271]
Impacts 1. The software glitch and possible lapse in human attention caused the International Space Station to briefly pitch out of its normal flight position, 250 miles above the Earth, with seven crew members aboard [117271].
Preventions 1. Thorough software testing and quality assurance before deploying the module into space could have prevented the software glitch that caused the International Space Station to pitch out of its normal flight position [117271].
Fixes 1. Conduct a thorough review and update of the software code to identify and fix the glitch that caused the software failure incident [117271].
References 1. Russian space officials 2. NASA 3. Russian space agency Roscosmos [117271]
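The date inference in the Timeline entry can be checked with a short calculation. The sketch below is in Python and assumes that "Thursday" in the article means the most recent Thursday before the publication date; the function name `most_recent_weekday` is hypothetical and does not come from the article.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday() numbers Monday as 0, so Thursday is 3


def most_recent_weekday(published: date, weekday: int) -> date:
    """Return the latest date strictly before `published` that falls on `weekday`.

    Assumption: the article refers to the closest such weekday before publication.
    """
    days_back = (published.weekday() - weekday) % 7
    if days_back == 0:
        days_back = 7  # same weekday as publication: step back a full week
    return published - timedelta(days=days_back)


published = date(2021, 8, 1)  # publication date given in the record
incident = most_recent_weekday(published, THURSDAY)
print(incident.isoformat())  # prints 2021-07-29
```

If the article had instead meant an earlier Thursday, the inferred date would shift by whole weeks, so the result is only as reliable as that assumption.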

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The article mentions that Roscosmos, the Russian space agency, has suffered a series of mishaps and corruption scandals, including the recent software glitch that briefly threw the International Space Station out of control. This indicates that similar incidents have happened before within the same organization [117271]. (b) The article does not provide information about similar incidents happening at other organizations or with their products and services.
Phase (Design/Operation) design, operation (a) The software failure incident on the International Space Station was attributed to a software glitch and a possible lapse in human attention. This indicates that the failure was due to contributing factors introduced during the system development or system updates [117271]. (b) The operation phase also played a role in the incident as the mishap occurred a few hours after docking, suggesting that the failure was influenced by the operation or misuse of the system [117271].
Boundary (Internal/External) within_system (a) The software failure incident on the International Space Station was within the system. Russian space officials attributed the mishap, which caused the entire space station to pitch out of its normal flight position, to a software glitch and a possible lapse in human attention [117271]. NASA's account of the incident also mentioned that engineers on the ground struggled to restore stability to the research satellite, indicating an internal system issue [117271].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident on the International Space Station was attributed in part to a software glitch, indicating that non-human actions played a role in the failure [117271]. (b) Russian space officials also cited a possible lapse in human attention, indicating that human actions may have contributed as well [117271]. NASA's account mentioned that engineers on the ground had to work to restore stability to the space station after the incident occurred.
Dimension (Hardware/Software) software (a) The software failure incident on the International Space Station was attributed to a software glitch and a possible lapse in human attention, indicating contributing factors that originated in software [117271].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident related to the International Space Station (ISS) was non-malicious. Russian space officials attributed the mishap, which caused the entire space station to pitch out of its normal flight position, to a software glitch and a possible lapse in human attention [117271]. The incident was not caused by malicious intent but rather by technical issues and human error, highlighting the importance of robust software testing and human vigilance in critical systems like the ISS.
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident on the International Space Station was attributed to a software glitch and a possible lapse in human attention, indicating that poor decisions or contributing factors introduced by poor decisions played a role in the mishap [117271].
Capability (Incompetence/Accidental) development_incompetence, accidental The software failure incident on the International Space Station was attributed to a software glitch and a possible lapse in human attention, indicating a combination of factors related to development incompetence and accidental causes [117271]. The incident caused the entire space station to pitch out of its normal flight position, highlighting the critical impact of software failures in such high-stakes environments.
Duration temporary (a) The software failure incident on the International Space Station was temporary. It was caused by a software glitch and a possible lapse in human attention, which led to the space station briefly being thrown out of control after the Nauka module docked [117271]. Engineers on the ground had to work to restore stability to the space station, and the crew members were said never to have been in immediate danger.
Behaviour crash, omission, other (a) crash: The software glitch caused the entire space station to pitch out of its normal flight position, indicating a failure due to the system losing state and not performing its intended functions [117271]. (b) omission: The software glitch and possible lapse in human attention were to blame for the mishap, suggesting a failure due to the system omitting to perform its intended functions at that instance [117271]. (c) timing: The article does not indicate that the system performed its intended functions too late or too early. (d) value: The article does not indicate that the failure was due to the system performing its intended functions incorrectly. (e) byzantine: The article does not mention the system behaving erroneously with inconsistent responses and interactions. (f) other: The software glitch led to the space station going out of control, a system behavior that does not fall cleanly into the specific options above [117271].

IoT System Layer

Layer Option Rationale
Perception embedded_software The software failure incident related to the International Space Station (ISS) was attributed to a software glitch and a possible lapse in human attention. The mishap caused the entire space station to pitch out of its normal flight position. The incident involved the Nauka (Science) Multipurpose Laboratory Module, and Russian space officials mentioned that the failure was due to a software glitch and a potential human error [117271]. The article does not provide specific details on which part of the perception layer of the cyber-physical system was involved, such as a sensor, actuator, processing unit, network communication, or embedded software.
Communication link_level The software failure incident on the International Space Station (ISS) was attributed to a software glitch and a possible lapse in human attention, according to Russian space officials [117271]. This indicates that the failure was more likely related to the link_level, involving issues within the software controlling the module itself rather than connectivity issues at the network or transport layer.
Application TRUE The software failure incident related to the International Space Station (ISS) briefly being thrown out of control was attributed to a software glitch and a possible lapse in human attention [117271]. This aligns with the definition of a failure at the application layer of a cyber-physical system, as it involves contributing factors introduced by bugs or errors in the software system.

Other Details

Category Option Rationale
Consequence property, non-human The consequence of the software failure incident described in the article [117271] is as follows: (d) property: People's material goods, money, or data were impacted due to the software failure. The software glitch and possible lapse in human attention caused the International Space Station to pitch out of its normal flight position, impacting the entire space station and its operations. This disrupted the normal functioning and stability of the space station, which can be considered an impact on property (the space station itself) due to the software failure.
Domain knowledge (a) The failed system was intended to support the industry of space exploration as it was related to the International Space Station (ISS) incident involving the Nauka module [117271].
