Incident: NASA's Ingenuity Mars Helicopter Launch Delayed Due to Software Glitch

Published Date: 2021-04-13

Postmortem Analysis
Timeline 1. The software failure incident with NASA's Ingenuity helicopter project on Mars happened in April 2021 [113179].
System 1. Ingenuity's flight control software [113179]
Responsible Organization 1. NASA was responsible for the software failure incident in the Ingenuity helicopter project on Mars [113179].
Impacted Organization 1. NASA [113179]
Software Causes 1. The software failure incident in NASA's Ingenuity helicopter project on Mars was caused by a glitch that occurred when the helicopter tried to switch from preflight mode to flight mode. The glitch led to a 'watchdog' timer expiration, which ended the command sequence early [113179].
Non-software Causes 1. The 'watchdog' timer expiration occurred during a high-speed spin test of the rotors and ended the command sequence prematurely [113179]. 2. The glitch occurred during the transition from preflight mode to flight mode; the article does not identify a specific hardware or operational fault behind it [113179].
Impacts 1. The software failure incident delayed the first powered flight of NASA's Ingenuity helicopter on Mars, and the flight had to be rescheduled multiple times [113179]. 2. The glitch required a flight software update, which took time to develop, validate, and uplink, further postponing the flight [113179]. 3. To address the issue and move forward with the mission, NASA had to develop, test, and upload new software onto the flight controllers and reboot Ingenuity [113179].
Preventions 1. More thorough testing of the flight software's preflight-to-flight mode transition before the high-speed spin test of the rotors could potentially have prevented the software failure incident [113179]. 2. Additional safeguards or redundancy in the software, so that a watchdog timer expiration does not prematurely terminate the operation, could have helped avoid the glitch during the rotor spin test [113179]. 3. More extensive simulation or scenario testing of the transition from preflight mode to flight mode could have surfaced the issue in advance and avoided the need for a software update [113179].
Fixes 1. Developing, testing, and uploading new software onto flight controllers [113179]
References 1. NASA's official statements [113179]
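A watchdog timer is a countdown that running software must periodically reset ("kick"); if a step stalls past the timeout, the watchdog fires and the operation is abandoned. The Python sketch below illustrates that behavior on a simulated clock. It is not Ingenuity's flight software: the class and function names, the step names, the step durations, and the 1-second timeout are all hypothetical, chosen only to show how a stalled preflight-to-flight mode transition would end a command sequence early.

```python
from typing import List, Optional, Sequence, Tuple


class SimulatedWatchdog:
    """Hypothetical software watchdog on a simulated clock (illustrative only;
    not Ingenuity's actual implementation)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_kick_s = 0.0

    def kick(self, now_s: float) -> None:
        # Reset the countdown: the software shows it is still making progress.
        self.last_kick_s = now_s

    def expired(self, now_s: float) -> bool:
        # The watchdog fires if no kick arrived within timeout_s.
        return now_s - self.last_kick_s > self.timeout_s


def run_sequence(
    steps: Sequence[Tuple[str, float]], watchdog: SimulatedWatchdog
) -> Tuple[List[str], Optional[str]]:
    """Run (step_name, duration_s) pairs in order.

    Each step kicks the watchdog when it finishes. If a step takes longer than
    the timeout, the watchdog expires and the rest of the sequence is abandoned.
    Returns (completed_steps, step_where_aborted_or_None).
    """
    now_s = 0.0
    watchdog.kick(now_s)
    completed: List[str] = []
    for name, duration_s in steps:
        now_s += duration_s
        if watchdog.expired(now_s):
            return completed, name  # command sequence ends early at this step
        watchdog.kick(now_s)
        completed.append(name)
    return completed, None


# Hypothetical spin-test sequence in which the mode transition stalls for 1.5 s.
SPIN_TEST = [
    ("spin_up_rotors", 0.2),
    ("preflight_to_flight_mode", 1.5),
    ("high_speed_spin", 0.5),
]

if __name__ == "__main__":
    print(run_sequence(SPIN_TEST, SimulatedWatchdog(timeout_s=1.0)))
    print(run_sequence(SPIN_TEST, SimulatedWatchdog(timeout_s=2.0)))
```

With a 1.0 s timeout the sequence stops at `preflight_to_flight_mode` after completing only `spin_up_rotors`; with a 2.0 s timeout all three steps complete. A real watchdog fires asynchronously at the deadline (often resetting the processor or forcing a safe state); checking at step boundaries is a simplification that yields the same abort decision here.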

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) one_organization: The incident with NASA's Ingenuity helicopter on Mars involved a glitch during a high-speed spin test of the rotors, in which the command sequence ended early because a 'watchdog' timer expired. Addressing it required a software update: NASA had to develop, test, and upload new software onto the flight controllers [113179]. (b) multiple_organization: The article provides no information about a similar incident at other organizations or in their products and services.
Phase (Design/Operation) design (a) The incident was related to the design phase. During a high-speed spin test of the rotors, the command sequence controlling the test ended early because a 'watchdog' timer, which is designed to stop an operation when it detects a problem, expired. NASA stated that a software update was necessary, and the agency had to develop, test, and upload new software onto the flight controllers to move forward with the mission [113179]. (b) The incident was not related to the operation phase or to misuse of the system; it stemmed from a technical issue in the design and testing of the helicopter's flight control software.
Boundary (Internal/External) within_system (a) within_system: The failure originated within the system. During a high-speed spin test of the rotors, the command sequence controlling the test ended early because a 'watchdog' timer expired. The glitch happened when the helicopter tried to switch from preflight mode to flight mode, and NASA had to develop, test, and upload new software onto the flight controllers to resolve it [113179].
Nature (Human/Non-human) non-human_actions (a) The incident was caused by non-human actions. When the helicopter tried to switch from preflight mode to flight mode during a high-speed spin test of the rotors, a glitch caused the 'watchdog' timer, which is designed to stop an operation when it detects a problem, to expire and end the command sequence early. A software update was necessary to address the issue [113179].
Dimension (Hardware/Software) software (a) The software failure incident occurring due to hardware: The incident was not directly attributed to hardware issues. It was primarily a software glitch that occurred during a high-speed spin test of the rotors, leading to a 'watchdog' timer expiration and the need for a software update [113179]. (b) The software failure incident occurring due to software: The incident was primarily due to a software issue. The glitch occurred when the helicopter tried to switch from preflight mode to flight mode, and the resulting 'watchdog' timer expiration ended the command sequence controlling the test early. NASA stated that resolving the issue required developing, testing, and uploading new software onto the flight controllers [113179].
Objective (Malicious/Non-malicious) non-malicious (a) The incident was non-malicious. It resulted from a technical issue during a high-speed spin test of the rotors, in which a 'watchdog' timer expiration ended the command sequence controlling the test early [113179]. NASA stated that a software update was necessary, and the agency had to develop, test, and upload new software onto the flight controllers to move forward with the mission [113179].
Intent (Poor/Accidental Decisions) accidental_decisions The incident resulted from accidental decisions rather than poor decisions. During a high-speed spin test of the rotors, the command sequence controlling the test ended early because a 'watchdog' timer, which is designed to stop an operation when it detects a problem, expired [113179]. NASA stated that a software update was necessary, and the agency had to develop, test, and upload new software onto the flight controllers to move forward with the mission [113179]. The failure therefore reflects an unintended mistake rather than a deliberately poor decision.
Capability (Incompetence/Accidental) accidental (a) The incident was not due to development incompetence. The problem surfaced as part of the planned testing process: in a high-speed spin test of the rotors, the command sequence controlling the test ended early because a 'watchdog' timer expired [113179]. NASA deemed a software update necessary and had to develop, test, and upload new software onto the flight controllers to move forward with the mission [113179]. (b) The incident was accidental. The glitch occurred when the helicopter tried to switch from preflight mode to flight mode, and the resulting 'watchdog' timer expiration ended the command sequence early [113179]. Fixing this unintended issue required a software update before the planned test flights on Mars could proceed.
Duration temporary The incident was temporary. During a high-speed spin test of the rotors, the command sequence controlling the test ended early because a 'watchdog' timer, which is designed to stop an operation when it detects a problem, expired. NASA stated that a software update was necessary and developed, tested, and uploaded new software onto the flight controllers to resolve the problem. Once the update was in place, the helicopter could move forward with its mission, so the failure was temporary and correctable through software [113179].
Behaviour crash, other (a) crash: The incident can be categorized as a crash in the taxonomy's sense (the software stops operating), not a physical crash of the helicopter. During a high-speed spin test of the rotors, the command sequence controlling the test ended early because a 'watchdog' timer expired, and the operation stopped abruptly [113179]. (b) omission: The incident did not involve omission, because the system was actively running the rotor spin test when the failure occurred rather than failing to perform its intended functions at a particular instance. (c) timing: The incident was not a timing failure, in which the system performs its intended functions but at the wrong time. (d) value: The incident did not involve the system performing its intended functions incorrectly; instead, the system lost state and stopped performing its intended functions. (e) byzantine: The incident did not exhibit byzantine behavior with inconsistent responses and interactions. (f) other: The system lost state and stopped performing its intended functions when the 'watchdog' timer expired during the rotor spin test, ending the operation prematurely [113179].
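The Preventions in the postmortem analysis call for safeguards around watchdog expirations and for scenario testing of the preflight-to-flight transition. The Python sketch below is a Monte Carlo scenario test under stated assumptions, not NASA's test procedure: it estimates how often a crash-type outcome like the one classified above would occur if the transition time varied, and how a hypothetical bounded-retry safeguard would change that rate. The uniform timing model, the timing ranges, the function names, and the retry policy are all illustrative assumptions.

```python
import random


def transition_succeeds(
    rng: random.Random, timeout_s: float, lo_s: float, hi_s: float, max_attempts: int = 1
) -> bool:
    """Simulate one preflight-to-flight mode transition.

    ASSUMPTION: each attempt takes a duration drawn uniformly from [lo_s, hi_s].
    An attempt succeeds if it finishes within the watchdog timeout. With
    max_attempts > 1, a hypothetical safeguard retries the transition instead
    of ending the command sequence after the first expiration.
    """
    for _ in range(max_attempts):
        if rng.uniform(lo_s, hi_s) <= timeout_s:
            return True
    return False


def abort_rate(
    trials: int,
    timeout_s: float,
    lo_s: float,
    hi_s: float,
    max_attempts: int = 1,
    seed: int = 0,
) -> float:
    """Fraction of simulated command sequences that end early (seeded for repeatability)."""
    rng = random.Random(seed)
    aborts = sum(
        not transition_succeeds(rng, timeout_s, lo_s, hi_s, max_attempts)
        for _ in range(trials)
    )
    return aborts / trials


if __name__ == "__main__":
    # Hypothetical scenario: transition time varies between 0.5 s and 1.5 s
    # against a 1.0 s watchdog timeout.
    print(abort_rate(10_000, 1.0, 0.5, 1.5))
    print(abort_rate(10_000, 1.0, 0.5, 1.5, max_attempts=3))
```

Under this uniform model the analytic abort probability is 0.5 for a single attempt and 0.5^3 = 0.125 with three attempts, so the printed estimates should land near those values. A real scenario test would replace the uniform draw with measured or modeled timing of the actual flight software.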

IoT System Layer

Layer Option Rationale
Perception actuator, processing_unit, embedded_software (a) sensor: The article does not mention a sensor error, so this is unknown. (b) actuator: A software update was needed to address an issue that arose when the helicopter tried to switch from preflight mode to flight mode during a rotor spin test, and the resulting software trouble postponed the planned flight. This suggests a failure affecting the actuators (the rotors) [113179]. (c) processing_unit: A software update was required after the command sequence controlling the high-speed rotor spin test ended early because of a 'watchdog' timer expiration, which points to a failure related to the processing unit [113179]. (d) network_communication: The article does not mention a network communication error, so this is unknown. (e) embedded_software: The article states explicitly that a software update was necessary to address the issue during the high-speed rotor spin test, indicating a failure in the embedded software [113179].
Communication unknown The incident reported in Article 113179 was not related to the communication layer of the cyber-physical system. It concerned the helicopter's flight control software, which needed an update to address an issue that occurred during a high-speed spin test of the rotors, when the helicopter tried to switch from preflight mode to flight mode. The agency had to develop, test, and upload new software onto the flight controllers to resolve the issue [113179].
Application FALSE The incident was not directly related to the application layer of the cyber-physical system. The failure involved the helicopter's flight control software, which falls under system software rather than application software. It was therefore not due to the bugs, operating system errors, unhandled exceptions, or incorrect usage typically associated with the application layer [113179].

Other Details

Category Option Rationale
Consequence delay, non-human The incident delayed the planned flight: the launch was postponed multiple times while NASA modified and reinstalled the helicopter's flight control software [113179].
Domain knowledge (a) The failed system belongs to the space exploration industry, specifically NASA's Ingenuity Mars Helicopter project [113179].
