| Recurring |
one_organization, multiple_organization |
(a) The incident recurred within the same organization, NASA, on the Curiosity rover on Mars. Memory problems with one of the craft's two identical computers delayed science operations, and engineers switched to the second computer while working on ways to avoid a recurrence. Just before that troubleshooting work was finished, the second computer suffered a software glitch and put itself into standby mode [17853].
(b) Memory corruption that forces a switch to a backup flight computer is treated here as a failure mode that can affect other organizations as well, although the cited article describes only the Curiosity case. There, engineers discovered data corruption in the solid-state memory used by the rover's active flight computer and had to carry out a complex procedure to switch operations to the backup computer. The incident highlights the risks posed by memory glitches and the importance of redundancy in critical systems [17558]. |
| Phase (Design/Operation) |
design, operation |
(a) The design-phase classification rests on the following evidence. Memory problems cropped up with one of the craft's two identical computers during science observations [17853]. Engineers switched to the second computer while working on ways to avoid a recurrence. Just a day before the troubleshooting work was finished, the second computer suffered a software glitch and put itself into standby mode, causing further delays [17853]. This suggests that the initial design or development of the system may have had vulnerabilities that led to these glitches.
(b) The operation-phase classification is also supported by the articles. The memory glitch was discovered while the rover was in active operation: it failed to send back science data as expected and did not put itself to sleep during scheduled downtime [17558]. Flight controllers then put the craft in a low-activity "safe mode" while the computer switch was implemented [17558]. |
| Boundary (Internal/External) |
within_system |
(a) The failure was primarily within the system. Memory problems with one of the craft's two identical computers delayed science operations, and engineers switched to the second computer while troubleshooting [17853]. A software glitch then put the second computer into standby mode, which the engineers resolved [17853]. The memory corruption discovered in the rover's active computer was likewise an internal issue: it interrupted science operations and required a complex sequence of steps to switch operations to a backup flight computer [17558]. Engineers worked to resolve the glitch within the system by conducting a thorough analysis and taking steps to ensure that the backup computer functioned properly [17558]. |
| Nature (Human/Non-human) |
non-human_actions, human_actions |
(a) The software failure incident related to non-human actions:
- The software glitch that put the Curiosity rover's second computer into standby mode was a non-human action; it occurred just a day before the troubleshooting work was due to finish [17853].
- The memory corruption discovered in the rover's active computer was suspected to be caused by space radiation, specifically a "single-event upset" in which an energetic particle changed the state of memory addresses [17558].
(b) The software failure incident related to human actions:
- Engineers were working on ways to avoid a recurrence of the memory problems in the Curiosity rover's computers [17853].
- Engineers carried out a complex sequence of steps to switch operations to a backup flight computer on the rover [17558]. |
| Dimension (Hardware/Software) |
hardware, software |
(a) The software failure incident related to hardware:
- The memory glitch on the Curiosity Mars rover was suspected to have been caused by space radiation, specifically a "single-event upset" where an energetic particle changed the state of memory addresses [17558].
- Engineers planned to power up the A-side computer without loading software to check the status of the nonvolatile memory, indicating a hardware-related investigation into the memory corruption issue [17558].
(b) The software failure incident related to software:
- The Curiosity rover experienced memory problems with one of its computers, leading to a delay in science operations. Engineers switched to the second computer while working on ways to avoid a recurrence of the problem [17853].
- The second computer suffered a software glitch and put itself into standby mode, which engineers were able to resolve by commanding it back into regular operations [17853].
- The memory corruption discovered in the rover's active computer interrupted science operations, leading to the implementation of a computer switch to the backup flight computer [17558].
- Engineers were cautious about rebooting the A-side computer and loading software to avoid potentially destroying evidence that could help identify the root cause of the memory corruption issue, indicating a software-related approach to handling the incident [17558]. |
| Objective (Malicious/Non-malicious) |
non-malicious |
(a) The software failure incident related to the Curiosity rover on Mars was non-malicious. The incident was caused by memory problems and a software glitch that put one of the craft's computers into standby mode, leading to a delay in science operations [17853, 17558]. Engineers worked on troubleshooting the issue and implementing a complex procedure to switch operations to a backup flight computer to resolve the memory corruption discovered in the active computer [17558]. The incident was attributed to a memory glitch possibly caused by space radiation, specifically a "single-event upset" affecting the memory addresses [17558].
(b) There is no indication in the articles that the software failure incident was malicious. |
| Intent (Poor/Accidental Decisions) |
accidental_decisions |
(a) The incident was not the result of poor decisions but of accidental factors. A memory glitch and a software glitch interrupted science operations on the rover, and engineers switched operations to a backup flight computer to work around the memory corruption discovered in the active computer [17853, 17558]. The switch itself was a deliberate and necessary response to the technical problem, not a poor decision. |
| Capability (Incompetence/Accidental) |
development_incompetence |
(a) The incident is classified under development incompetence on the following basis. Engineers working on the Curiosity Mars rover encountered memory problems with one of the craft's computers, which delayed science operations [17853]. Recovering required a complex sequence of steps to switch operations to a backup flight computer after memory corruption was discovered in the rover's active computer [17558]. These issues may reflect human error or a lack of professional competence in developing or maintaining the rover's software, although the articles themselves point to space radiation as the suspected cause of the memory corruption [17558]. |
| Duration |
temporary |
Based on the provided articles [17853, 17558], the Curiosity incident was a temporary failure. Memory problems and a software glitch interrupted science operations, and engineers resolved them by switching to the backup computer and commanding the rover back into regular operations. The team was able to resume limited science operations shortly after the incidents occurred. |
| Behaviour |
crash, omission, other |
(a) crash: The software failure incident related to the Curiosity rover on Mars involved a crash when the second computer suffered a software glitch and put itself into standby mode, causing a delay in science operations [17853].
(b) omission: The memory glitch in the rover's active computer caused it to omit expected behavior: it did not send back science data as expected and did not put itself to sleep during scheduled downtime, which interrupted science operations [17558].
(c) timing: The software failure incident did not involve a timing issue as the system was not reported to perform its intended functions either too late or too early.
(d) value: The software failure incident did not involve the system performing its intended functions incorrectly.
(e) byzantine: The software failure incident did not exhibit a byzantine behavior with inconsistent responses and interactions.
(f) other: The other behavior observed in the software failure incident was the need for a complex sequence of steps to switch operations to a backup flight computer, indicating a planned and systematic approach to address the memory corruption issue [17558]. |
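The mechanism described in the rows above (a "single-event upset" flipping a memory bit, followed by a switch to the backup computer) can be illustrated with a minimal sketch. It is not the rover's flight software; the function names, the CRC-based check, and the A/B selection rule are all hypothetical and chosen only to show the idea under those assumptions.

```python
import zlib


def flip_bit(memory: bytearray, byte_index: int, bit_index: int) -> None:
    """Simulate a single-event upset by inverting one bit in place (hypothetical helper)."""
    memory[byte_index] ^= 1 << bit_index


def is_intact(memory: bytes, expected_crc: int) -> bool:
    """Compare the current CRC-32 of the memory against the stored value."""
    return zlib.crc32(memory) == expected_crc


def select_computer(a_side_ok: bool, b_side_ok: bool) -> str:
    """Assumed failover rule: prefer A, fall back to B, otherwise enter safe mode."""
    if a_side_ok:
        return "A"
    if b_side_ok:
        return "B"
    return "safe_mode"


# A-side memory with a checksum recorded while the data was known to be good.
memory = bytearray(b"science data block")
stored_crc = zlib.crc32(memory)

# One energetic particle changes the state of a single bit.
flip_bit(memory, byte_index=3, bit_index=5)

# CRC-32 detects any single-bit error, so the A side is flagged as corrupt.
a_side_ok = is_intact(memory, stored_crc)
active = select_computer(a_side_ok, b_side_ok=True)
```

In this sketch the corrupted A side fails its integrity check and the selection rule hands control to the B side, which mirrors the redundancy argument made in the table without claiming to reproduce the actual procedure.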