| Recurring |
one_organization |
(a) The software failure incident having happened again at one_organization:
The article notes that the airline industry, including British Airways, is known for running outdated infrastructure long after standards have improved. It also reports that passenger booking systems used by multiple airlines were found to be vulnerable to hackers [59063].
(b) The software failure incident having happened again at multiple_organization:
The article does not provide specific information about similar incidents happening at other organizations. Therefore, it is unknown if similar incidents have occurred at multiple organizations. |
| Phase (Design/Operation) |
design, operation |
(a) The article mentions experts questioning British Airways' claim that the catastrophic IT failure was solely due to a "power surge." Data center designers highlighted that a power surge should not have been able to bring down a data center and its backup system, suggesting potential issues with the design of the system or the presence of additional factors beyond just a power surge [Article 59063].
(b) The article discusses the importance of testing procedures like rebooting crucial databases and restoring servers after a power outage. Matthew Bloch, the managing director of Bytemark Hosting, raised concerns about the procedures followed when the power was turned back on, indicating potential operational factors contributing to the software failure incident [Article 59063]. |
| Boundary (Internal/External) |
within_system, outside_system |
(a) within_system: The incident appears to have been influenced by factors originating within the system itself. Experts questioned British Airways' claim that the catastrophic IT failure was caused by a "power surge" [59063]. Data center designers pointed to the design and resilience of the data center infrastructure, noting that surge protection, an uninterruptible power supply, and a quality earthing system should have protected against power surges [59063]. The article also discusses the importance of testing procedures such as rebooting crucial databases and restoring servers, which points to potential internal vulnerabilities that could have contributed to the failure [59063].
(b) outside_system: The article also suggests that factors external to the system may have played a role. The electricity companies supplying the area where BA's data center is located denied that there had been a power surge, which calls BA's explanation for the failure into question [59063]. This indicates a potential discrepancy between the company's internal assessment and external verification. |
| Nature (Human/Non-human) |
non-human_actions |
(a) The software failure incident occurring due to non-human actions:
The article mentions that British Airways' catastrophic IT failure was initially attributed to a "power surge" by the company's chief executive. However, experts questioned this explanation, stating that a power surge should not be able to bring down a data centre and its backup systems. Data centre designers highlighted the importance of surge protection, uninterruptible power supply (UPS), and a quality earthing system to protect against power surges. Additionally, it was noted that the real problem might have occurred when the power was turned back on after the outage, raising questions about the testing and procedures related to crucial databases and servers [Article 59063].
(b) The software failure incident occurring due to human actions:
The article does not provide direct evidence or mention of human actions contributing to the software failure incident. Therefore, it is unknown if human actions played a role in the British Airways IT failure incident reported in the article. |
| Dimension (Hardware/Software) |
hardware |
(a) The software failure incident in the British Airways case was initially attributed to a "power surge" by the company's chief executive [Article 59063]. However, experts questioned this explanation, stating that a power surge should not be able to bring down a data centre and its backup systems. They highlighted the importance of surge protection, uninterruptible power supply (UPS), and a quality earthing system to protect against power surges, indicating a hardware-related issue in the infrastructure design [Article 59063].
(b) The incident also raised concerns about the resilience of data centres and the infrastructure required to keep IT equipment running continuously. Experts noted that infrastructure failures, such as power problems, are common causes of data centre outages. The article additionally points to a possible software-related aspect in the systems and procedures used to handle power restoration and database reboots [Article 59063]. |
| Objective (Malicious/Non-malicious) |
non-malicious |
(a) The article does not indicate any malicious intent behind the failure. The incident is primarily attributed to issues related to power surges, a lack of resilience in data centers, outdated infrastructure, and potential failures in testing and rebooting crucial systems [59063]. These factors point to a non-malicious failure. |
| Intent (Poor/Accidental Decisions) |
poor_decisions |
(a) The incident appears most closely aligned with poor_decisions. British Airways' chief executive attributed it to a power surge, a claim that experts questioned. Data center designers said that proper safeguards, such as surge protection and an uninterruptible power supply, should have prevented such an incident. The article also highlights the lack of resilience in data centers to common problems like power surges, indicating a potential lack of investment in infrastructure maintenance and upgrades [59063]. |
| Capability (Incompetence/Accidental) |
accidental |
(a) The articles do not specifically mention the software failure incident being attributed to development incompetence by humans or the development organization. The focus is more on the potential issues related to power surges, infrastructure resilience, and outdated systems.
(b) The article suggests the incident may have been accidental. Experts raised concerns about the claim that a power surge caused the catastrophic IT failure at British Airways, and the article discusses the lack of resilience in data centers, the importance of proper testing procedures, and the challenges of outdated infrastructure. These factors suggest that the incident may have been accidental rather than the result of development incompetence [59063]. |
| Duration |
unknown |
The article does not provide specific information about whether the software failure incident was permanent or temporary. |
| Behaviour |
crash |
(a) crash: The software failure incident in the British Airways case resulted in a catastrophic IT failure that brought down the data center and its backup system, rendering them ineffective [Article 59063].
(b) omission: The incident highlighted the lack of resilience in many data centers to common problems, suggesting a failure to provide infrastructure that guards against power outages [Article 59063].
(c) timing: The issue was not just the power surge itself but also the consequences of turning the power back on, raising questions about the timing of crucial database reboots and server restoration procedures [Article 59063].
(d) value: The failure was not directly attributed to the system performing its intended functions incorrectly but rather to the inability of the infrastructure to handle power surges effectively [Article 59063].
(e) byzantine: The incident did not exhibit characteristics of a byzantine failure where the system behaves erroneously with inconsistent responses and interactions. Instead, it was primarily a result of the power surge and infrastructure vulnerabilities [Article 59063].
(f) other: The incident also shed light on the issue of outdated infrastructure in the airline industry and other sectors, indicating a broader problem beyond just the immediate software failure incident [Article 59063]. |