Incident: British Airways System Failure Causes Flight Cancellations and Delays

Published Date: 2019-08-07

Postmortem Analysis
Timeline 1. The software failure incident involving British Airways occurred on August 7, 2019, as reported in Article 88161.
System 1. British Airways' computer systems [Article 88161]
Responsible Organization 1. British Airways [88161]
Impacted Organization 1. Passengers of British Airways [Article 88161]
Software Causes 1. The article attributes the incident to a failure of British Airways' computer systems, which led to the cancellation or delay of more than 500 flights; it does not identify the underlying software fault [Article 88161].
Non-software Causes
1. Threatened industrial action by Heathrow staff leading to flight cancellations and reinstatements [Article 88161]
2. Separate strike action by BA pilots due to rejected pay deal [Article 88161]
3. Strikes announced by Ryanair pilots due to issues with dealing with unions [Article 88161]
4. Evacuation of a BA flight due to smoke in the cabin [Article 88161]
5. Previous severe computer system failures in 2017 [Article 88161]
Impacts
1. More than 500 flights were cancelled or delayed as a result of the systems failure, affecting the travel plans of tens of thousands of passengers [Article 88161].
2. The disruption primarily affected London's Heathrow, Gatwick, and City airports, with knock-on effects in other European destinations when incoming flights failed to arrive [Article 88161].
3. Customers experienced difficulties checking in online, leading to queues forming in airport departure areas as the airline had to resort to manual systems to keep flights operating [Article 88161].
4. On flights that did take off, the food and drink service was affected [Article 88161].
5. BA may be liable to pay compensation to affected passengers under EU law for cancellations or delays of two hours or more [Article 88161].
6. About half of the BA flights scheduled to depart from Heathrow's Terminal 5 between 9.30am and midday were cancelled or delayed [Article 88161].
7. The disruption caused chaos and frustration for customers caught up in the incident [Article 88161].
Preventions
1. Implementing robust backup systems and redundancy measures to ensure continuity of operations in case of a system failure [88161].
2. Conducting regular system maintenance and testing to identify and address potential issues before they escalate into major failures [88161].
3. Enhancing cybersecurity measures to prevent potential cyber attacks or breaches that could lead to system failures [88161].
4. Improving communication and coordination between different departments within the airline to streamline response efforts during a crisis situation [88161].
Fixes
1. Implementing more robust and redundant systems to prevent future computer failures [88161]
2. Conducting thorough testing and quality assurance procedures on software updates and changes to avoid unexpected issues [88161]
3. Improving communication and customer support strategies to keep passengers informed and minimize frustration during disruptions [88161]
References
1. Flightstats.com [88161]
2. London’s Heathrow, Gatwick, and City airports [88161]
3. BA's website [88161]
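The redundancy measure listed first under both Preventions and Fixes can be illustrated with a minimal sketch. The article does not describe how British Airways' systems are built, so everything here is an assumption made for illustration: the function names (`check_in_with_fallback`, `primary_online_check_in`, `standby_check_in`), the booking reference, and the idea of an ordered list of backends are all hypothetical.

```python
from typing import Callable, Sequence


class AllBackendsFailed(Exception):
    """Raised when every check-in backend has failed."""


def check_in_with_fallback(
    booking_ref: str,
    backends: Sequence[Callable[[str], str]],
) -> str:
    """Try each backend in order and return the first successful result."""
    errors = []
    for backend in backends:
        try:
            return backend(booking_ref)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{backend.__name__}: {exc}")
    # No backend succeeded: report every failure so operators can diagnose it.
    raise AllBackendsFailed("; ".join(errors))


def primary_online_check_in(booking_ref: str) -> str:
    # Hypothetical primary system, simulated here as being down.
    raise ConnectionError("primary check-in system unavailable")


def standby_check_in(booking_ref: str) -> str:
    # Hypothetical standby system that stays up when the primary fails.
    return f"{booking_ref}: checked in via standby"


result = check_in_with_fallback(
    "ABC123", [primary_online_check_in, standby_check_in]
)
```

The point of the ordering is that a failure in the primary path degrades service to the standby path instead of stopping it, which is the continuity goal the Preventions entry describes.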

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) British Airways has experienced this kind of failure before. The article notes a series of operational problems at the airline, including severe computer system failures: over the May bank holiday weekend in 2017, 75,000 passengers were stranded when the airline was forced to cancel more than 700 flights over three days [88161]. (b) The article also describes problems at other organizations, but these are industrial disputes rather than software failures: Ryanair pilots who are members of the Balpa union announced five days of walkouts, and Heathrow ground staff were planning a strike unless talks resulted in a compromise [88161].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase can be seen in the article where British Airways faced a systems failure that resulted in the cancellation or delay of more than 500 flights. The failure was attributed to a computer failure that affected the London airports, leading to operational disruptions and difficulties for customers checking in online. The airline had to resort to manual systems to keep flights operating, indicating a failure in the designed system [88161]. (b) The software failure incident related to the operation phase is evident in the same article where customers experienced difficulties checking in online, queues formed in airport departure areas, and some social media users reported that the food and drink service on flights was affected. These issues arose due to the operation or misuse of the system as the airline had to implement manual systems to manage the disruptions caused by the computer failure [88161].
Boundary (Internal/External) within_system (a) The software failure incident at British Airways was primarily within the system. The article mentions that the computer failure affected only the London airports, leading to the cancellation and delay of more than 500 flights [88161]. Additionally, the disruption caused by the failure had knock-on effects at other airports and European destinations when incoming flights failed to arrive. The airline had to resort to manual systems to keep flights operating, and customers experienced difficulties checking in online [88161].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident with British Airways was primarily due to non-human actions. The article mentions that the systems failure resulted in the cancellation and delay of more than 500 flights, affecting passengers at London airports and causing disruptions in various European destinations. The computer failure led to difficulties in online check-ins, queues at departure areas, and even impacted the food and drink service on some flights. The airline had to resort to manual systems to keep flights operating until the issue was resolved [88161]. (b) Human actions also played a role in the software failure incident. The article highlights the separate strike actions faced by British Airways, including threatened industrial action by Heathrow staff and upcoming strikes by the airline's pilots. The strikes and industrial actions by human employees could have contributed to the operational problems faced by the airline, adding to the challenges caused by the software failure [88161].
Dimension (Hardware/Software) software (a) The software failure incident reported in the article was primarily due to contributing factors originating in software. The article mentions that British Airways faced a systems failure resulting in the cancellation and delay of more than 500 flights. The disruption was caused by a computer failure affecting London airports and leading to difficulties in online check-ins, queues at departure areas, and disruptions in food and drink services on flights [88161]. (b) The software failure incident was not explicitly attributed to hardware issues in the articles.
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident at British Airways was non-malicious. The incident was described as a systems failure affecting multiple London airports, leading to the cancellation and delay of over 500 flights. The disruption caused inconvenience to tens of thousands of passengers, with difficulties in online check-ins and manual systems being implemented to keep flights operating [88161].
Intent (Poor/Accidental Decisions) unknown The software failure incident at British Airways, where more than 500 flights were cancelled or delayed, was not explicitly attributed to poor decisions or accidental decisions in the articles provided [88161]. The incident was primarily described as a systems failure that affected the airline's operations, leading to disruptions at various airports. The specific cause of the failure, whether it was due to poor decisions or accidental factors, was not detailed in the articles.
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident related to the British Airways flight cancellations and delays in 2019 could be attributed to development incompetence. The incident was caused by a systems failure that led to the cancellation and delay of more than 500 flights [88161]. The disruption affected multiple airports, including London's Heathrow, Gatwick, and City, as well as other European destinations. Customers faced difficulties with online check-ins, and manual systems had to be implemented to keep flights operating. The incident also impacted services like food and drink on the flights that did take off. This indicates a failure in the systems developed and maintained by British Airways, possibly due to issues related to professional competence in managing and maintaining their systems. (b) The software failure incident could also be considered accidental. While the specific cause of the systems failure was not detailed in the article, the incident itself was described as a computer failure that affected the airline's operations [88161]. The disruption was not intentional but rather an unexpected event that led to significant flight cancellations and delays. The incident was not attributed to any malicious activity but rather to an accidental failure in the systems, which could have been caused by various factors such as technical glitches, hardware failures, or software bugs.
Duration temporary The software failure incident experienced by British Airways was temporary. The article mentions that the computer failure affected the airline's operations, leading to the cancellation and delay of more than 500 flights [88161]. The disruption caused by the systems failure was eventually resolved by 4 pm on the same day, indicating that the failure was temporary and not permanent.
Behaviour crash, omission, timing, value, other
(a) crash: The software failure incident involving British Airways resulted in the cancellation or delay of more than 500 flights due to a systems failure. The system lost its state and was not performing its intended functions, leading to operational disruptions at London airports and beyond [Article 88161].
(b) omission: The software failure incident led to difficulties for customers checking in online and queues forming in airport departure areas as the airline had to resort to manual systems to keep flights operating. This omission to perform the intended functions caused inconvenience to passengers [Article 88161].
(c) timing: The software failure incident caused delays in flights, with about half of the BA flights scheduled to depart from Heathrow’s Terminal 5 between 9.30am and midday being cancelled or delayed. The system was performing its intended functions but at the wrong time, resulting in operational disruptions [Article 88161].
(d) value: The software failure incident affected the food and drink service on flights that did take off, indicating that the system was performing its intended functions incorrectly in this aspect [Article 88161].
(e) byzantine: There is no specific mention of the software failure incident exhibiting Byzantine behavior in the articles.
(f) other: The software failure incident also had knock-on effects at other airports, including Edinburgh, Glasgow, and Belfast, as incoming flights failed to arrive. This unexpected behavior of the system caused disruptions beyond the London airports initially affected [Article 88161].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence basic, delay, theoretical_consequence
(a) death: People lost their lives due to the software failure - There is no mention of any deaths resulting from the software failure incident reported in the articles. [88161]
(b) harm: People were physically harmed due to the software failure - There is no mention of physical harm to individuals due to the software failure incident. [88161]
(c) basic: People's access to food or shelter was impacted because of the software failure - The article mentions that on flights that did take off during the software failure incident, the food and drink service was affected, indicating a basic impact on services provided to passengers. [88161]
(d) property: People's material goods, money, or data was impacted due to the software failure - The article does not specifically mention any impact on people's material goods, money, or data due to the software failure incident. [88161]
(e) delay: People had to postpone an activity due to the software failure - The software failure incident resulted in the cancellation or delay of more than 500 British Airways flights, affecting the travel plans of tens of thousands of passengers. Passengers experienced difficulties checking in online, and queues formed in airport departure areas as manual systems were implemented to keep flights operating. [88161]
(f) non-human: Non-human entities were impacted due to the software failure - The software failure incident primarily affected British Airways' operations, resulting in flight cancellations and delays. There is no specific mention of non-human entities being impacted. [88161]
(g) no_consequence: There were no real observed consequences of the software failure - The software failure incident led to the cancellation and delay of hundreds of flights, impacting passengers' travel plans and causing disruptions at various airports. Therefore, there were observed consequences of the software failure. [88161]
(h) theoretical_consequence: There were potential consequences discussed of the software failure that did not occur - The article mentions that air travel experts suggested that British Airways may be liable to pay compensation to affected passengers under EU law for flight cancellations or delays. This indicates a potential consequence that could result from the software failure incident. [88161]
(i) other: Was there consequence(s) of the software failure not described in the (a to h) options? What is the other consequence(s)? - There is no other consequence mentioned in the articles beyond those described in options (c), (e), and (h). [88161]
Domain information (a) The failed system in the software failure incident reported by Article 88161 was intended to support the production and distribution of information. The incident affected British Airways' systems, leading to the cancellation and delay of more than 500 flights, causing passenger anger and disruption at various airports [88161]. The system failure impacted online check-ins, flight departures, and arrivals, highlighting the reliance on information systems for airline operations.
