Incident: American Airlines Fleet Grounded Due to Computer Malfunction.

Published Date: 2015-09-18

Postmortem Analysis
Timeline 1. The software failure incident involving American Airlines occurred on Thursday night [51371]. 2. The article was published on 2015-09-18 07:00:00+00:00. 3. Given that publication date, the incident likely occurred on Thursday, September 17, 2015.
System 1. American Airlines' computer system 2. iPad app used by pilots [51371]
Responsible Organization 1. American Airlines, whose computer malfunction led to the widespread outage [51371].
Impacted Organization 1. American Airlines [51371] 2. Passengers of American Airlines flights 3. Federal Aviation Administration
Software Causes 1. Connectivity issues leading to a ground stop at the Chicago, Dallas/Fort Worth, and Miami hubs; the airline gave no further detail on the underlying fault [51371]. Related earlier incidents cited in the article (context, not causes of this outage): 2. Network connectivity issue at United Airlines in July [51371] 3. Automation issues at United Airlines in June [51371] 4. Crash of the iPad app used by American Airlines pilots in April [51371]
Non-software Causes 1. None identified. The article does not say whether the connectivity issues behind the ground stop at the Chicago, Dallas/Fort Worth, and Miami hubs had a non-software (for example, network hardware) component [51371].
Impacts 1. American Airlines had to ground large parts of its fleet, leading to flight halts in Chicago O’Hare, Dallas-Fort Worth, and Miami [51371]. 2. Passengers faced delays and inconvenience due to the connectivity issues caused by the software failure incident [51371]. 3. The airline industry faced a series of flight delays and cancellations due to computer errors, impacting multiple airlines in the US [51371].
Preventions 1. Implementing thorough testing procedures for software updates and changes to ensure they do not introduce new bugs or issues [51371]. 2. Conducting regular maintenance and monitoring of critical systems to identify and address potential issues before they escalate into major outages [51371]. 3. Enhancing redundancy and failover mechanisms in the airline's computer systems to mitigate the impact of a single point of failure [51371].
Fixes 1. Implementing more robust testing procedures to catch potential software bugs before they impact operations [51371]. 2. Conducting a thorough root cause analysis to identify the specific issue that led to the computer malfunction and taking steps to address it effectively [51371]. 3. Enhancing redundancy and failover mechanisms in the airline's systems to minimize the impact of future software failures [51371].
References 1. American Airlines spokesman as reported by Time Magazine [51371] 2. Federal Aviation Administration's advisory [51371]
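
The redundancy and failover prevention listed above can be illustrated with a minimal sketch. The article says nothing about how American Airlines' systems are actually built, so everything here is an assumption made for illustration: the `Backend` type, the `call_with_failover` helper, and the `primary` and `standby` handlers are hypothetical names, and a simulated `ConnectionError` stands in for the connectivity issue.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class AllBackendsFailed(Exception):
    """Raised when every configured backend has failed."""


@dataclass
class Backend:
    # Hypothetical stand-in for one redundant copy of a service.
    name: str
    handler: Callable[[str], str]


def call_with_failover(backends: Sequence[Backend], request: str) -> str:
    """Try each backend in order and return the first successful response."""
    errors = []
    for backend in backends:
        try:
            return backend.handler(request)
        except ConnectionError as exc:
            # Record the failure and fall through to the next backend.
            errors.append(f"{backend.name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


def primary(request: str) -> str:
    # Simulates a connectivity failure on the primary system.
    raise ConnectionError("primary link down")


def standby(request: str) -> str:
    # Simulates a healthy redundant system.
    return f"standby handled {request}"


backends = [Backend("primary", primary), Backend("standby", standby)]
result = call_with_failover(backends, "flight-plan-request")
print(result)
```

In this sketch a failure of the primary does not halt the request, because the standby answers it. Only when every backend fails does the caller see an error, which is the single-point-of-failure case the prevention aims to avoid.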

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident having happened again at one_organization: - American Airlines experienced a software failure incident in April where multiple flights were delayed due to the iPad app used by pilots crashing [51371]. (b) The software failure incident having happened again at multiple_organization: - United Airlines faced a similar issue in July when thousands of flights were grounded across the US due to a "network connectivity issue" [51371]. - In June, United Airlines also had to ground flights due to "automation issues" [51371].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase can be seen in the article where American Airlines experienced a widespread outage due to a computer malfunction. The malfunction led to a ground stop at multiple hubs, affecting flights from Chicago O’Hare, Dallas-Fort Worth, and Miami. The article mentions that the issue began around midday local time and continued until mid-afternoon, impacting the airline's operations [51371]. (b) The software failure incident related to the operation phase is evident in the same article where American Airlines had to ground large parts of its fleet due to a computer malfunction. The grounding was a result of connectivity issues that led to a ground stop at the Chicago, Dallas/Fort Worth, and Miami hubs. The airline spokesperson mentioned that they were working to resolve the connectivity issues and get customers on their way as soon as possible, indicating operational challenges caused by the software failure [51371].
Boundary (Internal/External) within_system (a) The software failure incident reported in Article 51371 falls under the within_system category. The incident was caused by a computer malfunction within American Airlines' system, leading to a widespread outage that forced the grounding of flights from multiple hubs. The issue was related to connectivity problems and system errors, as mentioned in the article. The failure originated from within the airline's own system, specifically affecting their operations and causing disruptions to flight schedules [51371].
Nature (Human/Non-human) non-human_actions (a) The software failure incident occurring due to non-human actions: - The article mentions that American Airlines was forced to ground large parts of its fleet after a computer malfunction led to a widespread outage. The issue began around midday local time and continued until mid-afternoon. The airline did not provide specific details on what caused the malfunction, and the Federal Aviation Administration's advisory only detailed it as "airline issues" [51371]. (b) The software failure incident occurring due to human actions: - The article does not provide specific information indicating that the software failure incident was directly caused by human actions.
Dimension (Hardware/Software) hardware, software (a) The software failure incident related to hardware: - The article mentions that American Airlines delayed multiple flights in April after the iPad app used by pilots crashed. The cockpit iPads, which are part of the hardware, are used as an "electronic flight bag" but failed across the airline on 29 April [51371]. (b) The software failure incident related to software: - The main incident reported in the article is about American Airlines grounding large parts of its fleet due to a computer malfunction, leading to a widespread outage. The issue was related to a computer malfunction, indicating a software failure [51371].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident reported in Article 51371 does not indicate any malicious intent behind the failure. It was described as a computer malfunction that led to a widespread outage affecting American Airlines' operations. The article mentions that the issue began around midday local time and continued until mid-afternoon, causing flights to be halted in several major hubs. The Federal Aviation Administration's advisory only detailed it as "airline issues" without suggesting any malicious activity [51371]. (b) The software failure incident in the article falls under the category of non-malicious failure. It was attributed to a computer malfunction that caused connectivity issues, leading to a ground stop at multiple American Airlines hubs. The airline spokesperson said that they were working to resolve the issue and get customers on their way as soon as possible, indicating that the failure was a technical issue rather than an intentional act [51371].
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident involving American Airlines' grounding of its fleet due to a computer malfunction could be attributed to poor decisions. The incident was part of a series of US airlines experiencing delays or cancellations due to computer errors. In April, American Airlines faced delays when the iPad app used by pilots crashed, affecting multiple flights. The introduction of the cockpit iPads as an "electronic flight bag" was meant to replace paper manuals, but the failure of the app across the airline on 29 April caused disruptions [51371].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident related to development incompetence is evident in the article, as American Airlines experienced multiple issues with its software systems. In April, the airline faced delays when the iPad app used by pilots crashed, causing disruptions to flight schedules. This incident highlighted a failure in the software system that was introduced in 2013 to replace paper manuals with electronic flight bags. The malfunction of the iPad app, which is a critical tool for pilots, suggests a potential lack of professional competence in the development or maintenance of the software system [51371]. (b) Software failure incidents occurring accidentally are also apparent in the article. For instance, in July, United Airlines had to ground thousands of flights due to a "network connectivity issue," and in June, it faced flight disruptions due to "automation issues." These incidents suggest that the software failures were not intentional but rather accidental occurrences that impacted the airlines' operations [51371].
Duration temporary (a) The software failure incident reported in Article 51371 was temporary. The American Airlines computer malfunction led to a widespread outage that began around midday local time and continued until mid-afternoon on the same day. The issue was resolved, and the airline was working to get customers on their way as soon as possible [51371].
Behaviour crash (a) crash: The software failure incident described in Article 51371 involved a crash. American Airlines was forced to ground large parts of its fleet due to a computer malfunction that led to a widespread outage. Flights from multiple hubs were halted, and the issue lasted for several hours until it was resolved [51371]. (b) omission: There is no specific mention of the software failure incident being related to omission in the provided article. (c) timing: The software failure incident did not involve timing issues. The system did not perform its intended functions due to the crash, but there was no mention of the functions being performed too late or too early. (d) value: The software failure incident did not involve the system performing its intended functions incorrectly. It was primarily a crash that led to a widespread outage. (e) byzantine: The software failure incident did not exhibit behaviors of inconsistency or erratic responses typically associated with a byzantine failure. (f) other: The behavior of the software failure incident in this case was primarily a crash that resulted in the system losing its state and not performing its intended functions, leading to a widespread grounding of flights by American Airlines [51371].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence delay The consequence of the software failure incident reported in Article 51371 was primarily related to delays in flight operations. American Airlines had to ground large parts of its fleet due to a computer malfunction, leading to a widespread outage. Flights from multiple hubs were halted, causing inconvenience to passengers and disrupting travel plans [51371]. The incident did not result in any direct harm, death, or impact on basic necessities like food or shelter. It primarily affected the airline's operations and caused delays for passengers.
Domain transportation, utilities, other (a) The failed system in the incident was related to the transportation industry as it affected American Airlines' fleet operations, leading to flight delays and cancellations [51371]. (g) The incident also impacted the utilities industry indirectly as it disrupted the provision of services related to air travel, which is a form of transportation utility [51371]. (m) The incident could also be categorized under the "other" industry as it involved the use of technology (specifically iPad apps for pilots) in the aviation sector, which is not explicitly covered in the provided industry options [51371].
