Incident: Global Airline Disruption: Altea Software Failure Impacts Major Carriers

Published Date: 2017-09-28

Postmortem Analysis
Timeline 1. The software failure incident happened on September 28, 2017, as reported in Article 62872.
System 1. Altea software program developed by Amadeus [62872]
Responsible Organization 1. Amadeus, the company that developed the Altea software used by several major airlines, attributed the software failure incident to a "network issue" in its own systems [62872].
Impacted Organization 1. Travelers around the world were impacted by the software failure incident as they struggled to check in at airports [62872]. 2. Airlines such as British Airways, Lufthansa, Thai Airways, Air France, United, Singapore Airlines, Air Canada, and Swiss International Air Lines were affected by the software failure incident [62872].
Software Causes 1. The software causes of the failure incident were related to a network issue that affected the Altea software program developed by Amadeus, used by multiple major airlines for managing customer reservations, tagging luggage, and issuing boarding passes [62872].
Non-software Causes 1. Network issue [62872]
Impacts 1. Flights were delayed globally, and travelers faced difficulties checking in at airports due to the software failure incident [62872]. 2. The software failure impacted major airlines such as British Airways, Lufthansa, Thai Airways, and Air France, which use the affected software for managing customer reservations [62872]. 3. The Star Alliance, which includes airlines like United, Singapore Airlines, Air Canada, and Lufthansa, reported that two-thirds of its airlines were affected by the software failure, causing disruptions to their services [62872]. 4. Some airlines experienced flight delays, but there were no cancellations reported as a direct result of the software failure incident [62872]. 5. The software failure incident affected airports in Asia, Europe, and the Americas, showcasing the widespread impact of the issue on modern air travel operations [62872].
Preventions 1. Implementing robust network redundancy and failover mechanisms to mitigate the impact of network issues like the one experienced by Amadeus could have prevented the software failure incident [62872]. 2. Conducting thorough testing, including stress testing and scenario-based testing, to identify and address potential vulnerabilities in the software system could have helped prevent the incident [62872]. 3. Regularly updating and maintaining the software system to ensure it is up-to-date with the latest security patches and improvements could have reduced the likelihood of encountering such disruptions [62872].
Fixes 1. Implementing redundancy and failover mechanisms in the software system to ensure continuous operation even in the event of network issues [62872]. 2. Conducting a thorough review and testing of the software to identify and address any vulnerabilities or weaknesses that could lead to similar incidents in the future [62872]. 3. Enhancing communication and coordination between the software provider (Amadeus) and the airlines to ensure prompt identification and resolution of any issues that may arise in the future [62872].
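The redundancy and failover idea named in the Preventions and Fixes items can be illustrated with a minimal sketch. It assumes a client that tries a list of service endpoints in order and falls back to the next one when a network error occurs. The function names, the exception class, and the stand-in "check-in" endpoints are all hypothetical; the article does not describe how Amadeus or Altea implement failover, so this is not a description of their architecture.

```python
from typing import Callable, Sequence


class AllEndpointsFailed(Exception):
    """Raised when every configured endpoint has failed (hypothetical name)."""


def call_with_failover(endpoints: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each endpoint in order and return the first successful response."""
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint(request)
        except ConnectionError as exc:
            # Record the failure and fall through to the next endpoint.
            errors.append(exc)
    raise AllEndpointsFailed(
        f"{len(errors)} endpoint(s) failed for request {request!r}"
    )


# Hypothetical stand-ins for a primary and a backup check-in service.
def primary_checkin(request: str) -> str:
    # Simulates the primary network path being down.
    raise ConnectionError("primary network path unavailable")


def backup_checkin(request: str) -> str:
    return f"boarding pass issued for {request}"


result = call_with_failover([primary_checkin, backup_checkin], "PNR-ABC123")
print(result)
```

In this sketch the request succeeds through the backup path even though the primary raises a network error. A real deployment would also need health checks, timeouts, and state replication between sites, none of which is shown here.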
References 1. Amadeus company statement [62872] 2. Star Alliance statement [62872] 3. Lufthansa statement [62872] 4. Air France statement [62872] 5. Swiss International Air Lines statement [62872] 6. Heathrow Airport spokesperson [62872] 7. Melbourne Airport spokesperson [62872] 8. Alex Macheras, air travel analyst [62872]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident has happened again at one_organization: The article mentions that American Airlines had suffered a glitch to its software, operated by Sabre, a rival to Amadeus, in 2013, which forced the airline to cancel hundreds of flights [62872]. (b) The software failure incident has happened again at multiple_organization: The article does not provide specific information about similar incidents happening at multiple organizations.
Phase (Design/Operation) design (a) The software failure incident in the article was related to the design phase. The incident was caused by a "network issue" affecting the Altea software developed by Amadeus, which is used by 189 airlines for managing customer reservations, tagging luggage, and issuing boarding passes [62872]. The issue was attributed to a problem with the system itself rather than its operation or misuse. (b) The software failure incident in the article was not related to the operation phase. There was no indication in the article that the failure was caused by the operation or misuse of the system. Instead, it was attributed to a network issue affecting the software's functionality, which this classification treats as a design-phase fault [62872].
Boundary (Internal/External) within_system (a) The software failure incident reported in the article was primarily within the system. The issue was attributed to a "network issue" that caused disruption to the systems using the Altea software developed by Amadeus [62872]. The technical teams identified the cause of the problem within their system and gradually restored services, indicating that the failure originated from within the software system itself.
Nature (Human/Non-human) non-human_actions (a) The software failure incident was attributed to a "network issue" according to Amadeus, the company behind the software program Altea [62872]. This indicates that the failure was due to non-human actions, specifically a technical issue within the network infrastructure. (b) The article did not mention any contributing factors introduced by human actions that led to the software failure incident.
Dimension (Hardware/Software) software (a) The software failure incident reported in the article was not attributed to hardware issues but rather to a "network issue" affecting the Altea software developed by Amadeus [62872]. (b) The software failure incident was specifically linked to a "network issue" affecting the Altea software developed by Amadeus, which caused disruptions in managing customer reservations, tagging luggage, and issuing boarding passes for multiple major airlines [62872].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident reported in Article 62872 was classified as non-malicious. The issue was attributed to a "network issue" by Amadeus, the company behind the software program Altea. The article mentioned that there were no signs indicating that the problem was caused by malicious intent. Additionally, the air travel analyst, Alex Macheras, stated that while there have been failures and glitches in the past, having such a widespread impact on a global scale was very unusual, indicating that the incident was not caused by malicious intent [62872].
Intent (Poor/Accidental Decisions) accidental_decisions The intent of the software failure incident in the reported article [62872] appears to align more with the category of accidental_decisions. The incident was attributed to a "network issue" by Amadeus, the company behind the software program Altea. There were no indications of malicious intent causing the problem, and the issue was described as a technical glitch that disrupted the system used by multiple major airlines worldwide. The article also mentions that failures of departure control systems, like the one affected in this incident, have occurred in the past due to glitches rather than intentional actions.
Capability (Incompetence/Accidental) accidental (a) The software failure incident reported in Article 62872 was not attributed to development incompetence. The issue was described as a "network issue" by Amadeus, the company behind the software program Altea. The company stated that technical teams had identified the cause of the problem and restored services, indicating that the failure was not due to incompetence in the development of the software. (b) The software failure incident in Article 62872 was categorized as an accidental failure. The disruption was caused by a "network issue" according to Amadeus, and there were no indications of malicious intent behind the problem. The incident was described as affecting airports around the world, demonstrating the accidental nature of the failure rather than intentional sabotage or incompetence in development.
Duration temporary The software failure incident described in the article was temporary. The article mentions that the issue with the Altea software developed by Amadeus was caused by a "network issue" [62872]. The technical teams were able to identify the cause of the problem and gradually restore services. By the afternoon in Europe, the company reported that its software was "functioning normally" after resolving the issue. Various airlines, including Lufthansa and Air France, experienced problems for a short duration resulting in flight delays but no cancellations, indicating a temporary nature of the software failure incident.
Behaviour crash, omission, other (a) crash: The software failure incident in the article was characterized by flights being delayed and travelers struggling to check in at airports around the world after the Altea software program used by several major airlines went down. This indicates a crash where the system lost its state and was not able to perform its intended functions [62872]. (b) omission: The article mentions that problems were reported at airports in Asia, Europe, and the Americas, demonstrating that the software omitted to perform its intended functions at multiple instances across different regions [62872]. (c) timing: The software failure incident resulted in delays for flights, indicating that the system was performing its intended functions but at the wrong time, causing disruptions in the travel schedules of passengers [62872]. (d) value: While the article does not explicitly mention the system performing its intended functions incorrectly, the delays and disruptions caused by the software failure incident can be attributed to the system not performing its functions correctly, impacting the value provided to customers [62872]. (e) byzantine: The article does not indicate any inconsistent responses or interactions by the system during the software failure incident. The issue seemed to be more related to a network problem causing disruptions rather than erratic behavior by the software [62872]. (f) other: The behavior of the software failure incident can also be described as causing intermittent issues for some airlines, resulting in delays but no cancellations for some carriers, and slowly coming back online for others. This behavior could be categorized as a combination of crash and omission, where the system partially failed to perform its functions leading to disruptions in airline operations [62872].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence delay The consequence of the software failure incident reported in the article was primarily related to delays experienced by travelers due to the software issue. Flights were delayed, and travelers struggled to check in at airports around the world as a result of the software program, Altea, going down [62872]. The delays were observed at airports in Asia, Europe, and the Americas, impacting the travel plans of passengers globally. Airlines such as Lufthansa, Air France, and Swiss International Air Lines reported delays in their flights due to the software failure [62872]. The Star Alliance also mentioned that customers on their network were affected by the software issue, although the problems were kept to a minimum [62872].
Domain transportation (a) The software failure incident affected the transportation industry as it disrupted the operations of major airlines like British Airways, Lufthansa, Thai Airways, and Air France, which rely on the Altea software developed by Amadeus to manage customer reservations and facilitate various processes at airports [62872]. (b) The transportation industry was significantly impacted by the software failure incident, leading to flight delays and check-in issues at airports worldwide, affecting travelers and causing disruptions in the movement of people and goods [62872]. (c) There is no specific mention of the natural resources industry being directly impacted by the software failure incident reported in the article. (d) The software failure incident did not directly affect the sales industry or the exchange of money for products. (e) The construction industry was not directly involved in the software failure incident described in the article. (f) The manufacturing industry was not directly linked to the software failure incident reported in the article. (g) The utilities industry, which includes power, gas, steam, water, and sewage services, was not directly affected by the software failure incident. (h) The finance industry, which involves manipulating and moving money for profit, was not directly involved in the software failure incident described in the article. (i) The knowledge industry, encompassing education, research, and space exploration, was not directly impacted by the software failure incident reported in the article. (j) The health industry, covering healthcare, health insurance, and food industries, was not directly related to the software failure incident described in the article. (k) The entertainment industry, including arts, sports, hospitality, and tourism, was not directly involved in the software failure incident reported in the article. (l) The government industry, which includes politics, defense, justice, taxes, and public services, was not directly affected by the software failure incident. (m) No other domain applies: the incident concerned the airline industry, which is already covered by the transportation option.

Sources
