Incident: European Air Traffic Control System Failure Causes Widespread Flight Delays

Published Date: 2018-04-03

Postmortem Analysis
Timeline 1. The software failure incident affecting European air traffic occurred on April 3, 2018, the day the article was published [70122].
System 1. Enhanced Tactical Flow Management System [70122]
Responsible Organization 1. Eurocontrol [70122]
Impacted Organization 1. European air traffic co-ordination system (Eurocontrol) [70122]
Software Causes 1. An error in the Enhanced Tactical Flow Management System, which helps to manage air traffic by comparing demand and capacity of different air traffic control sectors [70122].
Non-software Causes 1. Overload on the Enhanced Tactical Flow Management System due to managing up to 36,000 flights a day [70122]. 2. Unspecified problem with the system that led to the failure incident [70122].
Impacts 1. Up to half of European flights were delayed, affecting about 15,000 trips [70122]. 2. The capacity of the entire European air traffic network was deliberately reduced by 10% [70122]. 3. Major airports like Brussels, Schiphol, Helsinki, and Dublin experienced delays of varied lengths [70122]. 4. Eurocontrol's contingency plan was in place for several hours, causing disruptions to passengers and airlines [70122]. 5. Airlines were requested to resend any flight plans filed before 10:26 UTC as they were lost in the system failure [70122].
Preventions 1. Regular maintenance and updates to keep the Enhanced Tactical Flow Management System stable and reliable could have prevented the software failure incident [70122]. 2. More robust testing before deploying system updates or changes could have identified and addressed potential issues before they caused widespread disruption [70122]. 3. Enhanced monitoring and alerting to detect anomalies or faults quickly could have allowed proactive measures before the failure became so significant [70122].
Fixes 1. Restarting the faulty Enhanced Tactical Flow Management System resolved the software failure incident [70122].
References 1. Eurocontrol spokesperson 2. AFP news agency 3. Brussels Airport 4. Schiphol Airport 5. Helsinki Airport 6. Dublin Airport 7. Twitter content (optional) [70122]
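
Illustrative sketch (not Eurocontrol's implementation): the record describes the system as comparing demand and capacity of air traffic control sectors, and the contingency plan as reducing network capacity by 10% and adding predetermined departure intervals. The Python below is a minimal sketch of that kind of comparison under stated assumptions. The sector names, the capacity and demand figures, and the functions overloaded_sectors and departure_interval_minutes are all hypothetical and do not come from the article; only the 10% reduction and the 10-departures-per-hour figure appear in the record.

```python
from dataclasses import dataclass


@dataclass
class Sector:
    """A hypothetical air traffic control sector (all values are invented)."""
    name: str
    capacity_per_hour: int  # flights the sector can accept per hour
    demand_per_hour: int    # flights planned to enter the sector per hour


def overloaded_sectors(sectors, reduction_percent=0):
    """Return (name, excess) for sectors whose demand exceeds capacity.

    reduction_percent scales capacity down, e.g. 10 models the 10%
    network-wide reduction described in the contingency plan.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    result = []
    for s in sectors:
        effective = s.capacity_per_hour * (100 - reduction_percent) // 100
        excess = s.demand_per_hour - effective
        if excess > 0:
            result.append((s.name, excess))
    return result


def departure_interval_minutes(departures_per_hour):
    """Spacing between departures for a fixed hourly departure limit."""
    return 60 / departures_per_hour


# Hypothetical sample data, chosen only to make the comparison visible.
sectors = [
    Sector("SECTOR_A", capacity_per_hour=40, demand_per_hour=38),
    Sector("SECTOR_B", capacity_per_hour=50, demand_per_hour=44),
    Sector("SECTOR_C", capacity_per_hour=30, demand_per_hour=31),
]

normal = overloaded_sectors(sectors)
contingency = overloaded_sectors(sectors, reduction_percent=10)

print("Overloaded at full capacity:", normal)
print("Overloaded at 10% reduced capacity:", contingency)
print("Minutes between departures at 10 per hour:",
      departure_interval_minutes(10))
```

With these invented figures, one sector is over capacity under normal conditions and a second becomes over capacity once the 10% reduction is applied, which is one way to see why a modest network-wide reduction can translate into widespread delays. A limit of 10 departures per hour, as reported for Brussels Airport, corresponds to one departure every 6 minutes.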

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident at Eurocontrol was mentioned to be only the second failure in 20 years, with the last similar incident happening in 2001 [70122]. This indicates that a similar incident had happened before within the same organization. (b) There is no specific mention in the provided article about a similar incident happening at other organizations or with their products and services.
Phase (Design/Operation) design (a) The software failure incident in the article was related to the design phase. The failure was attributed to an error with the Enhanced Tactical Flow Management System, which helps manage air traffic by comparing demand and capacity of different air traffic control sectors. The fault in the system led to widespread flight delays affecting up to half of all flights in Europe [70122]. The system failure was not due to the operation or misuse of the system but rather a fault within the system itself.
Boundary (Internal/External) within_system (a) The contributing factors of the software failure incident originated within the system. Eurocontrol stated that the fault was with the Enhanced Tactical Flow Management System, an internal system used to manage air traffic by comparing demand and capacity of different air traffic control sectors [70122]. The organization's contingency plan for a failure of this system, which deliberately reduced the capacity of the entire European network by 10% and added predetermined departure intervals at major airports, was likewise an internal response [70122].
Nature (Human/Non-human) non-human_actions (a) The software failure incident was not directly attributed to human actions but rather to a fault in the Enhanced Tactical Flow Management System, a non-human factor. Eurocontrol mentioned that the unspecified problem was with this system, which helps manage air traffic by comparing demand and capacity of different air traffic control sectors [70122]. The system failure led to delays in European flights, affecting up to half of the scheduled trips, but Eurocontrol confirmed that air traffic control itself was not directly affected and that safety was not compromised at any time. (b) The article does not mention the software failure incident being caused by human actions. Its focus was on the technical fault in the system and the subsequent delays in European flights.
Dimension (Hardware/Software) software (a) The software failure incident reported in the article was not attributed to hardware issues. The fault was specifically mentioned to be with the Enhanced Tactical Flow Management System, which is a software system used to manage air traffic by comparing demand and capacity of different air traffic control sectors [70122]. The article did not mention any hardware-related contributing factors to the system failure incident. (b) The software failure incident was directly linked to software issues. Eurocontrol, responsible for co-ordinating European air traffic, stated that the fault was with the Enhanced Tactical Flow Management System, a software system that helps manage air traffic [70122]. The system failure was due to a software issue within this specific system, leading to widespread flight delays affecting up to half of all flights in Europe.
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident related to the European air traffic co-ordination system was non-malicious. Eurocontrol stated that the fault was due to an unspecified problem with the Enhanced Tactical Flow Management System, which helps manage air traffic by comparing demand and capacity of different air traffic control sectors. The organization mentioned that safety was not compromised at any time during the incident, indicating that the failure was not caused by malicious intent [70122].
Intent (Poor/Accidental Decisions) unknown (a) The software failure incident related to the European air traffic co-ordination system was not attributed to poor decisions but rather to a technical fault in the Enhanced Tactical Flow Management System. Eurocontrol mentioned that the fault was with the system itself, and that it was the second failure in 20 years, with the last one occurring in 2001. The organization implemented a contingency plan to manage the situation caused by the system failure, which suggests a technical issue rather than a result of poor decisions [70122].
Capability (Incompetence/Accidental) accidental (a) The software failure incident in the European air traffic co-ordination system does not seem to be related to development incompetence. The article does not mention any issues or errors caused by lack of professional competence by humans or the development organization. (b) The software failure incident in the European air traffic co-ordination system was accidental. The article mentions that the fault was with the Enhanced Tactical Flow Management System, which helps manage air traffic by comparing demand and capacity of different air traffic control sectors. The system failure was described as an unspecified problem that led to delays affecting up to half of all flights in Europe. Eurocontrol mentioned that it was the second failure in 20 years and that safety was not compromised at any time, indicating that the incident was accidental rather than due to development incompetence [70122].
Duration temporary The software failure incident related to the European air traffic co-ordination system was temporary. The fault in the Enhanced Tactical Flow Management System caused delays in European flights, affecting up to half of the scheduled trips. Eurocontrol was able to fix the fault by restarting the system, and normal operations resumed after the system was back online. The contingency plan implemented by Eurocontrol, which deliberately reduced the capacity of the European network by 10% and added predetermined departure intervals at major airports, indicates that the failure was temporary and specific to the system issue rather than a permanent and widespread issue [70122].
Behaviour crash, other (a) crash: The software failure incident in the article can be categorized as a crash. The Enhanced Tactical Flow Management System experienced a fault, leading to widespread flight delays. The system had to be restarted to resume normal operations, indicating a loss of state and a failure to perform its intended functions [70122]. (b) omission: The article did not specifically mention any instance where the system omitted to perform its intended functions. The focus was on the system fault and the resulting delays [70122]. (c) timing: The software failure incident does not align with a timing failure, where the system performs its intended functions but at the wrong time. Instead, the issue was with the system itself and the delays it caused [70122]. (d) value: The failure was not a case of the system performing its intended functions incorrectly. The fault led to delays, but the core issue was with the system itself rather than incorrect output [70122]. (e) byzantine: The software failure incident does not exhibit characteristics of a byzantine failure, where the system behaves erroneously with inconsistent responses and interactions. The fault led to delays, but there is no mention of inconsistent behavior or interactions [70122]. (f) other: The behavior of the software failure incident can be categorized as a system-wide disruption affecting the coordination of European air traffic. The fault in the Enhanced Tactical Flow Management System resulted in significant delays for thousands of flights, prompting a system restart and contingency measures. The incident was described as a rare occurrence, with the contingency plan deliberately reducing network capacity and implementing predetermined departure intervals at major airports [70122].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence delay (e) delay: People had to postpone an activity due to the software failure. The software failure incident led to widespread flight delays affecting up to half of all flights in Europe, with about 15,000 trips being impacted [70122]. Several European airports warned passengers to expect delays, with Brussels Airport limiting departures to just 10 every hour, and other airports like Schiphol in Amsterdam, Helsinki, and Dublin also experiencing delays of varying length [70122]. Eurocontrol's contingency plan deliberately reduced the capacity of the entire European network by 10% and added predetermined departure intervals at major airports due to the system failure [70122].
Domain transportation (a) The failed system was intended to support the transportation industry, specifically European air traffic coordination. The Enhanced Tactical Flow Management System managed up to 36,000 flights a day in Europe [70122].

Sources
