Incident: Air Traffic Control System Failure at London Airports, December 2014

Published Date: 2014-12-12

Postmortem Analysis
Timeline:
1. The software failure incident occurred on December 12, 2014 [56251, 56658].
System:
1. The computer workstations used by controllers: a transition between workstation states caused a failure in the system [56254].
2. The flight data system experienced a problem [56658].
Responsible Organization:
1. NATS, the company responsible for controlling British airspace, was responsible for the software failure incident [56254, 56251].
2. NATS confirmed the technical failure that led to disruptions in London airspace [56658].
Impacted Organization:
1. Passengers at UK airports, including Heathrow, Gatwick, and others, experienced delays, cancellations, and disruptions to their flights [56251, 56658].
2. Airlines such as British Airways, easyJet, and Ryanair had to cancel flights and experienced delays in their operations [56251, 56658].
3. Air traffic controllers at the main national air traffic control center in Swanwick, UK, were affected by the system failure [56251, 56658].
4. NATS, the UK air traffic management company responsible for controlling British airspace, was itself impacted by the incident [56254].
Software Causes:
1. A transition between different states of the computer workstations used by controllers caused a failure in the system, leaving controllers unable to access all data regarding individual flight plans [56254].
2. A glitch in some of the newly upgraded software may have introduced the failure [56251].
Non-software Causes:
1. The failure was connected to the number of computer workstations used by controllers being "in a certain state" combined with the number of airspace sectors open at the time [56254].
2. The incident was caused by an unprecedented systems failure at the national air traffic control center in Swanwick, Hampshire [56254].
Impacts:
1. Flights were canceled, delayed, and grounded at some of the UK's busiest airports, including Heathrow, Gatwick, Stansted, and Luton, causing significant disruption for passengers [56251, 56658].
2. Passengers experienced hours of delays, with some flights taking off up to four hours late [56251].
3. Airports as far north as Aberdeen and Edinburgh were affected by the computer problem [56251].
4. Hundreds of international flights were diverted or disrupted [56251].
5. Baggage claim operations were affected at Heathrow, with hundreds of unclaimed bags sitting beside carousels [56658].
6. Flights were diverted to other airports, creating difficulties in parking, refueling, and servicing aircraft and in handling passengers at unexpected locations [56658].
7. British Airways canceled multiple services, and other airlines such as easyJet reported delays and cancellations [56251, 56658].
8. The disruption had international consequences, affecting flights from Paris Charles de Gaulle and Tunisia, among others [56658].
9. The incident exposed vulnerabilities in the air traffic system, raising concerns about the impact of technical errors on the aviation industry [56658].
10. The disruption amounted to a major breakdown in UK air traffic control, prompting investigations and calls for accountability from authorities and aviation experts [56254].
Preventions:
1. Properly testing and validating the upgraded software to identify and fix potential glitches before deployment [56251].
2. Implementing robust contingency plans and redundancies in the air traffic control system so that, if one part of the system fails, other parts can take over smoothly and prevent widespread disruption [56251].
3. Conducting regular maintenance and monitoring of the air traffic control system to detect and address issues before they escalate into major failures [56658].
4. Enhancing the training and preparedness of air traffic controllers for situations where computer-based tools are unavailable, so they can manage aircraft safely without full system support [56251].
5. Collaborating with international aviation authorities to address vulnerabilities in the air traffic system configuration and implement control measures against similar incidents [56658].
Fixes:
1. Addressing the transition between certain workstation states that caused the system failure, to prevent similar incidents in the future [56254].
2. Investigating and resolving the issue related to the number of computer workstations used by controllers and the number of airspace sectors open at the time [56254].
3. Implementing measures to ensure controllers have access to all data regarding individual flight plans, reducing their workload and preventing system failures [56254].
4. Enhancing the software to maintain safe operation for the flying public and prevent safety compromises during incidents [56254].
5. Conducting a thorough digital forensic effort to identify vulnerabilities in the air traffic system and address them urgently [56658].
References:
1. Air traffic management company NATS [56251, 56658, 56254]
2. Heathrow Airport [56251, 56658, 56254]
3. Eurocontrol [56658]
4. British Airways [56251, 56658, 56254]
5. Budget flier easyJet [56251]
6. Gatwick Airport [56251, 56658, 56254]
7. Stansted Airport [56251, 56254]
8. Luton Airport [56251, 56254]
9. London's Heathrow Airport spokesperson, Marianna Panizza [56658]
10. Paris Charles de Gaulle airport representative [56658]
11. United Airlines spokesperson, Mary Ryan [56658]
12. Delta Air Lines spokesperson [56658]
13. Aviation security expert Glenn Schoen [56658]
14. British Secretary of State for Transport, Patrick McLoughlin [56251, 56254]
15. Louise Ellman, chair of the Commons transport select committee [56254]
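
Illustrative sketch (not from the articles): the causes above describe a failure triggered by the combination of workstation states and open airspace sectors, and the first prevention calls for testing before deployment. The Python sketch below shows one way such a combination could be guarded and exhaustively tested. It does not describe the actual NATS software. The state names, the `capacity` limit, the load model (non-off workstations plus open sectors), and every class and function name are hypothetical assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import product


class WorkstationState(Enum):
    # Hypothetical state names; the articles only say "a certain state".
    OFF = "off"
    STANDBY = "standby"
    ACTIVE = "active"


class TransitionRejected(Exception):
    """Raised when a state change would push the load past the assumed capacity."""


@dataclass
class ControlRoom:
    capacity: int            # assumed limit, not taken from the articles
    open_sectors: int = 0
    workstations: dict = field(default_factory=dict)

    def load(self, states=None):
        """Assumed load model: non-OFF workstations plus open sectors."""
        states = self.workstations if states is None else states
        busy = sum(1 for s in states.values() if s is not WorkstationState.OFF)
        return busy + self.open_sectors

    def set_state(self, workstation_id, new_state):
        """Apply a transition only if it does not raise the load past capacity."""
        proposed = {**self.workstations, workstation_id: new_state}
        new_load = self.load(proposed)
        if new_load > self.capacity and new_load > self.load():
            # Refuse the change and keep the previous, known-good state.
            raise TransitionRejected(
                f"{workstation_id} -> {new_state.value} would exceed capacity"
            )
        self.workstations = proposed


def check_all_combinations(n_workstations, max_sectors, capacity):
    """Try every workstation-state combination for every sector count.

    Returns how many combinations the guard rejected. Any other exception,
    or a violated invariant, would point to an unhandled combination.
    """
    rejected = 0
    for sectors in range(max_sectors + 1):
        for combo in product(WorkstationState, repeat=n_workstations):
            room = ControlRoom(capacity=capacity, open_sectors=sectors)
            try:
                for i, state in enumerate(combo):
                    room.set_state(f"ws{i}", state)
            except TransitionRejected:
                rejected += 1
            # Invariant: the guard never lets the load grow past capacity.
            assert room.load() <= max(capacity, sectors)
    return rejected
```

The design choice in this sketch is to reject an over-capacity transition explicitly and keep the last known-good state, rather than let the system enter an untested combination. Enumerating the whole state space is only feasible for small configurations; a real system would likely need property-based or model-based testing instead.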

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident having happened again at one_organization: - The incident at the air traffic control center in Swanwick, Hampshire, was not the first time such a failure occurred. A computer problem affected operations at Swanwick for almost 12 hours the previous December, leaving thousands of passengers stranded as hundreds of flights were grounded [Article 56251]. - Nats, the company responsible for controlling British airspace, mentioned that the recent incident was connected to the number of computer workstations used by controllers being "in a certain state," combined with the number of airspace sectors open at the time. This transition between states caused a failure in the system that had not been seen before [Article 56254]. (b) The software failure incident having happened again at multiple_organization: - The budget flier easyJet had to cancel flights to and from London Gatwick due to the system failure incident. They also anticipated delays for other flights to and from the south of the UK [Article 56251]. - British Airways had to cancel several services from Heathrow, including flights to European destinations like Stockholm, Dublin, Madrid, and Zurich, due to the disruption caused by the software failure incident. They offered refunds or rebooking options to passengers not wishing to travel on that day [Article 56658].
Phase (Design/Operation) design (a) The software failure incident in the articles was primarily related to the design phase. The incident was attributed to a technical failure at the main national air traffic control center in Swanwick, leading to a system failure that affected London airspace and caused delays and cancellations at various airports [56251, 56658, 56254]. Experts suggested that a failure in some of the new upgraded software may have introduced a glitch, impacting the upper airspace sector and causing disruptions in the system [56251]. NATS, the company responsible for controlling British airspace, explained that the incident was due to a transition between two states causing a failure in the system, which had not been seen before, affecting the controllers' access to flight plan data and increasing their workload [56254]. (b) The software failure incident was not primarily related to the operation phase in the articles.
Boundary (Internal/External) within_system, outside_system From the provided articles, the software failure incident related to the air traffic control system in London and south-east England had contributing factors both within and outside the system. Within_system: - The failure was connected to the number of computer workstations used by controllers being "in a certain state," combined with the number of airspace sectors open at the time, causing a transition between states that led to a system failure [Article 56254]. - The failure resulted in controllers being unable to access all the data regarding individual flight plans, significantly increasing their workload [Article 56254]. - NATS emphasized that safety was not compromised during the incident, and controllers had a full radar picture and full communications with all aircraft at all times [Article 56254]. Outside_system: - The incident was not due to a power outage, as confirmed by NATS [Article 56658]. - The disruption caused by the failure led to delays and cancellations for travelers passing through London, impacting flights connecting through London airports and causing disruptions for hours [Article 56658]. - The incident had international consequences, affecting flights from other countries scheduled to fly to London [Article 56658].
Nature (Human/Non-human) non-human_actions (a) The software failure incident occurring due to non-human actions: - The software failure incident at the air traffic control center in Swanwick was attributed to an unprecedented systems failure caused by a transition between certain states in the computer workstations used by controllers, combined with the number of airspace sectors open at the time. This transition led to a failure in the system that had not been seen before, affecting the controllers' access to all data regarding individual flight plans [Article 56254]. - The system failure at the main national air traffic control center in Swanwick led to flight cancellations and delays at UK airports, grounding planes and causing hours of delays and cancellations. The failure was not due to a power outage but was related to a technical fault in the system [Article 56251]. - The UK air traffic management company NATS confirmed that the air traffic control system for London airspace experienced a technical failure, resulting in disruptions and flight cancellations. NATS ruled out a power outage as the cause of the fault, indicating that the issue was related to a computer failure [Article 56658]. (b) The software failure incident occurring due to human actions: - There is no specific mention in the articles of the software failure incident being directly caused by human actions. The focus is primarily on technical faults, system failures, and transitions between states leading to the failure in the air traffic control system. Therefore, there is no clear indication of human actions contributing to the software failure incident reported in the articles.
Dimension (Hardware/Software) hardware, software (a) The software failure incident occurring due to hardware: - The incident was not due to a power outage as initially suspected, ruling out a hardware-related issue [56658]. - The system failure at the air traffic control center was connected to the number of computer workstations used by controllers being "in a certain state," indicating a potential hardware-related contributing factor [56254]. (b) The software failure incident occurring due to software: - Experts suggested that a failure in some of the new upgraded software may have introduced a glitch, indicating a software-related contributing factor [56251]. - Nats explained that the incident was caused by a transition between two states that caused a failure in the system, which has not been seen before, pointing to a software-related issue [56254].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident was non-malicious. The incident was attributed to a technical failure in the air traffic control system, specifically related to a problem with the number of computer workstations used by controllers and the transition between different states causing a failure in the system [Article 56254]. The company responsible for controlling British airspace, NATS, emphasized that safety was not compromised during the incident and that controllers had full radar picture and communications with all aircraft at all times [Article 56254]. (b) The software failure incident was non-malicious. The incident was described as a technical failure at the main national air traffic control center in Swanwick, leading to flight cancellations, delays, and restrictions in London airspace [Article 56251]. NATS confirmed that the system failure was not due to a power outage and that the airspace was closed due to a computer problem [Article 56658]. The disruption caused significant delays and cancellations at various airports, affecting passengers and airlines [Article 56251, Article 56658].
Intent (Poor/Accidental Decisions) unknown The articles do not explicitly attribute the software failure incident in the air traffic control system for London and south-east England to either poor decisions or accidental decisions. The incident was primarily described as a technical failure or system glitch that disrupted air traffic control operations. The articles highlighted the computer workstations, transitions between different states, and the number of airspace sectors open at the time as contributing factors. The focus was on the technical aspects of the incident rather than on specific decisions made by individuals or organizations.
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident occurring due to development incompetence: - Article 56254 mentions that the air traffic control chaos was caused by an unprecedented systems failure at the national center in Swanwick, Hampshire, due to a transition between two states causing a failure in the system that had not been seen before. This failure was connected to the number of computer workstations used by controllers being "in a certain state," indicating a potential issue related to the software development or system configuration [56254]. (b) The software failure incident occurring accidentally: - Article 56251 reports that experts suggested a failure in some of the new upgraded software may have introduced a glitch, indicating that the incident could have been an accidental consequence of software upgrades or changes [56251].
Duration temporary The software failure incident related to the air traffic control system in London airspace was temporary. The incident was caused by a technical failure in the system, specifically a problem with the flight data system, which resulted in the temporary closure of London's airspace [56658]. The issue was related to a transition between two states in the system that caused a failure, leading to delays, cancellations, and disruptions in flights [56251, 56254]. The system failure was not permanent as it was eventually resolved, and operations started returning to normal after the incident was addressed [56658].
Behaviour crash, omission, other (a) crash: The software failure incident in the articles can be categorized as a crash. The incident led to a system failure at the main national air traffic control center in Swanwick, resulting in severe disruptions to air traffic control operations, delays, and cancellations of flights at various airports [56251]. A transition between two states in the system triggered a failure that had not been seen before, impacting the controllers' ability to access all data regarding individual flight plans [56254]. (b) omission: The software failure incident can also be categorized as an omission. The system failed to provide controllers with all data regarding individual flight plans, one of its intended functions, which significantly increased their workload [56254]. (c) timing: The software failure incident does not align with a timing failure, as the system did not perform its intended functions too late or too early. The primary issue was the system's failure to function correctly, leading to disruptions and delays in air traffic control operations [56251, 56658, 56254]. (d) value: The software failure incident does not align with a value failure, as the system did not perform its intended functions incorrectly. The main issue was the system's failure, which impacted the controllers' ability to access critical data, leading to disruptions in air traffic control operations [56251, 56658, 56254]. (e) byzantine: The software failure incident does not align with a byzantine failure, as there were no indications of inconsistent responses or interactions from the system. The primary issue was the system failure at the air traffic control center, leading to disruptions and delays in air traffic operations [56251, 56658, 56254]. 
(f) other: The software failure incident can be categorized as a system failure due to a glitch in some of the new upgraded software, which introduced the error. The failure impacted the upper airspace sector, leading to disruptions in air traffic control operations [56251].

IoT System Layer

Layer Option Rationale
Perception processing_unit, embedded_software (a) sensor: The articles do not mention any specific sensor-related failures that contributed to the air traffic control system failure incident [56251, 56658, 56254]. (b) actuator: The articles do not mention any specific actuator-related failures that contributed to the air traffic control system failure incident [56251, 56658, 56254]. (c) processing_unit: The failure in the air traffic control system was related to the processing unit. According to Article 56254, the incident at the national air traffic control center in Swanwick was caused by a transition between two states that triggered a system failure, leaving controllers unable to access all data regarding individual flight plans and significantly increasing their workload. (d) network_communication: The articles do not mention any specific network communication-related failures that contributed to the air traffic control system failure incident [56251, 56658, 56254]. (e) embedded_software: The failure in the air traffic control system was potentially related to the embedded software. Article 56251 mentions that experts suggested a failure in some of the new upgraded software may have introduced a glitch, contributing to the incident. Additionally, Article 56254 states that the incident at the national air traffic control center in Swanwick was connected to the number of computer workstations used by controllers being "in a certain state," which could indicate a software-related issue.
Communication unknown The software failure incident related to the air traffic control system in London and south-east England was not directly related to the communication layer of the cyber physical system that failed. The failure was attributed to a technical issue within the system itself, specifically related to the number of computer workstations used by controllers transitioning between states, which caused a failure in the system that had not been seen before [Article 56254]. The incident resulted in controllers being unable to access all the data regarding individual flight plans, significantly increasing their workload, but safety was not compromised as controllers had full radar picture and communications with all aircraft at all times [Article 56254]. The failure was not due to a power outage and was not caused by a hack [Article 56658].
Application FALSE The software failure incident related to the air traffic control system in London and south-east England was not specifically attributed to the application layer of the cyber physical system. The articles mention that the failure was due to a technical issue at the national air traffic control center in Swanwick, Hampshire, which resulted in a failure in the system that had not been seen before. The problem was connected to the number of computer workstations used by controllers being "in a certain state," combined with the number of airspace sectors open at the time, leading to a failure in the system that impacted the controllers' ability to access all data regarding individual flight plans [Article 56254]. The incident was described as an unprecedented systems failure that caused dozens of flights to be canceled and delayed, emphasizing that safety was not compromised during the incident [Article 56254]. Therefore, based on the information available, it is unknown whether the failure was specifically related to the application layer of the cyber physical system.

Other Details

Category Option Rationale
Consequence property, delay, non-human, other (a) death: People lost their lives due to the software failure - No information about any deaths caused by the software failure incident was mentioned in the articles [56251, 56658, 56254]. (b) harm: People were physically harmed due to the software failure - No information about people being physically harmed due to the software failure incident was mentioned in the articles [56251, 56658, 56254]. (c) basic: People's access to food or shelter was impacted because of the software failure - No information about people's access to food or shelter being impacted due to the software failure incident was mentioned in the articles [56251, 56658, 56254]. (d) property: People's material goods, money, or data was impacted due to the software failure - The software failure incident led to flight cancellations, delays, and disruptions affecting passengers' travel plans and causing inconvenience [56251, 56658, 56254]. (e) delay: People had to postpone an activity due to the software failure - Passengers experienced hours of delays, cancelled flights, and disruptions at various airports due to the software failure incident [56251, 56658, 56254]. (f) non-human: Non-human entities were impacted due to the software failure - The software failure incident affected the air traffic control system, leading to flight cancellations, delays, and disruptions at airports [56251, 56658, 56254]. (g) no_consequence: There were no real observed consequences of the software failure - The software failure incident had real observed consequences such as flight cancellations, delays, disruptions, and inconvenience to passengers [56251, 56658, 56254]. (h) theoretical_consequence: There were potential consequences discussed of the software failure that did not occur - The articles did not mention any potential consequences discussed that did not occur as a result of the software failure incident [56251, 56658, 56254]. 
(i) other: Was there consequence(s) of the software failure not described in the (a to h) options? What is the other consequence(s)? - The software failure incident resulted in the closure of runways at Heathrow and Gatwick, cancellations of multiple flights, delays in flight schedules, and disruptions to air travel operations [56251, 56658, 56254].
Domain transportation, finance, government (a) The failed system was intended to support the transportation industry, specifically air traffic control for London airspace [56251, 56658, 56254]. (h) The system failure incident was related to the finance industry indirectly as it impacted airlines, airports, and passengers, leading to financial implications such as flight cancellations, delays, and disruptions [56251, 56658, 56254]. (l) The government sector was directly affected by the software failure incident as it involved the national air traffic control center in Swanwick, which is a critical component of the UK's air traffic management system [56251, 56658, 56254].
