Incident: Air Traffic Control System Glitch Disrupts Thousands of Flights

Published Date: 2013-07-09

Postmortem Analysis
Timeline 1. The software failure incident reported in Article 20547 happened on July 9, 2013. [20547] 2. The software failure incident reported in Article 55973 happened on December 7, 2013. [55973]
System 1. Night-time operating system at Swanwick control centre [55973] 2. Daytime operating system at Swanwick control centre [55973] 3. Internal telephones system at Swanwick control centre [55973]
Responsible Organization 1. National Air Traffic Services (NATS) [20547] 2. National Air Traffic Services (NATS), at its headquarters in Swanwick, Hampshire [55973]
Impacted Organization 1. Airline passengers [20547] 2. National Air Traffic Services (NATS) [20547] 3. European air traffic controllers at Eurocontrol [55973]
Software Causes 1. A computer glitch at Britain's main air traffic control centre, which NATS described as 'a software issue that has now been rectified', delayed thousands of airline passengers [20547]. 2. A temporary problem with the telephone computer system at the National Air Traffic Service's headquarters in Swanwick, Hampshire, described as a 'complex' technical glitch during a shift changeover, delayed more than 1,300 flights [55973].
Non-software Causes 1. Shift handover issue between night shift and day shift controllers at Swanwick control centre [55973] 2. Communication problem with the internal telephones at the control centre [55973]
Impacts 1. Thousands of airline passengers were delayed by the failure at Britain's main air traffic control centre [20547]. 2. The number of aircraft flying across the south of England and taking off from airports was restricted [20547]. 3. Most delays were no more than 20 minutes [20547]. 4. More than 1,300 flights across Britain and Ireland were delayed [55973]. 5. Major airports, including London's Heathrow, Gatwick, and Stansted, were affected [55973]. 6. The problem took 14 hours to fix, causing significant disruption to air travel [55973].
Preventions 1. Thorough testing before deploying software updates or changes could potentially have prevented the incident [20547]. 2. Regular maintenance and monitoring of the software systems could have identified potential issues before they escalated into major disruption [55973]. 3. Robust contingency plans could have allowed technical glitches or failures to be addressed and mitigated quickly [55973].
Fixes 1. Implementing thorough testing procedures to identify and address software issues before they impact operations [20547]. 2. Conducting regular maintenance and updates on the software system to prevent glitches and faults [20547]. 3. Enhancing communication and coordination between different shifts of controllers to ensure smooth transitions and prevent communication problems [55973]. 4. Developing robust contingency plans to handle unexpected technical glitches and minimize disruptions to air traffic [55973].
References 1. National Air Traffic Services (NATS) spokesperson [20547] 2. Southampton Airport [20547] 3. Heathrow Airport [20547] 4. Gatwick Airport [20547] 5. European air traffic controllers at Eurocontrol [55973] 6. Ryanair [55973] 7. Civil Aviation Authority [55973] 8. British Airways [55973]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident having happened again at one_organization: Article [55973] reports that Ryanair raged at the air traffic service after a computer glitch delayed 1,300 flights. This incident occurred at the National Air Traffic Service (NATS) headquarters in Swanwick, Hampshire, the same organization that experienced the earlier software issue described in article [20547]. (b) The software failure incident having happened again at multiple_organization: The articles do not report a similar failure at any other organization. Article [55973] notes that the 'complex' technical glitch affected 1,300 flights across Britain and Ireland, but that describes the geographic reach of the NATS failure rather than a recurrence elsewhere.
Phase (Design/Operation) design, operation (a) Article 20547 offers limited evidence for the design phase: a computer glitch at the air traffic control centre in Swanwick, Hampshire, delayed thousands of airline passengers, and NATS described it as 'a software issue that has now been rectified'. That wording suggests, though the article does not confirm, that the fault was introduced during system development or updates [20547]. (b) The operation phase is evident in Article 55973, where a technical glitch at the National Air Traffic Service's headquarters in Swanwick, Hampshire, delayed more than 1,300 flights. The issue occurred during a shift change when the night-time operating system did not properly switch over to the daytime system, causing a communication problem with the internal telephones. The failure therefore surfaced during routine operation of the system; the article does not attribute it to misuse [55973].
Boundary (Internal/External) within_system (a) The software failure incident reported in the articles was primarily within the system. In Article 20547, the incident was attributed to a computer glitch at the air traffic control center, which was a software issue that had been rectified [20547]. Similarly, in Article 55973, the technical glitch that caused delays in over 1,300 flights was related to a temporary problem with the telephone computer system at the National Air Traffic Service's headquarters in Swanwick, Hampshire [55973]. These incidents point to internal system issues leading to the software failures.
Nature (Human/Non-human) non-human_actions (a) The software failure incident occurring due to non-human actions: - Article 20547 reports a software issue at the air traffic control center in Swanwick, Hampshire, which was caused by a computer glitch. NATS mentioned that it was 'a software issue that has now been rectified' [20547]. - Article 55973 describes a technical glitch at the National Air Traffic Service's headquarters in Swanwick, Hampshire, which caused chaos at airports across the country. The issue was related to the telephone computer system not properly switching over from the night-time operating system to the daytime system, causing a communication problem with internal telephones [55973]. (b) The software failure incident occurring due to human actions: - There is no specific mention in the articles about the software failure incidents being caused by human actions.
Dimension (Hardware/Software) software (a) The software failure incident occurring due to hardware: - Neither article mentions hardware issues as a cause of the failure [20547, 55973]. (b) The software failure incident occurring due to software: - Both articles attribute the failures to software issues. In Article 20547, the disruption at the air traffic control centre was caused by a computer glitch, described as 'a software issue that has now been rectified' [20547]. Similarly, in Article 55973, a 'temporary problem with the telephone computer system' at the National Air Traffic Service's headquarters delayed more than 1,300 flights. The NATS statement in Article 55973 also emphasized that the problem involved a complex system with more than a million lines of software [55973].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident reported in Article 20547 was non-malicious. The incident was attributed to a computer glitch at the air traffic control centre, which NATS described as 'a software issue that has now been rectified' [20547]. (b) The software failure incident reported in Article 55973 was likewise non-malicious. The technical glitch that delayed 1,300 flights was due to a temporary problem with the telephone computer system at the National Air Traffic Service's headquarters in Swanwick, Hampshire. The issue was described as a 'complex' technical glitch that affected communication over the centre's internal telephones [55973].
Intent (Poor/Accidental Decisions) unknown (a) poor_decisions: Article 55973 reports that a temporary problem with the telephone computer system at the National Air Traffic Service's headquarters in Swanwick, Hampshire caused mayhem at airports across the country and delayed more than 1,300 flights. The issue occurred when the night-time operating system did not properly switch over to the daytime system, causing a communication problem with the internal telephones. The article does not identify any poor decision behind the failure [55973]. (b) accidental_decisions: Article 20547 attributes the disruption at the air traffic control centre to a computer glitch, described as 'a software issue that has now been rectified.' The incident delayed thousands of airline passengers, but safety was maintained as the airspace was not closed. The article does not describe the decisions that led to the glitch, so the intent cannot be determined [20547].
Capability (Incompetence/Accidental) accidental (a) The software failure incidents reported in the articles seem to be more related to accidental factors rather than development incompetence. In the incident reported in Article 20547, a computer glitch was to blame for the disruption at the air traffic control center, which was described as a 'software issue that has now been rectified' [20547]. Similarly, in Article 55973, a 'complex' technical glitch at the National Air Traffic Service's headquarters in Swanwick, Hampshire caused mayhem at airports across the country, leading to delays in over 1,300 flights [55973]. These incidents point more towards accidental technical glitches rather than failures due to development incompetence.
Duration temporary (a) The software failure incident reported in Article 55973 was temporary. It caused mayhem at airports across the UK, with more than 1,300 flights delayed due to a temporary problem with the telephone computer system at the National Air Traffic Service's headquarters in Swanwick, Hampshire. The issue began early on Saturday morning and was finally fixed after 14 hours [55973]. (b) The software failure incident reported in Article 20547 was also temporary. It was caused by a computer glitch at the hi-tech air traffic control center at Swanwick, near Southampton, in Hampshire. The problem led to restrictions on the number of aircraft flying across the south of England and those taking off from airports. NATS mentioned that the software issue had been rectified, and most delays were no more than 20 minutes [20547].
Behaviour crash, omission (a) crash: - Article 20547 reports a software issue at the air traffic control centre that led to flight delays. The article attributes the disruption to a computer glitch but gives no technical detail; this is consistent with, though not proof of, a crash in which the system loses state and fails to perform its intended functions [20547]. (b) omission: - Article 55973 describes a technical glitch at the National Air Traffic Service's headquarters in Swanwick, Hampshire, which delayed more than 1,300 flights. The issue occurred during a shift change when the night-time operating system did not properly switch over to the daytime system, leading to a communication problem with internal telephones. This omission to switch systems caused the delays [55973]. (c) timing: - The articles do not mention a timing-related failure where the system performs its intended functions but at the wrong time. (d) value: - The articles do not mention a value-related failure where the system performs its intended functions incorrectly. (e) byzantine: - The articles do not mention a byzantine failure where the system behaves erroneously with inconsistent responses and interactions. (f) other: - The articles do not describe any behaviour beyond the possible crash and the omission noted in (a) and (b) [20547, 55973].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, delay, non-human (a) death: People lost their lives due to the software failure - The articles mention no deaths due to the software failure incident [20547, 55973]. (b) harm: People were physically harmed due to the software failure - The articles mention no physical harm to individuals due to the software failure incident [20547, 55973]. (c) basic: People's access to food or shelter was impacted because of the software failure - The articles mention no impact on people's access to food or shelter [20547, 55973]. (d) property: People's material goods, money, or data was impacted due to the software failure - The software failure incident caused delays and disruptions to thousands of airline passengers, affecting their travel plans and potentially leading to financial losses for airlines [20547, 55973]. (e) delay: People had to postpone an activity due to the software failure - The software failure incident resulted in delays for thousands of air travelers, with flights affected at airports across the UK, causing chaos and inconvenience for passengers [20547, 55973]. (f) non-human: Non-human entities were impacted due to the software failure - The software failure incident impacted the operations of National Air Traffic Services (NATS) and air traffic control systems, leading to restrictions on the number of aircraft flying across the south of England and causing delays in flight schedules [20547, 55973]. (g) no_consequence: There were no real observed consequences of the software failure - Not applicable: the incident had real observed consequences, including flight delays, disruptions, and chaos for airline passengers [20547, 55973]. (h) theoretical_consequence: There were potential consequences discussed of the software failure that did not occur - No theoretical consequences were discussed in the articles [20547, 55973]. (i) other: Was there consequence(s) of the software failure not described in the (a to h) options? What is the other consequence(s)? - The articles detail no consequences beyond those covered in options (d), (e), and (f) [20547, 55973].
Domain transportation (a) The failed system was related to the transportation industry, specifically air traffic control. The incidents reported in the articles describe how technical glitches at the National Air Traffic Services (NATS) control center in Swanwick, Hampshire, caused delays and disruptions to thousands of flights across the UK and Ireland [20547, 55973]. The system affected the movement of aircraft and the coordination of air traffic, highlighting its role in supporting the transportation of people by air.

