Incident: Air Traffic Control Communication System Failure at UK Airports

Published Date: 2013-12-07

Postmortem Analysis
Timeline 1. The software failure incident occurred on Saturday, December 7, 2013 [55564]. 2. The article was published on 2013-12-07 08:00:00+00:00.
System 1. Ground communications system in the area control operations room at Nats Swanwick [55564] 2. Communication systems used by controllers to speak to other air traffic control agencies in the UK and Europe [55564]
Responsible Organization 1. National Air Traffic Services (Nats) control centre in Swanwick, Hampshire was responsible for causing the software failure incident [55564].
Impacted Organization 1. National Air Traffic Services' (Nats) control centre in Swanwick, Hampshire [Article 55564] 2. Heathrow, Stansted, and Gatwick airports [Article 55564] 3. Luton, Southampton, London City, Cardiff, Bristol, Bournemouth, Edinburgh, Glasgow, and Dublin airports [Article 55564] 4. Passengers at Heathrow airport [Article 55564] 5. Passengers at Gatwick airport [Article 55564] 6. Passengers at Stansted airport [Article 55564] 7. Ryanair flights [Article 55564]
Software Causes 1. The software failure incident at the UK's major airports was caused by technical problems with the ground communications system in the area control operations room at Nats Swanwick, Hampshire, which deals with air traffic in the south of England [Article 55564].
Non-software Causes 1. The technical problems were caused by a ground communications system issue at the National Air Traffic Services' control center in Swanwick, Hampshire, during a switch from night to daytime operations [Article 55564].
Impacts 1. Thousands of passengers were affected by the glitch at the National Air Traffic Services' control center in Swanwick, resulting in a fifth of all flights being canceled at Heathrow, Stansted, and Gatwick airports, with others facing delays of up to two hours [Article 55564]. 2. Passengers at Heathrow airport experienced delays of more than an hour on many morning departures, and several short-haul flights were canceled to destinations like Nice, Milan, and Istanbul [Article 55564]. 3. Gatwick airport reported that 20% of its flights faced delays of up to two hours [Article 55564]. 4. Stansted airport experienced delays of between 30 minutes and two hours for all departing flights [Article 55564]. 5. Disruption was expected to continue as airports worked to clear the backlog caused by the software failure incident [Article 55564].
Preventions 1. Implementing more robust testing procedures to catch potential communication system failures during operational transitions [55564]. 2. Conducting regular maintenance and updates on the complex software system to prevent unexpected glitches during critical operational times [55564]. 3. Enhancing the system's redundancy and backup capabilities to ensure continuity of operations in case of software failures [55564].
Fixes 1. Implementing software updates or patches to address the communication system failure during the switch from night to daytime operations [Article 55564]. 2. Conducting a thorough review and potential redesign of the ground communications system software to prevent similar incidents in the future [Article 55564]. 3. Enhancing the operational contingency measures to ensure smoother transitions and minimize disruptions in case of software failures [Article 55564].
References 1. National Air Traffic Services' (Nats) control centre in Swanwick, Hampshire [Article 55564] 2. Nats spokesman [Article 55564] 3. Airport Operators' Association [Article 55564] 4. Heathrow airport spokeswoman [Article 55564] 5. Gatwick airport spokeswoman [Article 55564] 6. Stansted airport spokesman [Article 55564] 7. Ryanair [Article 55564] 8. Chris Yates, aviation expert [Article 55564] 9. Passenger Daisy McAndrew [Article 55564] 10. Cardiff airport spokesman [Article 55564]

Software Taxonomy of Faults

Category Option Rationale
Recurring multiple_organization (a) The software failure incident having happened again at one_organization: The article does not mention any previous incidents of software failure within the same organization (Nats) or with its products and services. Therefore, there is no evidence of a similar incident happening again at this specific organization [55564]. (b) The software failure incident having happened again at multiple_organization: The article mentions that the software failure incident affected not only major airports like Heathrow, Stansted, and Gatwick but also other airports such as Luton, Southampton, London City, Cardiff, Bristol, Bournemouth, Edinburgh, Glasgow, and Dublin [55564]. This indicates that the issue was widespread across multiple organizations in the aviation industry, suggesting a broader impact beyond just one organization.
Phase (Design/Operation) design, operation (a) The software failure incident in the article was primarily attributed to a technical problem with the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire. The issue arose during a switch from night to daytime operations when communication systems failed to adjust, indicating a failure related to contributing factors introduced by system development or updates [55564]. (b) The operation of the system was impacted by the software failure incident, leading to cancellations and delays at major UK airports. Passengers at Heathrow, Gatwick, Stansted, and other airports faced significant disruptions due to the glitch in the air traffic control system, highlighting a failure related to contributing factors introduced by the operation of the system [55564].
Boundary (Internal/External) within_system, outside_system (a) The software failure incident at the UK's major airports was primarily within the system. The article mentions that the glitch occurred in the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire, which deals with air traffic in the south of England [55564]. The issue was related to the communication systems failing to adjust during the switch from night to daytime operations, indicating an internal system failure. The complexity of the system, with over a million lines of software, contributed to the challenge faced by the engineering team and the manufacturer in resolving the problem swiftly while maintaining a safe service. (b) Additionally, external factors such as the switch from night to daytime operations could have played a role in the software failure incident. The article mentions that problems arose during this transition, suggesting that factors external to the system, such as environmental changes or operational adjustments, may have contributed to the failure [55564].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the article was primarily attributed to technical problems with the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire. The glitch occurred during a switch from night to daytime operations when the communication systems failed to adjust, leading to cancellations and delays at major UK airports [Article 55564]. (b) Human actions were also involved in addressing the software failure incident. The article mentions that the engineering team and the manufacturer worked closely to resolve the complex problem as quickly as possible while maintaining a safe service. Additionally, the Airport Operators' Association advised travelers to contact their airlines directly for flight information, indicating human intervention in managing the situation [Article 55564].
Dimension (Hardware/Software) software (a) The article does not attribute the software failure incident to hardware issues. It describes the cause as technical problems with the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire, which deals with air traffic in the south of England. The communication systems failed to adjust during the switch from night to daytime operations, indicating a software-related problem [55564].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident reported in the article does not indicate any malicious intent or actions contributing to the failure. The incident was described as a technical problem with the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire. The spokesperson for Nats mentioned that the issue arose during a switch from night to daytime operations when communication systems failed to adjust, emphasizing the complexity of the system with over a million lines of software [55564]. Additionally, the disruption caused by the failure was acknowledged by officials as unintended, with efforts made to resolve the problem and minimize the impact on passengers and flights.
Intent (Poor/Accidental Decisions) unknown The software failure incident reported in Article 55564 was not explicitly attributed to poor decisions or accidental decisions. The incident was described as a technical problem with the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire. The issue arose during a switch from night to daytime operations when communication systems failed to adjust, affecting air traffic control operations at major airports in the UK [55564].
Capability (Incompetence/Accidental) accidental (a) The software failure incident reported in the article does not indicate any specific evidence of development incompetence as the cause of the glitch at the National Air Traffic Services' control center in Swanwick. The incident was described as a technical problem that arose during a switch from night to daytime operations, leading to communication systems failing to adjust properly. The complexity of the system with over a million lines of software was highlighted, emphasizing the challenges faced by the engineering team and the manufacturer in resolving the issue swiftly while maintaining a safe service [55564]. (b) The software failure incident appears to be accidental in nature, stemming from technical and operational issues during the transition between night and daytime operations at the control center. The spokesperson for Nats mentioned that the problem arose when communication systems failed to adjust, indicating an unintentional occurrence rather than a deliberate act of sabotage or malpractice. The incident was described as a major challenge for the engineering team and the manufacturer, suggesting that the failure was not intentional but rather a result of unforeseen technical difficulties [55564].
Duration temporary (a) The software failure incident was temporary. The technical problems that caused cancellations and delays at major UK airports were resolved within the same day. The glitch at the National Air Traffic Services' control center in Swanwick, Hampshire, was fixed, and operations were returning to normal shortly after 7.30 pm on the day of the incident [Article 55564]. Some disruption was expected to continue after the fix while airports worked to clear the backlog.
Behaviour crash, omission, timing, value, other (a) crash: The software failure incident in the article can be categorized as a crash. The incident caused a day of cancellations and delays at major UK airports due to technical problems with the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire. This led to a fifth of all flights being canceled and others facing delays of up to two hours [Article 55564]. (b) omission: The software failure incident can also be related to omission as the system omitted to perform its intended functions at an instance(s). The communication systems failed to adjust during the switch from night to daytime operations, leading to disruptions in air traffic control operations [Article 55564]. (c) timing: The timing of the software failure incident can be considered a factor in the disruption. Problems arose during the switch from night to daytime operations, indicating that the system was not able to adjust its functions correctly at the appropriate time, causing delays and cancellations [Article 55564]. (d) value: The software failure incident can be linked to a failure in value as the system was performing its intended functions incorrectly. The malfunction in the ground communications system at the control center led to disruptions in air traffic control operations, resulting in delays and cancellations of flights [Article 55564]. (e) byzantine: The software failure incident does not exhibit characteristics of a byzantine failure, which involves inconsistent responses and interactions. The incident described in the article primarily focused on technical problems with the communication system causing disruptions in air traffic control operations [Article 55564]. (f) other: The software failure incident can be categorized as a system failure due to a complex and sophisticated system with more than a million lines of software experiencing a glitch that impacted air traffic control operations at major UK airports [Article 55564].

IoT System Layer

Layer Option Rationale
Perception network_communication The software failure incident reported in the article occurred in the network_communication layer of the cyber-physical system. The failure was specifically attributed to a technical problem with the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire, which deals with air traffic in the south of England [Article 55564]. The communication systems failed to adjust during the switch from night to daytime operations, leading to disruptions in air traffic control operations at various airports. The incident highlighted the critical role of communication systems in coordinating air traffic control activities and the impact of network communication errors on the overall system performance.
Communication link_level The software failure incident reported in Article 55564 was related to the communication layer of the cyber physical system that failed. The article mentions that the glitch at the National Air Traffic Services' control center in Swanwick, Hampshire, was caused by a problem with the ground communications system in the area control operations room. The communication systems failed to adjust during the switch from night to daytime operations, leading to delays and cancellations at major UK airports [55564]. The article highlights the complexity of the system, with more than a million lines of software, emphasizing that it is not just internal telephones but a system used by controllers to communicate with other air traffic control agencies in the UK and Europe. The failure in the communication system disrupted the normal operations of the air traffic control system, impacting thousands of passengers and leading to delays and cancellations at various airports [55564].
Application FALSE The software failure incident reported in the article was not related to the application layer of the cyber-physical system. The incident was specifically attributed to technical problems with the ground communications system at the National Air Traffic Services' control center in Swanwick, Hampshire, which deals with air traffic in the south of England. The issue was described as arising during a switch from night to daytime operations when communication systems failed to adjust, indicating a technical glitch rather than a failure at the application layer [Article 55564].

Other Details

Category Option Rationale
Consequence delay The consequence of the software failure incident described in the article is primarily related to delays experienced by passengers at major UK airports. Thousands of passengers were affected by the glitch at the National Air Traffic Services' control center in Swanwick, resulting in a fifth of all flights being canceled at Heathrow, Stansted, and Gatwick airports, while others faced delays of up to two hours [Article 55564]. The delays caused frustration and inconvenience to passengers, with some experiencing significant wait times and disruptions to their travel plans. Additionally, incoming flights from various locations were delayed with no estimated time of arrival, further contributing to the overall impact on travelers [Article 55564].
Domain transportation, finance, government (a) The software failure incident reported in the article affected the aviation industry, specifically the air traffic control system at the National Air Traffic Services' (Nats) control centre in Swanwick, Hampshire, which deals with air traffic in the south of England [Article 55564]. (h) The software failure incident also indirectly impacted the finance industry as airlines and airports had to deal with flight cancellations, delays, and disruptions, leading to financial losses and inconvenience for passengers [Article 55564]. (l) Additionally, the government sector was involved as the National Air Traffic Services (Nats) control centre is a critical part of the air traffic control system in the UK, which is a government-regulated industry [Article 55564].
