Incident: Air Traffic Control Software Upgrade Causes Flight Delays and Cancellations

Published Date: 2015-08-17

Postmortem Analysis
Timeline
1. The incident, in which a software upgrade caused flight delays and cancellations across the east coast of the US, happened on a weekend [38696].
2. The article was published on 2015-08-17 [38696].
3. Estimation: the incident occurred on the weekend before the publication date, around August 15-16, 2015. The article places the reduced arrival and departure rates on Saturday, which points to August 15, 2015.
System
1. En Route Automation Modernisation (Eram) computer system at the Leesburg Center [38696].
Responsible Organization
1. The US Federal Aviation Administration (FAA) was responsible for the incident: a software upgrade to its air traffic control system led to flight delays and cancellations [38696].
Impacted Organization
1. Flights across the east coast of the US were impacted, including those at Baltimore-Washington International Airport, Ronald Reagan Washington National Airport, and Dulles International Airport [38696].
Software Causes
1. A software upgrade caused technical issues with the En Route Automation Modernisation (Eram) computer system at the Leesburg Center [38696].
Non-software Causes
1. Arrival and departure rates in the Washington area were reduced for safety reasons, which added to the backlog [38696].
Impacts
1. The software upgrade issue caused 492 flight delays and 476 cancellations, affecting air traffic control across the east coast of the US [38696].
2. Arrival and departure rates in the Washington area were reduced between 11am and 4pm on Saturday for safety reasons, leading to significant disruptions for flights from New Jersey and New York [38696].
3. By mid-afternoon, 50% of inbound flights and 42% of outbound flights were canceled at Reagan National, and 58% of inbound flights and 36% of outbound flights were canceled in Baltimore [38696].
4. Delays averaged about three hours at Reagan National and more than an hour in Baltimore [38696].
Preventions
1. Proper testing and validation of the software upgrade before implementation could potentially have prevented the incident [38696].
2. A more gradual rollout of the software upgrade could have limited the impact on air traffic control operations and allowed for a smoother transition and easier troubleshooting [38696].
3. Robust contingency plans and backup systems could have helped to quickly address and mitigate technical issues arising during software upgrades or system changes [38696].
Fixes
1. Conduct a thorough assessment of the malfunction in the En Route Automation Modernisation (Eram) computer system at the Leesburg Center to identify the root cause and implement the necessary fixes [38696].
2. Ensure that the new features of the software upgrade are properly tested and verified before re-enabling them, to prevent similar incidents in the future [38696].
3. Implement robust testing procedures for software upgrades to minimize the risk of disruptions to air traffic control operations [38696].
References
1. US Federal Aviation Administration (FAA) [38696]
2. FlightRadar24 [38696]
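
The date estimate in the Timeline entry can be checked with a short calculation. The following is a minimal Python sketch, not part of the original analysis. The function name preceding_weekend is hypothetical, and the sketch assumes that "the weekend before the publication date" means the most recent Saturday and Sunday that both fall strictly before that date.

```python
from datetime import date, timedelta


def preceding_weekend(published: date) -> tuple[date, date]:
    """Return (Saturday, Sunday) of the last full weekend before `published`.

    Hypothetical helper for illustration only.
    """
    # date.weekday() numbers the days Monday == 0 ... Sunday == 6.
    # Step back to the most recent Sunday strictly before the publication
    # date; if the publication date is itself a Sunday, go back a full week.
    days_back = (published.weekday() - 6) % 7 or 7
    sunday = published - timedelta(days=days_back)
    saturday = sunday - timedelta(days=1)
    return saturday, sunday


# Publication date taken from the record above.
saturday, sunday = preceding_weekend(date(2015, 8, 17))
print(saturday.isoformat(), sunday.isoformat())
```

For the publication date 2015-08-17, a Monday, this prints 2015-08-15 2015-08-16, which matches the estimated incident window.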

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident related to the FAA's En Route Automation Modernisation computer system (Eram) occurred again within the same organization. The article mentions that the software upgrade causing the flight delays and cancellations was performed on the Eram computer system, which is the FAA's own system [38696]. (b) The provided article contains no information about a similar incident happening at other organizations or with their products and services.
Phase (Design/Operation) design (a) The software failure incident in the news article was related to the design phase. The incident was caused by a software upgrade that led to technical issues with the En Route Automation Modernisation computer system (Eram) at the Leesburg Center. The upgrade was intended to provide more tools for controllers but resulted in flight delays and cancellations. The new features of the upgrade had to be disabled while the systems contractor assessed the malfunction [38696]. (b) The software failure incident in the news article was not directly related to the operation phase or misuse of the system. The primary cause of the disruption was attributed to technical issues arising from the software upgrade affecting the air traffic control system, rather than operational errors or misuse of the system by operators or users [38696].
Boundary (Internal/External) within_system (a) The software failure incident behind the flight delays and cancellations in the US originated primarily within the system. The issue was attributed to a software upgrade at the Virginia air traffic control center, specifically to the En Route Automation Modernisation computer system (Eram) at the Leesburg Center. The FAA said that the upgrade was designed to provide more tools for controllers, but that its new features had to be disabled due to technical issues. The systems contractor was conducting an assessment of the malfunction, indicating an internal system issue [38696]. (b) The provided article contains no specific information indicating that the incident was caused by contributing factors originating from outside the system.
Nature (Human/Non-human) non-human_actions (a) The software failure incident was primarily due to non-human actions, specifically a software upgrade that caused technical issues with the En Route Automation Modernisation computer system at the Virginia air traffic control center [38696]. The FAA mentioned that the new features from the upgrade were disabled while the systems contractor assessed the malfunction, indicating that the failure was not directly caused by human actions but rather by the software upgrade itself. (b) There is no specific mention in the provided article about the software failure incident being caused by human actions.
Dimension (Hardware/Software) hardware, software (a) The software failure incident reported in Article 38696 was due to hardware-related issues. The Federal Aviation Administration mentioned that the technical issues were with the En Route Automation Modernisation computer system (Eram) at the Leesburg Center, indicating a hardware-related problem [38696]. (b) The software failure incident was also related to software issues. The FAA confirmed that the software upgrade causing the problems was performed on the Eram computer system, indicating a software-related aspect of the failure [38696].
Objective (Malicious/Non-malicious) non-malicious (a) The article does not indicate any malicious intent behind the software failure incident. The incident was attributed to technical issues with a software upgrade at a Virginia air traffic control center, specifically to the En Route Automation Modernisation (Eram) computer system. The FAA said that the upgrade was designed to provide more tools for controllers, and that the new features had to be disabled while the systems contractor assessed the malfunction. There was no indication that the problem was related to any inherent problems with the Eram system, which had a high availability rate prior to the incident [38696]. (b) The software failure incident was categorized as non-malicious, stemming from technical issues during a software upgrade rather than any deliberate attempt to harm the system.
Intent (Poor/Accidental Decisions) unknown (a) The software failure incident related to the flight delays and cancellations across the east coast of the US was not primarily due to poor decisions but rather technical issues with a software upgrade. The Federal Aviation Administration mentioned that the upgrade was designed to provide more tools for controllers, but the new features had to be disabled while the systems contractor completed an assessment of the malfunction [38696]. The decision to reduce arrival and departure rates in the Washington area for safety reasons was a precautionary measure taken as part of the response to the technical issues, rather than a poor decision [38696]. (b) The software failure incident was not caused by accidental decisions or mistakes. It was primarily attributed to technical issues with the En Route Automation Modernisation computer system (Eram) at the Leesburg Center, where a software upgrade was performed. The FAA confirmed that the problem was related to the software upgrade on the Eram system, but there was no indication that the issue was due to inherent problems with the En Route Automation Modernisation system itself [38696].
Capability (Incompetence/Accidental) accidental (a) The software failure incident described in the article appears to stem from accidental factors rather than development incompetence. It was attributed to technical issues with a software upgrade at a Virginia air traffic control center, which caused flight delays and cancellations across the east coast of the US [38696]. The FAA said that the upgrade was designed to provide more tools for controllers, but that the new features had to be disabled while the systems contractor completed an assessment of the malfunction. The agency also confirmed that the software upgrade was performed on the Eram computer system, but there was no indication that the problem was related to any inherent problems with the En Route Automation Modernisation system [38696]. (b) The software failure incident does not appear to be directly linked to development incompetence. The issues were accidental in nature, stemming from the software upgrade at the air traffic control center rather than from incompetence in the development process.
Duration temporary The software failure incident reported in Article 38696 was temporary. The article mentions that the technical issues with the software upgrade caused flight delays and cancellations over the weekend, but the problems were resolved by approximately 4 pm on the same day. The FAA confirmed that the new features of the upgrade were disabled while the systems contractor assessed the malfunction, indicating a temporary disruption rather than a permanent one [38696].
Behaviour omission, value, other
(a) crash: The software failure incident in the article resulted in flight delays and cancellations across the east coast of the US due to a software upgrade. The system experienced "technical issues" at a Virginia air traffic control center, leading to 492 flight delays and 476 cancellations [38696].
(b) omission: The software failure incident caused disruptions at airports, with flights being delayed or cancelled due to the system not performing its intended functions as expected. The FAA mentioned that the new features of the software upgrade had to be disabled while the systems contractor assessed the malfunction [38696].
(c) timing: The article mentions that part of the backlog was due to a decision to reduce arrival and departure rates in the Washington area between 11 am and 4 pm on Saturday for safety reasons. This timing issue affected flights from New Jersey and New York that flew over the Washington area [38696].
(d) value: The software failure incident led to delays and cancellations of flights, indicating that the system was not performing its intended functions correctly. Delays were reported at various airports, with significant percentages of inbound and outbound flights being canceled or delayed [38696].
(e) byzantine: The article does not provide information suggesting that the software failure incident exhibited behaviors of inconsistency or erratic responses.
(f) other: The article also identified the En Route Automation Modernisation computer system (Eram) at the Leesburg Center as the source of the technical issues. The FAA confirmed that the software upgrade causing the problems was performed on the Eram computer system, but there was no indication of inherent problems with the En Route Automation Modernisation system itself [38696].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence delay (e) delay: People had to postpone an activity due to the software failure. The software failure incident caused 492 flight delays and 476 cancellations across the east coast of the US, impacting various airports such as Baltimore-Washington International Airport, Ronald Reagan Washington National Airport, and Dulles International Airport [38696]. Flight delays were significant, with delays averaging about three hours at Reagan National and more than an hour at Baltimore [38696].
Domain transportation (a) The failed system was intended to support the transportation industry. The software failure incident affected air traffic control centers and led to flight delays and cancellations across the east coast of the US [38696].
