Incident: Multiple Technical Glitches Disrupt United Airlines, Wall Street Journal, and NYSE

Published Date: 2015-07-08

Postmortem Analysis
Timeline 1. The software failure incidents happened on a Wednesday in July 2015, as reported in the article published on 2015-07-08 [37888].
System 1. United Airlines' network control system 2. The Wall Street Journal's homepage system 3. New York Stock Exchange's trading system [37888]
Responsible Organization 1. United Airlines, which reported an "automation issue" 2. The New York Stock Exchange, which reported an "internal technical issue" 3. The Wall Street Journal, whose homepage was affected by a glitch [37888]
Impacted Organization 1. United Airlines - experienced a mandatory delay for its planes affecting 4,900 flights worldwide [37888] 2. The Wall Street Journal - homepage went down briefly [37888] 3. New York Stock Exchange - suspended trading for nearly four hours due to an "internal technical issue" [37888]
Software Causes 1. An "automation issue" at United Airlines led to a mandatory delay for its planes, affecting 4,900 flights worldwide [37888]. 2. The New York Stock Exchange suspended trading for nearly four hours due to an "internal technical issue" [37888].
Non-software Causes 1. The article does not identify a distinct non-software cause; it describes the incidents only as an "automation issue" at United Airlines and an "internal technical issue" at the New York Stock Exchange [37888].
Impacts 1. An "automation issue" at United Airlines led to a mandatory delay for its planes lasting almost two hours, affecting 4,900 flights worldwide [37888]. 2. The Wall Street Journal's homepage went down briefly [37888]. 3. The New York Stock Exchange suspended trading for nearly four hours due to an "internal technical issue" [37888].
Preventions 1. Implementing redundancies and fail-safes in the systems to mitigate potential issues [37888]. 2. Conducting thorough assessments of cyber-risk profiles to identify and address vulnerabilities [37888]. 3. Ensuring diligent oversight and monitoring of the systems to catch and rectify any technical problems or errors [37888].
Fixes 1. Implement redundancies and fail-safes in the system to mitigate potential issues [37888]. 2. Conduct thorough assessments of cyber-risk profiles to identify and address vulnerabilities [37888]. 3. Ensure diligent oversight and monitoring of the system to catch and address any technical problems promptly [37888].
References 1. United Airlines 2. The Wall Street Journal 3. New York Stock Exchange 4. Steve Grobman, Chief Technology Officer at Intel Security 5. Electrical grid operator (mentioned in a separate incident) 6. Various companies affected by the glitches on Wednesday (not specifically named) 7. Technical infrastructures underlying modern networks

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident having happened again at one_organization: The article mentions that United Airlines experienced an "automation issue" that led to a mandatory delay for its planes, affecting 4,900 flights around the world. This incident could be considered a recurrence of a software failure within the same organization [37888]. (b) The software failure incident having happened again at multiple_organization: The article reports that United Airlines, the Wall Street Journal, and the New York Stock Exchange all faced technical issues on the same day. United Airlines had an automation issue, the Wall Street Journal's homepage went down briefly, and the New York Stock Exchange suspended trading due to an internal technical issue. These incidents occurring at multiple organizations on the same day highlight the widespread impact of software failures [37888].
Phase (Design/Operation) design, operation (a) The article attributes the incidents at United Airlines, the Wall Street Journal, and the New York Stock Exchange to an "automation issue," an "internal technical issue," and "technical hiccups" rather than to cyber attacks, which suggests that the failures likely stemmed from contributing factors introduced during system development or updates [37888]. (b) The article also stresses the importance of redundancies, fail-safes, and good assessments of cyber-risk profiles for mitigating technical issues. It notes that small human errors or overlooked procedures, such as forgetting to turn on a power monitoring program after a software upgrade, can lead to system failures, which suggests that failures can also arise from contributing factors introduced during the operation or misuse of the system [37888].
Boundary (Internal/External) within_system, outside_system The software failure incidents reported in the articles can be attributed to both within_system and outside_system factors. 1. Within_system: The incidents mentioned in the articles were primarily caused by internal technical issues within the systems of the affected organizations. For example, United Airlines experienced an "automation issue," the New York Stock Exchange faced an "internal technical issue," and the Wall Street Journal's homepage went down briefly due to a glitch. These issues were not attributed to external cyber attacks but rather technical hiccups within the systems themselves [37888]. 2. Outside_system: While the incidents were mainly internal technical issues, the articles also highlight the interconnected nature of modern networks as a contributing factor to the software failures. The centralized and automated systems can make it challenging to pinpoint the exact cause of a problem or prevent cascading failures. This interconnectedness exposes the systems to external risks and vulnerabilities, even if the initial cause of the failure originates from within the system [37888].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incidents mentioned in the articles were primarily attributed to non-human actions such as technical glitches, automation issues, and internal technical problems. For example, the incidents at United Airlines, the Wall Street Journal, and the New York Stock Exchange were described as "technical hiccups" and "internal technical issue" [37888]. (b) While the articles did not provide specific examples of software failures caused by human actions, they did mention the possibility of human errors contributing to system failures. The article highlighted a case where a small human error, like forgetting to turn on a power monitoring program after a software upgrade, led to a massive blackout in the Northeast in 2003. This suggests that human actions can also play a role in software failure incidents [37888].
Dimension (Hardware/Software) hardware, software (a) The software failure incident related to hardware: The article mentions an incident where an electrical grid operator forgot to turn a power monitoring program on after upgrading its software in 2003, contributing to a massive blackout in the Northeast. This incident highlights how a human error involving a power monitoring program contributed to a significant failure of the physical electrical grid [37888]. (b) The software failure incident related to software: The article discusses an "automation issue" at United Airlines, a brief downtime of The Wall Street Journal's homepage, and the New York Stock Exchange suspending trading due to an "internal technical issue." These incidents point to software-related failures that caused disruptions in critical systems, emphasizing the impact of software glitches on operations [37888].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incidents mentioned in the articles were non-malicious in nature. The incidents at United Airlines, the Wall Street Journal, and the New York Stock Exchange were attributed to technical glitches and automation issues rather than cyber attacks or intentional malicious actions [37888]. The Chief Technology Officer at Intel Security, Steve Grobman, emphasized the challenges that come with modern technology and the need for redundancies and fail-safes to mitigate technical problems, including those caused by human errors [37888].
Intent (Poor/Accidental Decisions) accidental_decisions (a) The articles mention instances of software failure incidents that were not caused by cyber attacks but rather by technical hiccups or automation issues. These incidents were described as glitches or technical difficulties that led to disruptions in critical systems at major organizations like United Airlines, the Wall Street Journal, and the New York Stock Exchange [37888]. (b) The articles also highlight the possibility of software failures being caused by small human errors that can ripple through a system, such as the example of an electrical grid operator forgetting to turn on a power monitoring program after a software upgrade in 2003, contributing to a massive blackout in the Northeast. This indicates that software failures can sometimes be the result of accidental decisions or mistakes made by individuals within the organization [37888].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The article mentions instances where technical issues were caused by human errors or inherent flaws in software. For example, it talks about a human error where an electrical grid operator forgot to turn on a power monitoring program after a software upgrade in 2003, contributing to a massive blackout in the Northeast. This highlights the impact of development incompetence or lack of professional competence on software failures [37888]. (b) The article also discusses how sometimes software failures can be caused by small human errors that ripple through a system, such as the example of the electrical grid operator's mistake. These accidental errors can lead to system disruptions and failures, emphasizing the role of accidental factors in software failures [37888].
Duration temporary The software failure incidents mentioned in the article were temporary in nature. The incidents at United Airlines, the Wall Street Journal, and the New York Stock Exchange were described as glitches, automation issues, and internal technical issues, respectively. These incidents caused delays and suspensions but were eventually resolved, indicating that the failures were temporary and not permanent [37888].
Behaviour crash, omission, timing, other (a) crash: The systems at United Airlines, the Wall Street Journal, and the New York Stock Exchange all experienced disruptions due to technical glitches, which can be considered a form of crash in which the systems lost their state and were unable to perform their intended functions [37888]. (b) omission: The article mentions a case in 2003 in which a small human error contributed to a massive blackout in the Northeast: the electrical grid operator forgot to turn a power monitoring program on after upgrading its software. This omission led to a significant failure in the system [37888]. (c) timing: The incidents mentioned in the article, such as the delay in United Airlines flights and the suspension of trading at the New York Stock Exchange, were due to technical glitches that delayed the systems' operations. This delay in performing the intended functions can be categorized as a timing issue [37888]. (d) value: The article does not mention any case of a system performing its intended functions incorrectly, which would be a value-related failure, so it provides no direct information about this type of behavior. (e) byzantine: The article does not mention any case of a system behaving erroneously with inconsistent responses and interactions, which would be a byzantine failure, so it provides no direct information about this type of behavior. (f) other: The article discusses technical hiccups, automation issues, and internal technical problems that led to disruptions in critical systems. These incidents can be considered other forms of software failure behavior that do not fit the specific categories (a) to (e) [37888].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, delay, non-human The consequence of the software failure incident described in the articles was primarily related to delays in various services and operations. Specifically: - United Airlines experienced a mandatory delay for its planes lasting almost two hours, affecting 4,900 flights globally [37888]. - The Wall Street Journal's homepage went down briefly [37888]. - The New York Stock Exchange suspended trading for nearly four hours due to an "internal technical issue" [37888]. These incidents caused significant disruptions and delays in air travel, online news access, and stock trading operations, highlighting the impact of software failures on various services and industries.
Domain information, transportation, finance (a) The Wall Street Journal's homepage went down briefly, affecting the production and distribution of information [37888]. (b) An "automation issue" at United Airlines led to a mandatory delay for its planes, impacting transportation [37888]. (h) The New York Stock Exchange suspended trading for nearly four hours due to an "internal technical issue," affecting the finance industry [37888].

