Incident: Northeast U.S. Internet Outage Impacts Millions, Verizon Investigating.

Published Date: 2021-01-26

Postmortem Analysis
Timeline 1. The software failure incident happened on January 26, 2021 [110213].
System 1. Verizon Fios internet service [110213]
Responsible Organization 1. Verizon [110213]
Impacted Organization 1. Internet users across the northeast U.S., including Fios internet customers [110213] 2. Fairfax County Public Schools in the Washington, D.C., suburbs [110213] 3. Galvin Middle School in Wakefield, Massachusetts [110213] 4. Internet and cloud providers, major sites such as Google and Facebook, Amazon Web Services customers [110213]
Software Causes 1. Unknown
Non-software Causes 1. A cut fiber in Brooklyn, New York, was reported, but the article states that the service interruptions were unrelated to it [110213]
Impacts 1. Internet users across the northeast U.S. experienced widespread outages, interrupting work and school activities [110213]. 2. People in the region from Washington, D.C., to Boston reported issues connecting with various online services, affecting government services, financial companies, and remote learning [110213]. 3. Students and teachers had to find workarounds, such as switching to another platform, doing independent learning, or resorting to pen-and-paper assignments [110213]. 4. Major sites like Google and Facebook, as well as internet and cloud providers, were affected by the outage [110213]. 5. Connectivity issues for Amazon Web Services customers were resolved after an hour and a half [110213].
Preventions 1. Implementing robust redundancy and failover systems in the network infrastructure to mitigate the impact of network issues like the one experienced by Verizon [110213]. 2. Conducting regular network monitoring and maintenance to proactively identify and address potential issues before they escalate into widespread outages [110213]. 3. Enhancing communication and transparency with customers by providing timely updates and information during service interruptions to manage expectations and minimize inconvenience [110213].
Fixes 1. Investigating the root cause of the issue within the Verizon network to prevent similar incidents in the future [110213]. 2. Implementing redundancy and failover mechanisms in the network infrastructure to minimize the impact of outages [110213]. 3. Enhancing communication and coordination between internet service providers to address network issues promptly and efficiently [110213].
References 1. Spokesman Rich Young from Verizon [110213] 2. People posting on Twitter [110213] 3. Diana Gaspar [110213] 4. Fairfax County Public Schools spokeswoman Lucy Caldwell [110213] 5. Fairfax parent Tracy Compton [110213] 6. Galvin Middle School representative Trish Dellanno [110213] 7. Doug Madory, director of internet analysis at Kentik [110213] 8. Cary Wiedemann, a network engineer [110213]

Software Taxonomy of Faults

Category Option Rationale
Recurring multiple_organization (a) The software failure incident related to one_organization: - The article does not mention any previous similar incidents at Verizon or involving its products and services, so nothing indicates that this is a recurrence within the same organization [110213]. (b) The software failure incident related to multiple_organization: - The article reports that the outage affected internet and cloud providers as well as major sites such as Google and Facebook, and that Amazon, Google, and other providers were impacted by the East Coast outages. This shows that the incident's effects spanned multiple organizations and their products and services, not just Verizon [110213].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase can be seen in the article where it mentions that internet users across the northeast U.S. experienced widespread outages due to an unspecified Verizon network issue. The article highlights that the outage affected internet and cloud providers as well as major sites such as Google and Facebook, indicating a failure related to the design or development of the network infrastructure [110213]. (b) The software failure incident related to the operation phase is evident in the article where it describes how people in the affected region had issues connecting with various online services, impacting work and school activities. Users had to find workarounds, such as switching to another instruction platform or resorting to pen-and-paper assignments when facing internet problems, showcasing failures introduced by the operation or use of the system [110213].
Boundary (Internal/External) within_system (a) within_system: The software failure incident in this case was within the system, specifically related to an unspecified Verizon network issue affecting Fios internet service in the northeast U.S. [110213]. The outage impacted internet and cloud providers, major sites like Google and Facebook, and caused disruptions for millions of Fios internet customers. The issue originated from within the Verizon network infrastructure, leading to service interruptions for users within the system.
Nature (Human/Non-human) non-human_actions, human_actions (a) The article does not attribute the software failure incident to a specific human action. It attributes the outage to an unspecified Verizon network issue, which points to a technical problem within the network infrastructure [110213]. (b) Human actions played a role in mitigating the impact of the software failure incident. For example, teachers and students found workarounds when facing connectivity issues, such as switching to another instruction platform or resorting to pen-and-paper assignments [110213]. Additionally, network engineer Cary Wiedemann highlighted the complexity of identifying the source of disruptions, pointing out that even if home internet worked, certain online services could still be affected due to issues with the backbone of Verizon's network [110213].
Dimension (Hardware/Software) software (a) The software failure incident reported in the article was not attributed to hardware issues. The outage experienced by internet users across the northeast U.S. was specifically mentioned to be due to an unspecified Verizon network issue [110213]. (b) The software failure incident was attributed to contributing factors that originated in software, specifically related to the Verizon network issue that caused widespread outages for several hours. The article mentions that the service interruptions were unrelated to a cut fiber in Brooklyn, New York, indicating that the root cause was not hardware-related but rather a software issue within the Verizon network [110213].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident reported in the article does not indicate any malicious intent. It was a non-malicious failure caused by an unspecified Verizon network issue that led to widespread outages affecting internet users across the northeast U.S. [110213]. The outage impacted various online services, including major sites like Google and Facebook, as well as cloud providers. The incident disrupted work and school activities for many individuals in the region, highlighting the significant impact of such non-malicious failures on daily life and remote learning during the pandemic.
Intent (Poor/Accidental Decisions) accidental_decisions (a) The software failure incident reported in the article does not indicate any poor decisions as the cause of the failure. The incident was primarily attributed to an unspecified Verizon network issue that led to widespread outages affecting internet services in the northeast U.S. [110213]. (b) The software failure incident seems to be more aligned with accidental decisions or mistakes rather than poor decisions. The outage was described as an internet issue impacting the quality of Verizon's Fios service, and the company was investigating what happened to resolve the problem. The disruptions were not intentional but rather unintended consequences of the network issue. Additionally, the incident affected various online services and major sites like Google and Facebook, indicating a broader technical issue rather than a specific poor decision [110213].
Capability (Incompetence/Accidental) accidental (a) The software failure incident in the article was not attributed to development incompetence. The outage experienced by internet users in the northeast U.S. was primarily due to an unspecified Verizon network issue, which the company was investigating [110213]. (b) The software failure incident in the article was accidental in nature. The widespread outages experienced by internet users across the northeast U.S. were caused by an unspecified Verizon network issue, which was not intentional but rather an accidental occurrence [110213].
Duration temporary (a) The software failure incident in this case was temporary. Internet users across the northeast U.S. experienced widespread outages for several hours on Tuesday, but the issue was eventually resolved, and service levels were returning to normal [110213]. The outage began at 11:25 a.m. local time and recovery began at 12:37 p.m., indicating that the incident was not permanent but rather a temporary disruption in service [110213].
Behaviour crash, omission, other (a) crash: The software failure incident in the article can be categorized as a crash. Internet users across the northeast U.S. experienced widespread outages for several hours, interrupting work and school due to an unspecified Verizon network issue. This outage led to a loss of service for many customers, indicating a failure of the system to perform its intended functions [110213]. (b) omission: The incident also involved omission as a behavior of the software failure. Users reported issues connecting with various online services in the region, causing disruptions to work and school activities. For example, a daughter in New York couldn't connect to her online classroom due to spotty home internet, leading to an omission of the intended function of providing stable internet connectivity [110213]. (c) timing: Timing can be considered a factor in this software failure incident. The outage occurred during a critical time when millions of people were working from home and students were attending school remotely due to the pandemic. The timing of the failure exacerbated the impact on users who relied on internet services for work and education [110213]. (d) value: The software failure incident did not specifically involve a failure due to the system performing its intended functions incorrectly. The focus of the incident was on the loss of service and connectivity issues rather than the system providing incorrect information or results [110213]. (e) byzantine: The software failure incident did not exhibit behavior characteristic of a byzantine failure, which involves inconsistent responses and interactions. The outage experienced by users was more of a widespread loss of service rather than erratic or inconsistent behavior from the system [110213]. (f) other: The other behavior exhibited in this software failure incident could be classified as a network issue leading to a service disruption. The failure was related to a Verizon network problem that caused interruptions in internet services for users across the northeast U.S. This network issue resulted in a significant impact on various online services and users' ability to connect, highlighting the broader implications of network failures [110213].
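
The Duration rationale above gives two timestamps, an onset at 11:25 a.m. local time and the start of recovery at 12:37 p.m. As a quick check of how long that window was, the following minimal Python sketch subtracts one from the other. Pairing the times with the incident date of January 26, 2021 and the variable names are assumptions made for illustration; the window measures onset to the start of recovery, not full restoration of service.

```python
from datetime import datetime

# Timestamps as reported in the Duration rationale. Attaching them to the
# incident date (2021-01-26) is an assumption for illustration.
onset = datetime(2021, 1, 26, 11, 25)           # outage began, 11:25 a.m. local time
recovery_start = datetime(2021, 1, 26, 12, 37)  # recovery began, 12:37 p.m.

# Elapsed time between onset and the start of recovery, in whole minutes.
elapsed = recovery_start - onset
minutes = int(elapsed.total_seconds() // 60)

print(f"{minutes} minutes ({minutes // 60} h {minutes % 60} min)")
```

This window is shorter than the hour and a half reported for Amazon Web Services customers and the "several hours" of user-facing disruption, which is consistent with recovery being gradual rather than instantaneous.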

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, delay, non-human (d) property: People's material goods, money, or data was impacted due to the software failure. The software failure incident related to the Verizon network issue caused widespread outages for internet users across the northeast U.S. This interruption affected work and school activities due to connectivity issues. People reported having problems connecting with various online services, including online classrooms, financial services, and government services. The outage also impacted internet and cloud providers, as well as major sites like Google and Facebook. Amazon's web services division experienced connectivity issues, and there was a 12% drop in traffic volume to Verizon during the incident [110213].
Domain information, finance, government (a) The software failure incident affected internet services, including major online platforms like Google and Facebook, which are crucial for the production and distribution of information [110213]. (h) The incident impacted major financial companies like Fidelity Investments, indicating a connection to the finance industry [110213]. (l) The outage disrupted key U.S. government services, such as the Fairfax County Public Schools in the Washington, D.C. suburbs, highlighting the impact on the government sector [110213].
