Incident: AT&T Network Outage in Nashville Due to Christmas Day Bombing

Published Date: 2020-12-31

Postmortem Analysis
Timeline 1. The software failure incident happened on Christmas Day, as reported in the article [108599]. 2. The article was published on 2020-12-31, so the Christmas Day in question is that of 2020. 3. Therefore, the software failure incident occurred on December 25, 2020.
System unknown
Responsible Organization 1. The bomber, whose Christmas Day bombing in downtown Nashville seriously damaged a key AT&T network facility, causing phone and data service outages and disruptions over hundreds of miles in the southern U.S. [108599]
Impacted Organization 1. AT&T customers in large parts of Tennessee, Kentucky, and Alabama, including 911 centers, hospitals, stores, and the Nashville police department [108599].
Software Causes 1. unknown
Non-software Causes 1. The Christmas Day bombing in downtown Nashville seriously damaged a key AT&T network facility, leading to phone and data service outages and disruptions over hundreds of miles in the southern U.S. [108599]. 2. Backup generators at the AT&T facility went down after the blast, causing service outages. A fire broke out and the building flooded; more than three feet of water had to be pumped out of the basement [108599]. 3. The blast damaged dozens of buildings, injured several people, and killed the bomber. Federal officials are investigating the motive behind the explosion [108599]. 4. The physical damage to the AT&T building, including flooding and fire, complicated restoration. Some equipment had to be repaired inside a building that was part of an active crime scene, which limited AT&T workers' access [108599]. 5. Beyond AT&T's own services, the blast disrupted emergency services: roughly a hundred 911 centers experienced service problems in Tennessee alone [108599].
Impacts 1. Phone and data service outages and disruptions over hundreds of miles in the southern U.S., leaving AT&T customers across large parts of Tennessee, Kentucky, and Alabama without phone, internet, and video service [108599]. 2. 911 centers in the region couldn't take calls, and some didn't receive crucial data associated with callers, such as their locations [108599]. 3. The Nashville police department's phones and internet failed [108599]. 4. At some hospitals, electronic medical records, internet service, and phones were disrupted [108599]. 5. The Nashville airport halted flights for about three hours on Christmas Day [108599]. 6. Rival carrier T-Mobile had service issues as far away as Atlanta, about 250 miles from Nashville, because it uses AT&T equipment to move customer data [108599]. 7. The Parthenon replica, a museum located near the explosion site, still did not have a working phone four days after the blast [108599].
Preventions 1. Redundant backup systems for critical network facilities like the AT&T building in Nashville could have prevented or limited the outage [108599]. 2. Regular maintenance and testing of backup generators, to ensure they remain operational in emergencies, could have helped prevent the outage [108599]. 3. Stronger physical security at key network facilities could help prevent unauthorized access or attacks that lead to service disruptions [108599]. 4. Network resilience could be improved by avoiding single points of failure and ensuring redundancy in critical infrastructure components [108599].
Fixes 1. Enhancing redundancy and reliability in the telecommunication infrastructure to prevent widespread outages in case of incidents like the Nashville bombing [108599]. 2. Implementing measures to avoid single points of failure in the network to ensure resilience against various threats such as physical damage, human error, hostile actions, or software bugs [108599].
References 1. Doug Schmidt, a Vanderbilt University computer science professor 2. Rep. Jim Cooper, a Democrat representing Nashville in Congress 3. Brian Fontes, head of the National Emergency Number Association 4. David Turetsky, a lecturer at the University at Albany and a former public safety official at the Federal Communications Commission 5. Kristin Mumford, a spokesperson for the Nashville police department 6. John Holmes, assistant director of Metro Parks

Software Taxonomy of Faults

Category Option Rationale
Recurring unknown The article does not provide specific information about a similar software failure incident recurring at the same organization (one_organization) or at other organizations (multiple_organization), so this category is marked unknown.
Phase (Design/Operation) unknown The article does not attribute the failure either to design factors introduced during system development or to operation factors introduced during system operation or misuse, so the phase that contributed to the software failure incident is unknown.
Boundary (Internal/External) within_system, outside_system (a) within_system: The failure manifested within AT&T's system. Blast damage to a key network facility knocked out its backup generators, and a fire and flooding followed, resulting in service outages for AT&T customers across large parts of Tennessee, Kentucky, and Alabama [108599]. (b) outside_system: The failure was triggered by a factor outside the system. The bombing, an external event, caused the serious damage to the AT&T network facility that led to the software failure incident and the subsequent service outages [108599].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident occurring due to non-human actions: The outage resulted from physical events set off by the bombing in downtown Nashville. The blast seriously damaged a key AT&T network facility, its backup generators went down, a fire broke out, and the building flooded, causing service outages across large parts of Tennessee, Kentucky, and Alabama [108599]. (b) The software failure incident occurring due to human actions: The bombing that triggered the outage was itself a human act, and restoration also depended on human actions. AT&T sent temporary cell towers, rerouted traffic, and activated remaining wireline equipment, but repairs were complicated because the building was part of an active crime scene, which hindered access [108599].
Dimension (Hardware/Software) hardware, software (a) The software failure incident related to hardware: - The Christmas Day bombing in downtown Nashville seriously damaged a key AT&T network facility, leading to phone and data service outages and disruptions over hundreds of miles in the southern U.S. The blast caused backup generators to go down, a fire to break out, and the building to flood with more than three feet of water [108599]. - AT&T sent temporary cell towers to affected areas and rerouted traffic to other facilities while it worked to restore power to the Nashville building. However, not all traffic could be rerouted, and some physical equipment had to be repaired inside a building that was part of an active crime scene, complicating access for AT&T workers [108599]. (b) The software failure incident related to software: The article does not identify a specific software fault in this incident, so the software-related evidence is indirect: - The Nashville police department, which uses the FirstNet system built by AT&T, had to turn to a backup provider, CenturyLink, for landlines and internet at its headquarters and precincts because of service problems with AT&T. The department also obtained loaner cellphones and mobile hotspots from Verizon [108599]. - The article cites a December 2018 CenturyLink outage, caused by software bugs and equipment failures, that lasted more than a day and disrupted 911 calls in over two dozen states, affecting millions of people. It included blocked calls for Verizon customers and busy signals for Comcast customers, both of which relied on CenturyLink’s network [108599].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident is classified as non-malicious because the article does not show an intent to harm the network itself. It was caused by a bombing in downtown Nashville that seriously damaged a key AT&T network facility: backup generators went down, a fire broke out, and the building flooded, disrupting phone and data services for AT&T customers and other services across Tennessee, Kentucky, and Alabama [Article 108599]. (b) The failure resulted from physical damage and infrastructure issues caused by the bombing rather than from actions targeting the system, although federal officials were still investigating the bomber's motive when the article was published. The incident highlighted the vulnerability of U.S. communications infrastructure to physical damage and the importance of redundancy and resilience in telecom facilities for preventing widespread service disruptions [Article 108599].
Intent (Poor/Accidental Decisions) unknown The article does not point to either poor_decisions or accidental_decisions as the intent behind the failure. The incident was primarily caused by physical damage to a key AT&T network facility in a bombing, which led to widespread service outages and disruptions in the southern U.S. The blast damaged infrastructure, backup generators went down, a fire broke out, and the building flooded; none of this is attributed to specific poor or accidental decisions related to software.
Capability (Incompetence/Accidental) accidental (a) The article [108599] does not mention development incompetence as a factor in the software failure incident. (b) The failure is classified as accidental rather than as a product of development incompetence: it was caused by a bombing in downtown Nashville that seriously damaged a key AT&T network facility, leading to phone and data service outages and disruptions over hundreds of miles in the southern U.S. [108599].
Duration temporary The software failure incident was temporary rather than permanent. The outages and disruptions in phone and data services resulted from the blast damaging a key AT&T network facility, after which backup generators went down, a fire broke out, and the building flooded [Article 108599]. According to the article, AT&T restored 96% of its wireless network by Sunday (December 27), nearly all services were back up by Monday evening (December 28), and the last remaining wireline equipment was activated on Wednesday (December 30). AT&T also deployed temporary cell towers and rerouted traffic to other facilities to mitigate the impact [Article 108599].
Behaviour crash, omission, value, other (a) crash: The software failure incident can be categorized as a crash. The blast in downtown Nashville seriously damaged a key AT&T network facility, causing the loss of phone, internet, and video services across large parts of Tennessee, Kentucky, and Alabama [108599]. (b) omission: The incident also involved omission. After the blast damaged the AT&T facility, 911 centers in the region couldn't take calls, and crucial data associated with callers, such as their locations, was not received. The Nashville police department's phones and internet also failed, disrupting emergency services [108599]. (d) value: Some services failed to perform their intended functions correctly. For example, at some hospitals, electronic medical records, internet service, or phones stopped working, affecting the delivery of healthcare services [108599]. (f) other: The incident also exhibited behavior not covered by the listed options: it exposed physical vulnerabilities in the communications networks, which the blast exploited. Physical damage to the AT&T facility, including flooding and fire, contributed to the failure of services [108599].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence death, harm, property, delay, non-human (a) death: The Christmas Day bombing in downtown Nashville led to the death of the bomber, who was the only fatality reported in the incident [Article 108599]. (b) harm: Several people were injured as a result of the explosion in downtown Nashville [Article 108599]. (d) property: The blast seriously damaged a key AT&T network facility, leading to disruptions in phone and data services over hundreds of miles in the southern U.S. Additionally, dozens of buildings were damaged in the explosion [Article 108599]. (e) delay: The Nashville airport had to halt flights for about three hours on Christmas Day due to the software failure incident [Article 108599]. (f) non-human: The blast and subsequent software failure impacted non-human entities such as the AT&T network facility, which suffered physical damage, flooding, and equipment failures [Article 108599].
Domain information, utilities, government (a) The failed system was intended to support the information industry, specifically telecommunications services provided by AT&T. The software failure incident resulted in phone and data service outages over hundreds of miles in the southern U.S., impacting AT&T customers across large parts of Tennessee, Kentucky, and Alabama [Article 108599]. (g) The failed system also affected the utilities industry, as the AT&T network facility that was seriously damaged in the Christmas Day bombing in Nashville provides local wireless, internet, and video services, connecting to regional networks. The outage led to disruptions in services such as phones, internet, and video, impacting not only individual customers but also critical infrastructure like 911 centers, hospitals, and the Nashville airport [Article 108599]. (l) Additionally, the government sector was impacted by the software failure incident, as the Nashville police department's phones and internet failed, and roughly a hundred 911 centers in Tennessee alone experienced service problems. The reliance on telecommunications services for emergency response highlighted the importance of resilient communication networks for public safety and government operations [Article 108599].

Sources
