Published Date: 2021-07-22
Postmortem Analysis | |
---|---|
Timeline | 1. The software failure incident happened on July 22, 2021 [116422, 116657, 116434]. |
System | 1. Akamai's Edge DNS service [116422, 116657, 116434] 2. Akamai's DNS system [116422, 116657, 116434] |
Responsible Organization | 1. Akamai Technologies [116422, 116657, 116434] |
Impacted Organization | 1. Delta Air Lines [116422, 116657, 116434] 2. Costco Wholesale Corp [116657, 116434] 3. American Express [116657, 116434] 4. Home Depot [116657, 116434] 5. UPS [116434] 6. Airbnb [116422, 116657, 116434] 7. Fidelity [116422] 8. US Securities and Exchange Commission [116422, 116434] 9. British Airways [116422] 10. FedEx [116422, 116434] 11. American Airlines [116434] 12. HBO Max [116434] 13. US Bank [116434] 14. Fox News [116434] 15. Southwest Airlines [116434] 16. Vanguard [116434] 17. Interactive Brokers [116434] 18. Santander Bank [116434] 19. BBVA [116434] 20. BB&T [116434] 21. Groupon [116434] 22. Expedia [116434] 23. TikTok [116434] 24. Microsoft [116434] 25. Evernote [116434] 26. Go Daddy [116434] (Note: This list includes entities impacted by the software failure incident reported in the news articles.) |
Software Causes | 1. A software configuration update triggered a bug in the DNS system, causing the outage [116422, 116657, 116434]. 2. The disruption was not due to a cyberattack but rather a bug in the domain name system (DNS) service during a software update [116657, 116434]. 3. Akamai confirmed that the disruption lasted up to an hour and was caused by a software issue [116657, 116434]. 4. Akamai mentioned that the disruption was not due to a cyberattack against their platform [116434]. |
Non-software Causes | 1. The articles report no non-software cause. Akamai Technologies and Oracle confirmed that the incident was not the result of a cyberattack [116422, 116657, 116434]. 2. Akamai attributed the outage to a glitch in its own systems: a bug in the domain name system (DNS) service that was triggered by a software configuration update [116422, 116657, 116434]. |
Impacts | 1. Websites of major organizations experienced outages, including FedEx, Delta Air Lines, HSBC, McDonald's, Fidelity, the US Securities and Exchange Commission's document search site, Airbnb, British Airways, Costco, American Express, Home Depot, UPS, HBO Max, US Bank, Fox News, American Airlines, DraftKings, Southwest Airlines, Vanguard, Expedia, Charles Schwab, Microsoft, Evernote, Go Daddy, and others [116422, 116657, 116434]. 2. The disruption lasted up to an hour, during which users could not reach these websites [116422, 116657, 116434]. 3. The outage was triggered by a bug in the DNS system due to a software configuration update by Akamai Technologies [116422, 116657, 116434]. 4. The disruption was not the result of a cyberattack but rather a technical issue [116422, 116657, 116434]. 5. Akamai Technologies confirmed that the disruption was resolved by rolling back the software update [116657, 116434]. 6. The outage highlighted the risks of the internet's reliance on a relatively small number of core infrastructure providers [116422]. |
Preventions | 1. Implementing more rigorous testing procedures for software updates to catch potential bugs before deployment could have prevented the software failure incident [116422, 116657, 116434]. 2. Enhancing redundancy and failover mechanisms within the DNS system to quickly mitigate disruptions in case of software bugs could have helped prevent the incident [116422, 116657, 116434]. 3. Conducting thorough risk assessments and impact analyses before implementing software updates to understand potential vulnerabilities and their consequences could have aided in preventing the outage [116422, 116657, 116434]. |
Fixes | 1. Akamai resolved the disruption by rolling back the software update that triggered the bug in the DNS system [116657, 116434]. 2. Akamai said it is reviewing its software update process to prevent similar incidents in the future [116657, 116434]. |
References | 1. Akamai Technologies Inc [116422, 116657, 116434] 2. Oracle Corp [116422, 116657, 116434] 3. Amazon Web Services (AWS) [116657, 116434] 4. Down Detector [116657, 116434] 5. US Securities and Exchange Commission [116434] 6. Fastly [116434] 7. Rockbridge County Fire-Rescue & Emergency Management [116434] 8. Campbell County Department of Public Safety [116434] 9. Greyson County Sheriff's Office [116434] 10. CenturyLink [116434] |
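
The prevention and fix items above (check an update before it reaches every server, and roll it back when a check fails) can be sketched as a staged rollout. This is a minimal illustration only, not Akamai's actual process: `Fleet`, `staged_rollout`, and the `healthy` callback are hypothetical names, and the "servers" are plain dictionary entries standing in for real DNS hosts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Fleet:
    """Hypothetical stand-in for a set of DNS servers, keyed by name."""

    active: Dict[str, str]  # server name -> config version currently applied
    history: List[Dict[str, str]] = field(default_factory=list)

    def apply(self, servers: Sequence[str], version: str) -> None:
        # Snapshot the current state first so the change can be undone.
        self.history.append(dict(self.active))
        for name in servers:
            self.active[name] = version

    def rollback(self) -> None:
        # Restore the most recent snapshot, if there is one.
        if self.history:
            self.active = self.history.pop()


def staged_rollout(
    fleet: Fleet,
    version: str,
    stages: Sequence[Sequence[str]],
    healthy: Callable[[str, str], bool],
) -> bool:
    """Apply `version` one stage at a time; undo every stage on the first failed check."""
    applied = 0
    for stage in stages:
        fleet.apply(stage, version)
        applied += 1
        if not all(healthy(name, fleet.active[name]) for name in stage):
            # A check failed: unwind every stage applied so far.
            for _ in range(applied):
                fleet.rollback()
            return False
    return True
```

Passing a small first stage (for example a single server) limits how many servers receive a faulty update before the health check rejects it; what a realistic `healthy` check would probe is left as an assumption here.
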
Category | Option | Rationale |
---|---|---|
Recurring | one_organization, multiple_organization | (a) The software failure incident has happened again at one_organization: - Akamai Technologies experienced a similar incident before, as this was the second major disruption linked to the cloud company in about a month [116657]. - Akamai Technologies was also involved in a previous outage incident in early June [116422]. (b) The software failure incident has happened again at multiple_organization: - Fastly, a competitor of Akamai, was blamed for a separate widespread internet outage in the past [116434]. - The recent outages involving Akamai and Fastly are examples of software failures impacting multiple organizations [116422]. |
Phase (Design/Operation) | design | (a) The software failure incident related to the design phase: - The incident was caused by a "software configuration update triggered a bug in the DNS system" [116422]. - Akamai Technologies mentioned that a software update had triggered a bug in the domain name system (DNS) service, leading to the disruption [116657]. - The disruption was a result of a software update that triggered a bug in Akamai's DNS service [116434]. (b) The software failure incident related to the operation phase: - The disruption was not due to a cyberattack but was caused by a bug in the DNS system triggered during a software update [116422]. - The disruption was not due to a cyberattack and was caused by a bug in the domain name system (DNS) service, which was triggered during a software update [116657]. - The disruption was not the result of a cyberattack but was due to a bug in Akamai's DNS service triggered by a software update [116434]. |
Boundary (Internal/External) | within_system | (a) within_system: - The software failure incident was caused by a bug in the DNS system triggered by a software configuration update by Akamai [116422, 116657, 116434]. - Akamai confirmed that the disruption was not due to a cyberattack but rather an internal issue with the software update process [116657, 116434]. - The disruption lasted up to an hour and was resolved by rolling back the software update [116657, 116434]. (b) outside_system: - The software failure incident was not the result of a cyberattack but rather an internal issue with the software update process [116422, 116657, 116434]. - The disruption affected major websites and services, impacting users accessing these sites [116422, 116657, 116434]. - The outage was a result of a bug in the DNS system caused by a software configuration update, indicating an internal system issue [116422, 116657, 116434]. |
Nature (Human/Non-human) | non-human_actions | (a) The software failure incident occurring due to non-human actions: The software failure incident involving the major internet outage was caused by a bug in the DNS system triggered by a software configuration update, as reported by Akamai. This incident was not the result of a cyberattack but rather a non-human factor introduced during the update process [116422, 116657, 116434]. (b) The software failure incident occurring due to human actions: There is no specific mention in the articles about the software failure incident being directly caused by human actions. The incident was attributed to a bug in the DNS system triggered by a software configuration update, indicating a non-human factor as the primary cause [116422, 116657, 116434]. |
Dimension (Hardware/Software) | software | (a) The software failure incident occurring due to hardware: - The outage reported in the articles was not due to hardware issues but rather a software failure incident caused by a bug in the DNS system triggered by a software configuration update [116422, 116657, 116434]. (b) The software failure incident occurring due to software: - The software failure incident was specifically attributed to a bug in the DNS system caused by a software configuration update, as confirmed by Akamai [116422, 116657, 116434]. |
Objective (Malicious/Non-malicious) | non-malicious | (a) The software failure incident was non-malicious. The incident was caused by a bug in the DNS system triggered by a software configuration update at Akamai Technologies [116422, 116657, 116434]. The disruption was not the result of a cyberattack, and Akamai confirmed that the issue was unintentional and not caused by any malicious activity. The outage affected major websites and services, leading to DNS service errors and temporary unavailability of various platforms. Akamai mentioned that they are reviewing their software update process to prevent future disruptions, indicating that the incident was not intentional. |
Intent (Poor/Accidental Decisions) | accidental_decisions | (a) The software failure incident related to the Akamai outage was primarily due to accidental_decisions. The incident was caused by a "software configuration update triggered a bug in the DNS system" [116422]. Akamai confirmed that the disruption was not the result of a cyberattack and that the outage lasted up to an hour. The disruption was resolved by rolling back the software update [116657]. Akamai also mentioned that they are reviewing their software update process to prevent future disruptions [116657]. Additionally, the outage was not the result of a cyberattack, as confirmed by Akamai [116434]. The disruption was caused by a bug in the domain name system (DNS) service, which was triggered during a software update [116657]. Akamai apologized for the inconvenience caused by the disruption and stated that they are taking steps to prevent similar incidents in the future [116434]. |
Capability (Incompetence/Accidental) | development_incompetence | (a) The software failure incident occurring due to development incompetence: - The outage was caused by a "software configuration update triggered a bug in the DNS system" [116422]. - Akamai mentioned that they are reviewing their software update process to prevent future disruptions [116657]. - Fastly also experienced a similar issue in the past due to an "undiscovered software bug" triggered by a customer updating their settings [116434]. (b) The software failure incident occurring accidentally: - Akamai confirmed that the disruption was not due to a cyberattack [116422]. - Akamai stated that the disruption was caused by a bug in the domain name system (DNS) service triggered during a software update [116657]. - Akamai confirmed that the disruption was not due to a cyberattack and that it was caused by a software update triggering a bug in its DNS service [116434]. |
Duration | temporary | (a) The articles report that the software failure incident related to the major internet outage caused by a glitch in Akamai's systems was temporary. The outage lasted up to an hour and was resolved after rolling back a software update that triggered a bug in the DNS service [116422, 116657, 116434]. (b) The software failure incident was temporary and not permanent. |
Behaviour | omission, other | (a) crash: - The software failure incident in the articles can be categorized as a crash as it led to a sweeping internet disruption, causing major corporate websites like FedEx, Delta Air Lines, HSBC, and McDonald's to go down [116422]. - The outage was caused by a "software configuration update triggered a bug in the DNS system," leading to the disruption [116422]. - The disruption lasted for up to an hour before being resolved [116657]. - The disruption was not the result of a cyberattack but was due to a bug in the domain name system (DNS) service triggered during a software update [116657]. - The disruption caused websites to display domain name system (DNS) service errors [116657]. - Akamai confirmed that the disruption was not due to a cyberattack and that they were reviewing their software update process to prevent future disruptions [116657]. - The disruption impacted access to many internet resources, including websites like Delta Air Lines, Costco, American Express, and Home Depot [116657]. - The disruption was resolved by rolling back the software update [116434]. - Akamai confirmed that the disruption was not a cyberattack against their platform and apologized for the inconvenience caused [116434]. (f) other: - The software failure incident can also be considered as an omission as the system omitted to perform its intended functions, leading to the outage of major websites [116422]. - The outage was a result of a bug in the DNS system that caused the system to omit its function of correctly routing web browsers to their destinations [116422]. - The disruption caused websites to be down and display DNS service errors, indicating an omission in performing the correct routing function [116657]. - The disruption impacted the ability of users to connect to websites, suggesting an omission in the system's function of translating domain names to IP addresses [116657]. - The disruption led to major websites being temporarily unavailable, indicating an omission in the system's function of maintaining website availability [116434]. |
Layer | Option | Rationale |
---|---|---|
Perception | None | None |
Communication | None | None |
Application | None | None |
Category | Option | Rationale |
---|---|---|
Consequence | property, non-human, other | (a) death: People lost their lives due to the software failure - There is no mention of any deaths resulting from the software failure incident reported in the articles [116422, 116657, 116434]. (b) harm: People were physically harmed due to the software failure - There is no mention of any physical harm to individuals due to the software failure incident reported in the articles [116422, 116657, 116434]. (c) basic: People's access to food or shelter was impacted because of the software failure - The software failure incident did not directly impact people's access to food or shelter [116422, 116657, 116434]. (d) property: People's material goods, money, or data was impacted due to the software failure - The software failure incident did impact major corporate websites, causing disruptions in services and access for users [116422, 116657, 116434]. (e) delay: People had to postpone an activity due to the software failure - Users of the affected websites experienced delays in accessing services during the outage, but there is no specific mention of activities being postponed [116422, 116657, 116434]. (f) non-human: Non-human entities were impacted due to the software failure - The software failure incident affected major corporate websites, disrupting their services and operations [116422, 116657, 116434]. (g) no_consequence: There were no real observed consequences of the software failure - The software failure incident did have consequences in terms of website outages and disruptions, so it does not fall under the category of no consequences [116422, 116657, 116434]. (h) theoretical_consequence: There were potential consequences discussed of the software failure that did not occur - The articles do not mention any potential consequences discussed that did not actually occur as a result of the software failure incident [116422, 116657, 116434]. (i) other: Was there consequence(s) of the software failure not described in the (a to h) options? What is the other consequence(s)? - The software failure incident led to major website outages affecting various sectors such as retail, financial, logistics, and travel, causing disruptions in online services for users [116422, 116657, 116434]. |
Domain | information, transportation, sales, finance, government | (a) The failed system was intended to support the information industry, specifically affecting major corporate websites, financial institutions, government websites, and various online services that provide information and content to users [116422, 116657, 116434]. (b) The transportation industry was impacted by the software failure incident, as major airlines like Delta Air Lines experienced website outages, affecting their ability to provide transportation services to customers [116422, 116657, 116434]. (d) The sales industry was affected by the software failure incident, with major retail companies like Costco, Home Depot, and online platforms like Airbnb experiencing website outages, impacting their sales and customer interactions [116657, 116434]. (h) The finance industry was impacted by the software failure incident, as financial institutions such as American Express, US Bank, and major investment firms like Fidelity experienced website disruptions, affecting their ability to provide financial services to customers [116657, 116434]. (l) The government sector was affected by the software failure incident, with government websites like the US Securities and Exchange Commission's document search site experiencing outages, impacting public access to government services and information [116422, 116434]. |
Article ID: 116422
Article ID: 116657
Article ID: 116434