Incident: Widespread Outage Affects Major Websites, Linked to Akamai DNS Issue

Published Date: 2021-07-22

Postmortem Analysis
Timeline 1. The incident occurred on the afternoon of Thursday, July 22, 2021; Akamai began investigating just after 5 pm UK time [116529]. 2. Article 116529 is dated 2021-07-22 07:00:00+00:00.
System 1. Akamai's Edge DNS service [116529]
Responsible Organization 1. Akamai: the outage was linked to a glitch in a performance product offered by Akamai, its Edge DNS service [116529].
Impacted Organization 1. HSBC, ITV, Waitrose [116529] 2. Barclays, TSB, Bank of Scotland, Tesco Bank, Sainsbury’s Bank [116529] 3. Airbnb, PlayStation Network, Steam [116529]
Software Causes 1. A glitch in Akamai's Edge DNS service, a performance product, led to a widespread outage affecting sites and services including HSBC, ITV, Waitrose, Barclays, TSB, Bank of Scotland, Tesco Bank, Sainsbury’s Bank, Airbnb, PlayStation Network, and Steam [116529].
Non-software Causes 1. The article does not identify a non-software cause; the outage was linked to a performance product offered by Akamai, an American firm [116529]. 2. Akamai stated that the incident was not the result of a cyber-attack on its platform [116529].
Impacts 1. The websites of HSBC, ITV, Waitrose, Barclays, TSB, Bank of Scotland, Tesco Bank, Sainsbury’s Bank, Airbnb, PlayStation Network, and Steam were either entirely or partially inaccessible during the outage [116529]. 2. Users experienced DNS errors when trying to access certain services [116529]. 3. The outage caused disruption for users trying to access online banking services and other platforms [116529].
Preventions 1. Implementing thorough testing procedures to catch software bugs before deployment [116529]. 2. Conducting regular audits and reviews of the software system to identify and address potential vulnerabilities [116529]. 3. Ensuring proper communication and coordination between different service providers to prevent cascading failures [116529].
Fixes 1. Akamai implemented a fix for the issue [116529]. 2. Akamai monitored the situation to confirm that normal operations had resumed [116529].
References 1. Akamai (mentioned in Article 116529) 2. Fastly (mentioned in Article 116529)

Software Taxonomy of Faults

Category Option Rationale
Recurring multiple_organization (a) one_organization: The article does not link the incident to a previous failure at Akamai. The outage experienced by HSBC, ITV, Waitrose, Barclays, TSB, Bank of Scotland, Tesco Bank, Sainsbury’s Bank, Airbnb, PlayStation Network, and Steam was attributed to a glitch related to Akamai's Edge DNS service [116529]. (b) multiple_organization: The article mentions a similar outage six weeks earlier that affected prominent websites including gov.uk. In that incident, the US-based company Fastly experienced issues due to an "undiscovered software bug" triggered by a single unnamed customer who updated their settings. Similar software failures have therefore occurred at more than one organization [116529].
Phase (Design/Operation) design (a) The incident appears to be related to the design phase. It was attributed to a performance product offered by Akamai, an American firm, which began investigating issues just after 5 pm UK time. Akamai's Edge DNS service, designed to improve loading times and combat attacks, experienced a glitch affecting various websites and services [116529]. (b) The article does not indicate that the incident was due to factors introduced by the operation or misuse of the system.
Boundary (Internal/External) within_system (a) within_system: The software failure incident mentioned in the article was related to a widespread outage affecting various websites and online services. The issue was linked to a performance product offered by Akamai, a company that provides Edge DNS services to improve loading times and combat attacks. Akamai began investigating the issues and implemented a fix to resume normal operations. The incident was not attributed to a cyber-attack on the Akamai platform, indicating that the failure originated within the system itself [116529]. (b) outside_system: The software failure incident was not attributed to an external cyber-attack on the Akamai platform. Instead, it was related to a performance product offered by Akamai, which experienced issues affecting various websites and online services. The incident was not caused by an attack from outside the system but rather by internal factors related to the Akamai service [116529].
Nature (Human/Non-human) non-human_actions (a) The incident was attributed to a non-human action. The outage affecting HSBC, ITV, Waitrose, Barclays, TSB, Bank of Scotland, Tesco Bank, Sainsbury’s Bank, Airbnb, PlayStation Network, and Steam was caused by a widespread glitch. Akamai was investigating issues with its Edge DNS service, a performance product designed to improve loading times and combat attacks. Akamai said the problem was not the result of a cyber-attack on its platform, which points to a non-human factor [116529]. (b) The article does not mention any contributing factors introduced by human actions.
Dimension (Hardware/Software) software (a) The article does not attribute the incident to hardware issues. The outage affecting various websites and services, including online banking services, Airbnb, PlayStation Network, and Steam, was linked to a performance product offered by Akamai, an American firm. Akamai's Edge DNS service, designed to improve loading times and combat attacks, experienced issues that led to the disruption [116529]. (b) The contributing factors originated in software, specifically a glitch in the Edge DNS service provided by Akamai [116529].
Objective (Malicious/Non-malicious) non-malicious (a) The incident described in the article does not appear to be malicious. It was attributed to a temporary glitch related to a performance product offered by Akamai, a company that provides services to improve loading times and combat attacks. Akamai stated that the issue was not the result of a cyber-attack on its platform [116529]. (b) Akamai's statement points to a non-malicious cause. The article also notes a separate, similar outage six weeks earlier at Fastly, which was due to an "undiscovered software bug" triggered by a single unnamed customer who updated their settings; that incident was likewise non-malicious [116529].
Intent (Poor/Accidental Decisions) accidental_decisions (a) The article does not point to poor decisions as the cause. It attributes this outage to a glitch in Akamai's Edge DNS service, and the earlier, separate Fastly outage to an undiscovered software bug; neither is explicitly linked to poor decisions [116529].
Capability (Incompetence/Accidental) accidental (a) The article does not attribute the incident to development incompetence. (b) The article attributes the incident to accidental factors. The outage affecting HSBC, ITV, Waitrose, Barclays, TSB, Bank of Scotland, Tesco Bank, Sainsbury’s Bank, Airbnb, PlayStation Network, and Steam was attributed to a temporary glitch, reportedly linked to a performance product offered by Akamai, an American firm. Akamai stated that the issue was not the result of a cyber-attack on its platform but a technical glitch that was resolved [116529].
Duration temporary The software failure incident described in Article 116529 was temporary. It caused disruption to various websites and services, including online banking services, Airbnb, PlayStation Network, and Steam. The outage was linked to a performance product offered by Akamai, specifically their Edge DNS service. Akamai investigated the issues and implemented a fix, leading to the service resuming normal operations. The incident was not a result of a cyber-attack on the Akamai platform. This temporary glitch lasted for a short period before being resolved [116529].
Behaviour crash (a) crash: The article describes a widespread outage that disrupted various websites, with major online banking services such as Barclays, TSB, Bank of Scotland, Tesco Bank, and Sainsbury’s Bank entirely or partially inaccessible for a short period. Airbnb, PlayStation Network, and Steam were also affected, with some showing users a DNS error. This behavior aligns with a crash, in which the system loses its state and fails to perform its intended functions [116529]. (b) omission: The article does not specifically mention the system omitting to perform its intended functions in particular instances. (c) timing: The article does not mention the system performing its intended functions correctly but too late or too early. (d) value: The article does not mention the system performing its intended functions incorrectly. (e) byzantine: The article does not mention the system behaving erroneously with inconsistent responses and interactions. (f) other: The article describes no other behavior; the outage aligns most closely with a crash [116529].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence theoretical_consequence (a) death: There is no mention of any deaths resulting from the software failure incident in the provided article [116529]. (b) harm: There is no mention of any physical harm to individuals resulting from the software failure incident in the provided article [116529]. (c) basic: There is no mention of people's access to food or shelter being impacted due to the software failure incident in the provided article [116529]. (d) property: The software failure incident impacted people's access to online banking services, online platforms like Airbnb, PlayStation Network, and Steam, but there is no specific mention of material goods, money, or data being directly impacted in terms of loss or theft in the article [116529]. (e) delay: The software failure incident caused disruption and inaccessibility to various online services, including online banking, but there is no specific mention of people having to postpone activities due to the incident in the article [116529]. (f) non-human: The software failure incident affected various online platforms and services like websites of HSBC, ITV, Waitrose, online banking services, Airbnb, PlayStation Network, and Steam, but there is no specific mention of non-human entities being impacted in the article [116529]. (g) no_consequence: The software failure incident caused disruption and inaccessibility to various online services, but there is no mention of any real observed consequences beyond the service disruption in the article [116529]. (h) theoretical_consequence: The article mentions that the incident was not a result of a cyber-attack on the Akamai platform, indicating a potential theoretical consequence that could have been more severe if it was a deliberate attack [116529]. (i) other: There are no other consequences described in the article beyond the service disruption and potential theoretical consequences [116529].
Domain finance (a) The incident affected major online banking services like Barclays, TSB, Bank of Scotland, Tesco Bank, and Sainsbury’s Bank [116529]. (h) The incident also impacted financial services as mentioned in the article, with websites of HSBC and other banks being hit by the outage [116529]. (m) The incident was related to the technology industry, specifically the performance product offered by Akamai, an American firm, which was being investigated for causing the disruption [116529].
