Incident: NHS App Outage Causes Travel Chaos and Disruption.

Published Date: 2021-10-13

Postmortem Analysis
Timeline 1. The article reports that the software failure incident happened on Wednesday [119728]. 2. The article was published on 2021-10-13, which was a Wednesday. 3. Therefore, the software failure incident most likely occurred on Wednesday, October 13, 2021, beginning just before noon and resolved by 4.30pm the same day [119728].
System 1. England's NHS app 2. nhs.uk website
Responsible Organization 1. An unnamed global service provider was blamed for the software failure incident: the outage of England's NHS app was attributed to a technical issue with that provider [Article 119728].
Impacted Organization 1. Travellers trying to board flights and ferries for trips abroad were impacted by the software failure incident [Article 119728].
Software Causes 1. The software failure incident was caused by a technical issue with a global service provider that affected many different organizations, leading to the outage of England's NHS app [Article 119728].
Non-software Causes 1. The article does not identify a distinct non-software cause; it attributes the failure incident only to a technical issue with a global service provider that affected many different organizations [119728].
Impacts 1. Travellers were blocked from boarding flights and ferries for trips abroad due to the four-hour outage of England's NHS app, leaving them unable to access their Covid pass to prove their vaccine status [119728]. 2. People faced chaos and fury as they were unable to download necessary information, including a QR code and jab details required by many countries before departure, leading to disruptions in travel plans [119728]. 3. Some travellers had to leave the airport, missing their flights, and incurring financial losses, such as Chuck Adolphy and his girlfriend who were unable to board their flight to Slovenia despite showing their vaccine cards [119728]. 4. Caroline Frost was denied boarding for her flight from Heathrow to Nice as she couldn't provide the necessary proof of vaccination due to the app failure, leading to inconvenience and potential financial costs [119728]. 5. Tara Birkett missed a ferry crossing from Dover to Calais due to the inability to retrieve her Covid pass, causing disruptions in travel schedules and potentially additional expenses [119728]. 6. The software failure also impacted people attending events in England, such as music venues, where the app is required to demonstrate inoculation, potentially leading to entry issues and disruptions [119728].
Preventions 1. Implementing redundancy and failover mechanisms to ensure continuous service availability in case of technical issues with global service providers [119728]. 2. Conducting regular stress testing and performance monitoring of the app and website to identify and address potential issues before they impact users [119728]. 3. Providing alternative methods for accessing critical information, such as a paper Covid pass, to mitigate the impact of digital failures on travellers [119728].
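The first and third preventions above (failover across providers, plus an alternative copy of the pass) can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration only: `ProviderUnavailable`, `fetch_pass_with_failover`, the `primary` and `secondary` stand-in providers, and the `QR-PASS-` string are assumptions, not part of the NHS app or of anything described in the article.

```python
from typing import Callable, Optional, Sequence


class ProviderUnavailable(Exception):
    """Raised by a provider callable when it cannot serve the request."""


def fetch_pass_with_failover(
    providers: Sequence[Callable[[str], str]],
    user_id: str,
    cached_pass: Optional[str] = None,
) -> str:
    """Try each provider in order; fall back to a locally cached pass."""
    for provider in providers:
        try:
            return provider(user_id)
        except ProviderUnavailable:
            continue  # this provider is down, so try the next one
    if cached_pass is not None:
        # Last resort: a copy saved earlier (the digital analogue of a paper pass).
        return cached_pass
    raise ProviderUnavailable("all providers failed and no cached pass exists")


# Hypothetical stand-ins for a primary and a secondary service provider.
def primary(user_id: str) -> str:
    raise ProviderUnavailable("primary provider outage")


def secondary(user_id: str) -> str:
    return f"QR-PASS-{user_id}"


if __name__ == "__main__":
    print(fetch_pass_with_failover([primary, secondary], "u1"))
    print(fetch_pass_with_failover([primary], "u1", cached_pass="cached-copy"))
```

The design choice in this sketch is ordering: live providers are tried first so the user gets a fresh pass when possible, and the cached copy is used only when every provider is unavailable.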
Fixes 1. Implementing redundancy and failover mechanisms to ensure continuous service availability in case of technical issues with global service providers [119728]. 2. Conducting thorough testing and quality assurance processes to identify and address potential issues before they impact users [119728]. 3. Enhancing communication and support channels for users affected by software failures to provide alternative solutions and assistance during outages [119728].
References 1. NHS Digital 2. Chuck Adolphy 3. Caroline Frost 4. Tara Birkett 5. Users of the Scottish vaccine passport app 6. Users of the Northern Ireland app 7. Welsh government

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) Similar software failures have affected comparable Covid pass products before. The article mentions that the other UK nations' Covid apps have also experienced problems in recent months. For example, users of the Scottish vaccine passport app reported in October that the app was not working and could not find their data. Additionally, in July in Northern Ireland, some app users could see other people's data. These incidents involve the same kind of product, although the article attributes them to the other UK nations' apps rather than to England's NHS app itself [119728]. (b) The software failure incident also affected multiple organizations. The article states that the outage was blamed on "a technical issue with a global service provider that affected many different organisations." This indicates that the incident affected not only the NHS app but also other organizations relying on the same global service provider [119728].
Phase (Design/Operation) design (a) The software failure incident in the article was classified as related to the design phase. The failure was attributed to "a technical issue with a global service provider that affected many different organisations" [Article 119728]. The article does not describe the underlying fault, so this classification rests on the reading that the contributing factor lay in how the system was built to depend on that provider, rather than in the operation or misuse of the system.
Boundary (Internal/External) within_system (a) within_system: The software failure incident with the NHS app was attributed to "a technical issue with a global service provider that affected many different organisations" [119728]. On this reading, the failure arose within the system because the app depends on the service provider as one of its components. (b) outside_system: The same attribution can also be read the other way: the global service provider is a third party that serves many different organisations, which suggests that the contributing factor originated outside the NHS app itself [119728].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the NHS app was attributed to a technical issue with a global service provider, which affected many different organizations [119728]. This indicates that the failure was due to non-human actions, specifically a technical issue introduced by a service provider. (b) The human_actions option is supported less directly: the article describes the human side of the incident mainly through its effects, as travellers were left unable to access their Covid passes due to the app malfunction, leading to chaos and frustration among passengers [119728]. Individuals like Chuck Adolphy and Caroline Frost were directly impacted by the failure as they were denied boarding for their flights due to the inability to retrieve their Covid passes [119728].
Dimension (Hardware/Software) software (a) The software failure incident in the article was not attributed to hardware issues. Instead, it was mentioned that the outage of England's NHS app was due to "a technical issue with a global service provider that affected many different organisations" [119728]. (b) The software failure incident in the article was specifically attributed to a technical issue with a global service provider affecting the NHS app and nhs.uk website, indicating that the failure originated in the software [119728].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident described in the article is non-malicious. The failure was attributed to "a technical issue with a global service provider that affected many different organisations" [119728]. There is no indication in the article that the outage was caused by malicious intent or actions.
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident described in Article 119728 was primarily due to poor decisions. The failure was attributed to "a technical issue with a global service provider that affected many different organisations" [119728]. This indicates that the reliance on a single global service provider without adequate contingency plans or redundancies in place led to the outage affecting multiple organizations, including the NHS app. Such a dependency on a single provider without proper risk assessment or mitigation strategies can be considered a poor decision that contributed to the software failure incident.
Capability (Incompetence/Accidental) accidental (a) The software failure incident reported in Article 119728 was not explicitly attributed to development incompetence. The outage of England's NHS app, which left travellers unable to access their Covid pass, was blamed on "a technical issue with a global service provider that affected many different organisations." This suggests that the failure was not directly linked to development incompetence but rather to an external technical issue. (b) The software failure incident in Article 119728 was attributed to an accidental technical issue with a global service provider, which affected the functionality of the NHS app and nhs.uk website. This accidental failure led to chaos and fury among travellers who were unable to access their Covid pass to prove their vaccine status before boarding flights and ferries.
Duration temporary (a) The software failure incident described in the article was temporary. The article mentions that the NHS app and nhs.uk website began malfunctioning just before noon on Wednesday and the problem was resolved by 4.30pm on the same day [119728]. This indicates that the failure was not permanent but rather temporary in nature.
Behaviour crash, omission, value, other (a) crash: The software failure incident in the article can be categorized as a crash. The NHS app and nhs.uk website malfunctioned, leaving people unable to access their Covid pass to prove their vaccine status, resulting in travellers being blocked from boarding flights and ferries [Article 119728]. (b) omission: The software failure incident can also be categorized as an omission. Many travellers were unable to download the required information for their trips abroad, including a QR code and details about their vaccinations, as the system told them to "try again later," leading to chaos and frustration [Article 119728]. (c) timing: The software failure incident can be linked to timing issues as well. Travellers were left in the lurch when they tried to get a digital version of their Covid pass less than four weeks before their departure, as they queued to check in, highlighting a timing issue in accessing the necessary information [Article 119728]. (d) value: The software failure incident can be associated with a value issue. The system was not performing its intended function correctly, as travellers were unable to retrieve their Covid passes to prove their vaccination status, leading to individuals being denied boarding for flights and ferries despite having their vaccine cards [Article 119728]. (e) byzantine: The software failure incident does not exhibit characteristics of a byzantine failure, as there is no mention of inconsistent responses or interactions in the article [Article 119728]. (f) other: The other behaviour exhibited by the software failure incident is the disruption of travel plans and financial loss for individuals like Chuck Adolphy and Caroline Frost, who were unable to board their flights due to the system failure, resulting in them being ushered out of the airport and incurring additional expenses [Article 119728].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence delay The consequence of the software failure incident described in the article [119728] was primarily related to delays. People were blocked from boarding flights and ferries due to the outage of England's NHS app, which left them unable to access their Covid pass to prove their vaccine status. This resulted in chaos, frustration, and financial losses for individuals like Chuck Adolphy, Caroline Frost, and Tara Birkett, who missed their flights or ferry crossings and had to deal with the repercussions of not being able to demonstrate their vaccination status in a digital format. Additionally, people attending events in England, such as music venues, were also affected by the inability to use the app to demonstrate their inoculation, further highlighting the delays caused by the software failure.
Domain information, health (a) information: The failed system was intended to support the production and distribution of information. The software failure incident involved the NHS app in England, which is crucial for providing individuals with a Covid pass to prove their vaccine status before travelling abroad. The outage of the app prevented hundreds of travellers from accessing their QR codes and vaccination details, causing chaos and disruptions in travel plans [Article 119728]. (b) health: The failed system is England's NHS app, and the information it presents is users' Covid vaccination details [Article 119728].
