Incident: NHS Covid-19 App Incorrectly Advises Self-Isolation Due to Code Error

Published Date: 2020-11-02

Postmortem Analysis
Timeline 1. The faulty code was present in the NHS Covid-19 app from its launch in September 2020 [107687]. 2. The article reporting the flaw was published on 2020-11-02. 3. Therefore, the failure began in September 2020 and persisted until the flaw was found during a rewrite of the app's isolation logic [107687].
System 1. NHS Covid-19 app's distance calculation algorithm 2. Incorporation of a measure of "infectiousness" into the app's code without corresponding adjustments in the risk threshold [107687]
Responsible Organization 1. The software engineers who rewrote the app's code and discovered the faulty maths that had been present since its launch in September [107687] 2. The decision-makers who incorporated a measure of "infectiousness" into the app's code without implementing the corresponding reduction in the risk threshold [107687]
Impacted Organization 1. Users of the NHS Covid-19 app were impacted by the software failure incident [107687].
Software Causes 1. The failure was caused by a code error in the NHS Covid-19 app, specifically an oversight in the algorithm that determined who needed to self-isolate [107687]. 2. Because of faulty maths present in the code since the app's launch in September, many users who were at risk of spreading coronavirus were not warned to quarantine [107687]. 3. A measure of "infectiousness" was incorporated into the code during testing on the Isle of Wight and updated before the national launch to account for peak infectiousness periods; the risk threshold for triggering alerts was not adjusted to match, so the app issued incorrect isolation advice [107687]. 4. The failure was compounded by "ghost notifications" that warned users of possible exposure to Covid-19 but did not advise them to self-isolate, due to the artificially low risk threshold in the app's code [107687].
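A minimal sketch of the mismatch described in cause 3, in Python. Everything here is hypothetical: the `Encounter` type, the infectiousness weights, the scoring formula and the threshold of 15 are illustrative assumptions, not the app's actual code or parameters. It shows why weighting each encounter by an infectiousness factor of at most 1 lowers risk scores, so a threshold tuned for unweighted scores has to be lowered as well; if it is not, fewer contacts reach it.

```python
from dataclasses import dataclass

# HYPOTHETICAL infectiousness weights, keyed by days relative to the index
# case's symptom onset. Peak is around onset; values are illustrative only.
INFECTIOUSNESS = {-2: 0.6, -1: 0.8, 0: 1.0, 1: 1.0, 2: 0.8, 3: 0.5}

# HYPOTHETICAL isolation threshold, assumed to be tuned for unweighted scores.
THRESHOLD = 15.0


@dataclass
class Encounter:
    minutes: float        # duration of the contact
    proximity: float      # 0..1, higher means closer (hypothetical scale)
    days_from_onset: int  # relative to the index case's symptom onset


def risk_score(encounters, use_infectiousness):
    """Sum duration x proximity, optionally scaled by an infectiousness weight <= 1."""
    score = 0.0
    for e in encounters:
        contribution = e.minutes * e.proximity
        if use_infectiousness:
            # Days outside the table are treated as non-infectious.
            contribution *= INFECTIOUSNESS.get(e.days_from_onset, 0.0)
        score += contribution
    return score


def should_isolate(encounters, use_infectiousness, threshold=THRESHOLD):
    """Advise isolation when the score reaches the threshold."""
    return risk_score(encounters, use_infectiousness) >= threshold


if __name__ == "__main__":
    close_contact = [Encounter(minutes=18, proximity=1.0, days_from_onset=2)]
    print(should_isolate(close_contact, use_infectiousness=False))  # True
    # Weighting added, threshold left unchanged: the alert is lost.
    print(should_isolate(close_contact, use_infectiousness=True))  # False
    # Weighting added and threshold lowered by the same factor: alert restored.
    print(should_isolate(close_contact, use_infectiousness=True,
                         threshold=THRESHOLD * 0.8))  # True
```

With an 18-minute close encounter two days after symptom onset, the unweighted score (18.0) reaches the threshold, the weighted score (about 14.4) does not, and lowering the threshold by the same factor of 0.8 restores the alert.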
Non-software Causes 1. The decision to incorporate a measure of "infectiousness" into the app's code without proper validation or testing [107687]. 2. The update made shortly before the national launch to account for people being most infectious shortly after their symptoms show, without making the corresponding change to the risk threshold [107687]. 3. The failure to implement the intended reduction in the risk threshold that accurate alerts required [107687]. 4. The error went undetected until a new version of the contact-tracing app was created, indicating gaps in the testing and monitoring processes [107687].
Impacts 1. Thousands of people were incorrectly told they did not need to quarantine when they were at risk of spreading coronavirus, potentially allowing further transmission of the virus [107687]. 2. The faulty maths produced a "shockingly low" number of warnings, meaning many people who should have been told to self-isolate were not notified [107687]. 3. Users received "ghost notifications" warning them of possible exposure to Covid-19 without advice to isolate, causing confusion and potentially undermining the app's credibility [107687]. 4. The inaccurate risk assessment potentially reduced the effectiveness of contact tracing and of efforts to control the spread of the virus [107687].
Preventions 1. Thorough testing and validation of the software before its national launch could potentially have prevented the failure [107687]. 2. A more robust and comprehensive quality assurance process could have caught the faulty maths in the app's logic [107687]. 3. Regular code reviews and audits could have confirmed that changes such as the update to account for peak infectiousness were implemented correctly and functioned as intended [107687]. 4. Clear communication and coordination between software engineers and public health experts, so that epidemiological guidance is translated accurately into the app's algorithms, could have prevented the error in the infectiousness calculation [107687].
Fixes 1. Test thoroughly before launching the app nationally to catch code errors [107687]. 2. Ensure that changes to the app's code, especially to risk thresholds and infectiousness metrics, are fully implemented and tested to avoid unintended consequences [107687]. 3. Regularly review and update the app's code to correct identified flaws, especially in critical functions such as risk assessment and notification triggering [107687].
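One way to test changes to risk thresholds and infectiousness metrics, sketched in Python under stated assumptions: compare how often alerts fire on a fixed set of reference scenarios before and after a scoring change, and fail if the rate drops sharply. The function names, the `max_drop` tolerance and the scenario scores and weights below are hypothetical; this is not the process the app's developers used.

```python
def alert_rate(scores, threshold):
    """Fraction of reference scenarios whose risk score reaches the threshold."""
    if not scores:
        raise ValueError("at least one reference scenario is required")
    return sum(score >= threshold for score in scores) / len(scores)


def alert_rate_preserved(old_scores, new_scores, old_threshold, new_threshold,
                         max_drop=0.1):
    """True if the new scoring and threshold do not cut the alert rate by more than max_drop."""
    drop = alert_rate(old_scores, old_threshold) - alert_rate(new_scores, new_threshold)
    return drop <= max_drop


# HYPOTHETICAL reference scenarios: unweighted scores, and the infectiousness
# weight the new scoring applies to each one.
OLD_SCORES = [18.0, 20.0, 10.0, 16.0, 30.0]
WEIGHTS = [0.8, 0.5, 1.0, 0.6, 1.0]
NEW_SCORES = [s * w for s, w in zip(OLD_SCORES, WEIGHTS)]

if __name__ == "__main__":
    # Threshold left unchanged after the scoring change: the check fails.
    print(alert_rate_preserved(OLD_SCORES, NEW_SCORES, 15.0, 15.0))  # False
    # Threshold lowered alongside the scoring change: the check passes.
    print(alert_rate_preserved(OLD_SCORES, NEW_SCORES, 15.0, 10.0))  # True
```

Scoring the scenarios with the new weights but keeping the old threshold of 15 cuts the alert rate from 0.8 to 0.2 and fails the check; lowering the threshold to 10 keeps the rate at 0.8 and passes. A rate-level check like this is coarse, so in practice it would sit alongside per-scenario expectations.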
References 1. Software engineers who discovered the faulty maths in the app [Article 107687] 2. Source quoted by the Sunday Times regarding the flaw in the app [Article 107687] 3. Department for Health and Social Care (DHSC) spokesperson [Article 107687] 4. The Guardian, which reported on the root of the error in the app [Article 107687] 5. Insider who mentioned the unfeasibly high risk score and ghost notifications issue in the app [Article 107687] 6. Randeep Sidhu, head of product for the test and trace app, and Gaby Appleton, the director [Article 107687]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The failure is attributed to the NHS organization itself. The article states that the code error, which led to thousands of people being incorrectly told they did not need to quarantine, had been present since the app's launch in September [107687], so the failure persisted within the NHS's own app. (b) The articles contain no information about a similar failure occurring at other organizations or in their products and services.
Phase (Design/Operation) design, operation (a) The failure in the NHS Covid-19 app was primarily due to design factors introduced during system development. The error stemmed from the decision to incorporate a measure of "infectiousness" into the app's code, which led to faulty maths in determining who needed to isolate. This design flaw was present from the app's launch in September and was discovered only when software engineers rewrote how the app decides who needs to isolate [107687]. (b) The failure also had operational implications. Users received "ghost notifications" warning them of possible exposure to Covid-19 that did not result in advice to isolate, and they were initially told they could ignore these notifications because they fell below the risk threshold. The discovery of the faulty maths showed that many of these ghost notifications should in fact have been advice to self-isolate. The new version of the app removes these notifications altogether, so users are notified only if they are supposed to self-isolate [107687].
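The notification behaviour described in (b) can be sketched in Python as a two-tier decision in which a lower threshold triggers a warning without isolation advice. The two thresholds, their values and the function names are hypothetical assumptions for illustration; the article does not describe the app's internal logic at this level.

```python
from enum import Enum


class Notice(Enum):
    NONE = "none"
    POSSIBLE_EXPOSURE = "possible exposure, no isolation advice"
    ISOLATE = "self-isolate"


# HYPOTHETICAL thresholds, for illustration only.
EXPOSURE_NOTICE_THRESHOLD = 5.0
ISOLATION_THRESHOLD = 15.0


def notice_old(score):
    """Earlier behaviour as described: a warning can fire without isolation advice."""
    if score >= ISOLATION_THRESHOLD:
        return Notice.ISOLATE
    if score >= EXPOSURE_NOTICE_THRESHOLD:
        return Notice.POSSIBLE_EXPOSURE  # the "ghost notification" case
    return Notice.NONE


def notice_new(score, isolation_threshold):
    """Later behaviour as described: notify only when isolation is advised."""
    return Notice.ISOLATE if score >= isolation_threshold else Notice.NONE


if __name__ == "__main__":
    print(notice_old(14.4).value)                            # possible exposure, no isolation advice
    print(notice_new(14.4, isolation_threshold=12.0).value)  # self-isolate
```

Under these assumptions, a score that previously produced only a warning becomes isolation advice once the isolation threshold is corrected, and the intermediate warning tier disappears.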
Boundary (Internal/External) within_system (a) The failure originated within the system. A code error in the app itself produced faulty risk calculations, so users were not told to self-isolate even when they were at risk of spreading coronavirus. Software engineers discovered the error while rewriting the app's isolation decision logic, revealing that the app had relied on incorrect calculations since its launch in September [107687].
Nature (Human/Non-human) non-human_actions (a) The failure in the NHS Covid-19 app was primarily due to non-human actions, specifically a code error that produced faulty calculations of exposure risk. The resulting incorrect risk assessments meant that users at risk of spreading coronavirus were not properly told to self-isolate. The error was discovered only when software engineers rewrote the app's code and found the faulty maths that had been in place since the app's launch in September [107687].
Dimension (Hardware/Software) software (a) The failure reported in Article 107687 originated in software. Faulty maths in the NHS Covid-19 app's code, in how it calculated the risk of exposure to Covid-19 and decided when to trigger self-isolation alerts, led to incorrect risk assessments and notifications, so users at risk of spreading the virus were not properly told to self-isolate [107687]. (b) The articles do not mention any contributing factors originating in hardware.
Objective (Malicious/Non-malicious) non-malicious (a) The failure described in Article 107687 was non-malicious. It was attributed to a code error in the NHS Covid-19 app, specifically faulty risk calculations that produced incorrect assessments and notifications. The error resulted from an unintentional oversight in implementing the risk threshold logic in the app's code [107687].
Intent (Poor/Accidental Decisions) poor_decisions (a) The failure in the NHS Covid-19 app was primarily due to poor decisions. The root of the error was the decision to incorporate a measure of "infectiousness" into the app's code. Shortly before the national launch, the app was updated to account for people being most infectious shortly after their symptoms show; this changed the risk calculation, and the risk threshold was supposed to be reduced correspondingly. That reduction was not implemented, so users received incorrect instructions about whether to self-isolate [107687].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The failure in the NHS Covid-19 app was primarily due to development incompetence. According to [107687], the oversight was caused by faulty maths present in the app's code since its launch in September, and it was discovered only when software engineers rewrote how the app decides who needs to isolate. The decision to incorporate a measure of "infectiousness" led to the error because the intended reduction in the risk threshold was not properly implemented, resulting in incorrect notifications and advice to users. (b) The failure can also be attributed to accidental factors. The article [107687] indicates that the error was unintentional, an unintended consequence of changes to the code: the update shortly before the national launch was meant to adjust the risk threshold for infectiousness, but the change was not implemented correctly. The "ghost notifications" warning users of potential exposure, later found to be misleading because of the artificially low risk threshold, further indicate accidental factors.
Duration temporary (a) The failure appears to be temporary. A code error meant that users at risk of spreading coronavirus were not told to self-isolate; software engineers discovered it when they rewrote how the app decides who needs to isolate, revealing faulty maths in use since the app's launch in September [107687]. The error was rectified in a new version of the contact-tracing app that could better account for mid-range exposures, indicating that the failure was not permanent but arose from specific factors introduced during the app's development and launch.
Behaviour crash, omission, value, other (a) crash: Because of a code error, the app failed to perform its intended function of instructing users to self-isolate when they were at risk of spreading coronavirus [107687]. (b) omission: The app failed to identify and notify users who needed to self-isolate, so thousands of people were wrongly told they did not need to quarantine when they were actually at risk [107687]. (c) timing: The articles do not describe timing behaviour. (d) value: The app produced incorrect risk assessments, so many warnings were not issued to people who should have been advised to self-isolate [107687]. (e) byzantine: The articles do not describe byzantine behaviour. (f) other: The failure involved a combination of factors, including faulty maths, incorrect risk thresholds, unfeasibly high risk scores, and ghost notifications, which together produced a flawed decision-making process within the app [107687].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence theoretical_consequence, unknown (a) unknown (b) unknown (c) unknown (d) unknown (e) unknown (f) unknown (g) The failure resulted in thousands of people being told they did not need to quarantine when they were actually at risk of spreading coronavirus [107687]. (h) The failure led to a "shockingly low" number of warnings being issued, indicating reduced effectiveness in notifying individuals at risk [107687]. (i) unknown
Domain health (a) The failed system was related to the health industry as it was the NHS Covid-19 app that experienced a software failure incident. The app was designed for use in England and Wales to help with contact tracing and notifying individuals who may have been exposed to the virus [Article 107687].

Sources
