Incident: Excel Limitation Leads to Covid-19 Data Loss in England

Published Date: 2020-10-05

Postmortem Analysis
Timeline 1. Nearly 16,000 coronavirus cases went unreported in England because of the way Microsoft's Excel software was used [106120]. Estimated date: the article was published on 2020-10-05 and describes the incident as having happened the previous week, so it most likely occurred in late September or early October 2020.
System 1. Public Health England's data processing system 2. Microsoft Excel's XLS file format [106120]
Responsible Organization 1. Public Health England (PHE) - The article attributes the nearly 16,000 unreported coronavirus cases in England to PHE's badly thought-out use of Microsoft Excel [106120].
Impacted Organization 1. Public Health England (PHE) [106120]
Software Causes 1. PHE's automatic process pulled test results into Excel templates saved in the old XLS file format rather than the newer XLSX format. This capped the number of rows each template could hold, and once a template was full, further cases were left off [106120].
Non-software Causes 1. The issue was caused by the way Public Health England (PHE) brought together logs produced by commercial firms in text-based lists without considering the limitations of the Excel template format [106120]. 2. PHE's developers picked an old file format (XLS) instead of a more modern format like XLSX, which could handle a significantly larger amount of data [106120]. 3. The Health Secretary Matt Hancock mentioned that the problem emerged due to PHE using a "legacy system" and a decision had been taken two months ago to replace it, indicating a lack of timely system upgrades [106120].
Impacts 1. Nearly 16,000 coronavirus cases went unreported in England due to the software failure incident [106120]. 2. Lives were put at risk because the contact-tracing process was delayed, potentially leading to the spread of the virus [106120].
Preventions 1. Using the correct file format: The software failure incident could have been prevented if Public Health England (PHE) had used the correct file format, such as XLSX instead of the outdated XLS format [106120]. 2. Proper testing and validation: Conducting thorough testing and validation of the system could have helped identify the limitations of the Excel templates before the incident occurred [106120]. 3. Implementing better data handling practices: PHE could have implemented better data handling practices and considered alternative solutions rather than relying solely on Excel for critical data processing tasks [106120].
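Fixes 1. Upgrading the file format from XLS to XLSX so that each template can hold far more cases [106120]. 2. Replacing Excel with a bespoke system built for large datasets in critical data processing [106120]. 3. As an interim measure, PHE broke the test result data into smaller batches spread across a larger number of Excel templates [106120].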
References 1. Public Health England (PHE) [106120] 2. Health Secretary Matt Hancock [106120] 3. Prof Jon Crowcroft from the University of Cambridge [106120] 4. Labour's shadow health secretary Jonathan Ashworth [106120]
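The row-limit arithmetic behind the Software Causes and Preventions entries can be sketched in Python. The two worksheet limits are properties of the file formats themselves (65,536 rows for XLS, 1,048,576 for XLSX). The figure of 46 rows per case is an assumption, chosen only so that the XLS capacity lands near the roughly 1,400 cases per template reported in the article. The names `cases_per_sheet`, `check_fits`, and `CapacityError` are hypothetical; this is a minimal illustration, not PHE's actual pipeline.

```python
XLS_MAX_ROWS = 65_536        # worksheet row limit of the legacy XLS format
XLSX_MAX_ROWS = 1_048_576    # worksheet row limit of the XLSX format
ROWS_PER_CASE = 46           # ASSUMPTION: illustrative only, not from the article


class CapacityError(ValueError):
    """Raised when a dataset would not fit in one worksheet."""


def cases_per_sheet(max_rows: int, rows_per_case: int = ROWS_PER_CASE) -> int:
    """Number of whole cases that fit in a worksheet with max_rows rows."""
    if rows_per_case <= 0:
        raise ValueError("rows_per_case must be positive")
    return max_rows // rows_per_case


def check_fits(row_count: int, max_rows: int) -> None:
    """Fail loudly instead of silently dropping rows past the limit."""
    if row_count > max_rows:
        raise CapacityError(
            f"{row_count} rows exceed the {max_rows}-row limit by "
            f"{row_count - max_rows}"
        )


if __name__ == "__main__":
    print(cases_per_sheet(XLS_MAX_ROWS))   # 1424
    print(cases_per_sheet(XLSX_MAX_ROWS))  # 22795
```

A guard such as `check_fits`, called before each template is written, would turn a silent omission into an immediate and visible error.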

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident related to using Microsoft Excel causing Covid-19 results to be lost happened at Public Health England (PHE) due to the agency's use of an old Excel file format (XLS) that limited the number of rows of data that could be handled [106120]. (b) The incident highlights a broader issue of using outdated software for critical data processing tasks, as mentioned by Prof Jon Crowcroft from the University of Cambridge, who emphasized that better alternatives exist and that Excel's XLS format is not suitable for serious data processing tasks [106120].
Phase (Design/Operation) design, operation (a) The software failure incident in Article 106120 occurred due to a design issue. Public Health England (PHE) used an old Excel file format (XLS) instead of the newer XLSX format, which limited the number of rows of data each template could handle. This design choice led to nearly 16,000 coronavirus cases going unreported in England because the templates could only handle about 1,400 cases each, causing further cases to be left off [106120]. (b) The software failure incident in Article 106120 was also impacted by operational factors. The issue arose during the operation of pulling data together into Excel templates by PHE's automatic process. The operational decision to use the XLS format instead of the more suitable XLSX format contributed to the failure, as it limited the capacity of each template and led to cases being left off due to the operational process of handling the data [106120].
Boundary (Internal/External) within_system (a) within_system: The software failure incident was primarily caused by factors originating from within the system. Public Health England (PHE) was responsible for the issue as they used an old Excel file format (XLS) that had limitations in handling data. The agency's developers chose this outdated format to bring together test result data, leading to the failure to report nearly 16,000 coronavirus cases in England [106120].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in Article 106120 occurred due to non-human_actions. The issue was caused by the way Public Health England (PHE) set up an automatic process to pull data into Excel templates using an old file format (XLS) that had a limitation on the number of rows it could handle. This limitation led to nearly 16,000 coronavirus cases going unreported in England [106120]. (b) The software failure incident in Article 106120 was also influenced by human_actions. The decision made by PHE's own developers to use the outdated XLS file format for handling the data in Excel templates was a human action that contributed to the failure. Additionally, the Health Secretary Matt Hancock acknowledged that a decision had been taken two months ago to replace the legacy system that led to the flaw, indicating human involvement in the decision-making process [106120].
Dimension (Hardware/Software) software (a) The software failure incident in Article 106120 occurred due to contributing factors that originate in software. The issue was caused by Public Health England's use of an old Excel file format (XLS) instead of the newer XLSX format, which limited the number of rows of data that could be handled, leading to nearly 16,000 coronavirus cases going unreported in England [106120].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident described in the article was non-malicious. The failure was attributed to the poorly thought-out use of Microsoft Excel by Public Health England (PHE), specifically the choice of an old file format (XLS) instead of a more modern format like XLSX. This decision led to a limitation in the number of rows of data that could be handled, resulting in nearly 16,000 coronavirus cases going unreported in England [106120]. The incident was a result of human error and poor decision-making rather than any malicious intent to harm the system.
Intent (Poor/Accidental Decisions) poor_decisions (a) The incident resulted from a poor decision: PHE's developers chose the old Excel file format (XLS) instead of the more modern XLSX format. That choice limited the number of rows of data each template could handle, leading to nearly 16,000 coronavirus cases going unreported in England [106120].
Capability (Incompetence/Accidental) development_incompetence (a) The software failure incident in Article 106120 occurred due to development incompetence. Public Health England's (PHE) developers picked an old file format (XLS) to handle data from commercial firms analyzing swab tests, limiting each Excel template to about 1,400 cases instead of utilizing the newer XLSX format capable of handling significantly more data. This decision led to nearly 16,000 coronavirus cases going unreported in England [106120]. (b) The software failure incident in Article 106120 was accidental. The issue arose from the way PHE brought together logs in CSV files into Excel templates using the outdated XLS format, which could only handle a limited number of rows. This accidental choice of file format led to cases being left off and delayed contact tracing efforts, putting lives at risk [106120].
Duration temporary (a) The software failure incident in the article was temporary. The incident occurred due to the specific circumstance of Public Health England (PHE) using an old Excel file format (XLS) that had a limitation on the number of rows it could handle, leading to nearly 16,000 coronavirus cases going unreported in England. The Health Secretary Matt Hancock mentioned that a decision had been taken two months ago to replace the legacy system that caused the issue, indicating that it was not a permanent failure [106120]. (b) The software failure incident was not permanent as it was caused by the specific choice of using an outdated Excel file format by PHE. The incident was not a fundamental flaw in the software itself but rather a result of the agency's decision-making process and system setup. Additionally, PHE took steps to address the issue by breaking down the test result data into smaller batches to create a larger number of Excel templates, indicating that the failure was not inherent to the software but rather a circumstantial issue that could be resolved [106120].
Behaviour crash, omission, other (a) crash: The incident is categorized as a crash in the sense that the system stopped performing its intended function once a template reached the XLS limit of about 65,000 rows: no further cases were recorded beyond that point [106120]. (b) omission: The incident is also an omission. Because each Excel template could hold only a limited number of rows, cases beyond that capacity were left off, resulting in nearly 16,000 coronavirus cases going unreported in England [106120]. (c) timing: Not a timing failure. The system did not perform its intended functions too early or too late; it failed to include all the data because of the template limit [106120]. (d) value: Not a value failure. The data the system did record was not reported as incorrect; the problem was the data it left out [106120]. (e) byzantine: Not a byzantine failure. The system did not give inconsistent responses; it consistently dropped data beyond the template limit [106120]. (f) other: The incident also reflects a design flaw. Choosing the outdated XLS format over the more modern XLSX format led to cases being omitted, which points to a fundamental weakness in the system's design and data handling process [106120].
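The omission behaviour, and the batching workaround noted under Duration, can be illustrated with a small Python sketch. It uses plain lists in place of real spreadsheet files. The names `truncate_silently`, `split_into_batches`, and `missing_rows` are hypothetical, and the 64-row limit in the usage example is a stand-in for the real XLS limit to keep the numbers small.

```python
from typing import Iterator, List, Sequence


def truncate_silently(rows: Sequence[str], max_rows: int) -> List[str]:
    """Model of the failure: rows past the limit are dropped without error."""
    return list(rows[:max_rows])


def split_into_batches(rows: Sequence[str], max_rows: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most max_rows rows each."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    for start in range(0, len(rows), max_rows):
        yield list(rows[start:start + max_rows])


def missing_rows(source_count: int, loaded_count: int) -> int:
    """Reconciliation check: 0 means every source row was loaded."""
    return source_count - loaded_count


if __name__ == "__main__":
    rows = [f"result-{i}" for i in range(150)]
    limit = 64  # stand-in for the XLS row limit

    loaded = truncate_silently(rows, limit)
    print(missing_rows(len(rows), len(loaded)))  # 86

    batches = list(split_into_batches(rows, limit))
    print([len(b) for b in batches])  # [64, 64, 22]
    print(missing_rows(len(rows), sum(len(b) for b in batches)))  # 0
```

Comparing the source row count with the loaded row count after every import is a cheap way to detect this class of omission, whichever file format is in use.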

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence harm The article reports a risk of physical harm to people rather than confirmed injuries: Labour's shadow health secretary Jonathan Ashworth said that lives had been put at risk because the contact-tracing process had been delayed, potentially leading to the spread of the virus [106120].
Domain health (a) The failed system was intended to support the health industry. The software failure incident involved the handling of coronavirus test results in England by Public Health England (PHE) using Microsoft Excel [106120].
