Recurring |
one_organization, multiple_organization |
(a) Excel-related software failures have happened before at other organizations, although the cited precedent was a formula error rather than a row-limit error. In 2013, an Excel error at JPMorgan led to the loss of almost $6bn when a cell mistakenly divided by the sum of two interest rates instead of the average [106096].
(b) Within the same organization, the articles describe the failure in a single data pipeline: some Microsoft Excel data files sent from NHS Test and Trace to Public Health England exceeded the maximum size, leading to nearly 16,000 Covid-19 cases going unreported in England [106461]. |
Phase (Design/Operation) |
design, operation |
(a) The software failure incident in the articles was primarily due to design factors introduced during system development. The incident occurred because Public Health England (PHE) used an old file format (XLS) to bring together logs produced by commercial firms for swab tests into Excel templates. This design choice limited each template to about 65,000 rows of data instead of the one million-plus rows Excel is capable of handling. As a result, when the total number of cases reached the limit, further cases were simply left off, leading to nearly 16,000 Covid-19 cases being unreported [106461].
(b) Operational factors also contributed to the software failure incident. Some Microsoft Excel data files exceeded the maximum size after being sent from NHS Test and Trace to Public Health England, which excluded 15,841 cases from the UK daily case figures between 25 September and 2 October. The error was discovered overnight on a Friday and fixed, but by Monday afternoon only 51% of those affected had been reached by contact tracers [106461]. |
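The row-limit behaviour described in this Phase entry can be sketched in a few lines of Python. This is an illustration only, not PHE's actual pipeline: `append_rows` and `RowLimitExceeded` are hypothetical names, and the constants are the worksheet row capacities of the .xls and .xlsx formats, which correspond to the "about 65,000" and "one million-plus" figures reported in [106461]. The sketch raises an error when a template would overflow instead of leaving the extra rows off.

```python
# Worksheet row capacities of the two Excel file formats.
XLS_MAX_ROWS = 65_536        # legacy .xls worksheet
XLSX_MAX_ROWS = 1_048_576    # .xlsx worksheet


class RowLimitExceeded(Exception):
    """Raised when a write would push a template past its row capacity."""


def append_rows(template, new_rows, max_rows=XLS_MAX_ROWS):
    """Append new_rows to template (a list), refusing to truncate silently.

    Returns the new row count, or raises RowLimitExceeded and leaves the
    template unchanged if the rows would not all fit.
    """
    needed = len(template) + len(new_rows)
    if needed > max_rows:
        overflow = needed - max_rows
        raise RowLimitExceeded(
            f"{overflow} row(s) would not fit (limit {max_rows})"
        )
    template.extend(new_rows)
    return len(template)
```

Failing loudly at the write step is one possible design choice; splitting the data into smaller batches, which is a common mitigation for capped formats, would be another.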
Boundary (Internal/External) |
within_system |
(a) The software failure incident related to the Excel data error, which led to nearly 16,000 Covid test results being misplaced, originated within the system. Public Health England (PHE) developers picked an old file format (XLS) to handle data from commercial firms carrying out swab tests, which limited each template to about 65,000 rows of data instead of the one million-plus rows Excel is capable of handling [106461]. This decision, internal to PHE's system, caused the data error and the subsequent failure to report the Covid-19 cases accurately. |
Nature (Human/Non-human) |
non-human_actions, human_actions |
(a) The software failure incident in the articles was primarily due to non-human actions. Specifically, the incident was caused by a technical glitch related to the way Microsoft Excel data files were handled, leading to nearly 16,000 Covid-19 cases going unreported in England [106096, 106461]. The error occurred because the files exceeded the maximum size allowed in Excel, resulting in cases being left out of the daily case figures. This limitation was due to the file format chosen by Public Health England (PHE) developers, which could only handle about 65,000 rows of data instead of the one million-plus rows that Excel is capable of processing.
(b) Human actions also played a role in the incident. PHE developers picked an old file format (XLS) to handle the data, which limited the number of rows that could be processed [106461]. The automated processes for data aggregation and reporting were themselves the product of human decisions about how data should be handled. |
Dimension (Hardware/Software) |
hardware, software |
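(a) The software failure incident occurring due to hardware (the articles do not describe a hardware fault; the factors listed here concern file-size and row-capacity limits):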
- The incident was caused by some Microsoft Excel data files exceeding the maximum size after they were sent from NHS Test and Trace to Public Health England [106461].
- The problem was caused by the way the agency brought together logs produced by commercial firms into Excel templates, which had a limitation on the number of rows of data they could handle [106461].
(b) The software failure incident occurring due to software:
- The error was caused by a technical glitch in Microsoft Excel software, where data files exceeded the maximum size, leading to cases being left out of the UK daily case figures [106461].
- The issue was caused by the PHE developers picking an old file format (XLS) to bring together data into Excel templates, which had limitations on the number of rows of data they could handle [106461].
- Microsoft Excel's million-row limit was a contributing factor to the misplacement of nearly 16,000 Covid test results by Public Health England [106096]. |
Objective (Malicious/Non-malicious) |
non-malicious |
(a) The software failure incident related to the Excel data error in handling Covid-19 test results was non-malicious. The incident was caused by technical glitches and errors in the way Microsoft Excel was used to process and store the data, leading to the misreporting of nearly 16,000 Covid-19 cases [106096, 106461]. The error was attributed to the limitations of Excel in handling large datasets, specifically the million-row limit in Excel spreadsheets, which resulted in data being cut off and not displayed correctly. Additionally, the use of an old file format (XLS) by Public Health England (PHE) developers further exacerbated the issue by limiting the number of rows that could be handled by each template, causing cases to be left out [106461].
(b) The incident was not a result of malicious intent but rather a consequence of technical limitations and errors in the handling of data files using Microsoft Excel. There is no indication in the articles that the failure was caused by any deliberate actions to harm the system or manipulate the data for malicious purposes. |
Intent (Poor/Accidental Decisions) |
poor_decisions |
(a) The software failure incident related to the Excel data error leading to nearly 16,000 Covid test results being misplaced by Public Health England can be attributed to poor decisions. The incident occurred due to the use of an old file format (XLS) by PHE developers to handle data in Excel templates, limiting each template to about 65,000 rows of data instead of the one million-plus rows that Excel is capable of handling [106461]. This poor decision in selecting the file format ultimately led to the truncation of data and the omission of thousands of test results from the official daily figures, impacting contact tracing efforts and potentially putting lives at risk. |
Capability (Incompetence/Accidental) |
development_incompetence, accidental |
(a) The software failure incident related to development incompetence is evident in the articles. The incident where nearly 16,000 Covid test results were misplaced by Public Health England was caused by a million-row limit on Microsoft Excel, leading to positive tests being left off the official daily figures [106096, 106461]. This issue arose due to the manual handling of data and the use of an old Excel file format that limited the number of rows that could be handled, causing crucial data to be omitted. The incident highlights the consequences of not adapting systems to handle the increasing volume of data efficiently, showcasing a lack of professional competence in managing data effectively.
(b) The software failure incident can also be attributed to accidental factors. The technical glitch that caused the error in reporting nearly 16,000 Covid-19 cases in England was described as a "technical error" by Health Secretary Matt Hancock, emphasizing that it "should never have happened" [106461]. The error was caused by the way data files were handled in Microsoft Excel, exceeding the maximum size and leading to cases being left out of the daily case figures. This accidental oversight in handling the data and using an outdated file format inadvertently resulted in the failure to report crucial Covid test results, putting lives at risk and impacting the government's assessment of the epidemic. |
Duration |
permanent |
(a) The software failure incident related to the Excel data files exceeding the maximum size, leading to nearly 16,000 Covid-19 cases going unreported in England, can be considered a temporary failure: it affected the daily figures between 25 September and 2 October and was fixed after it was discovered. The failure was caused by the way Public Health England (PHE) brought together logs produced by commercial firms into Excel templates using an old file format (XLS), which limited each template to about 65,000 rows of data instead of the one million-plus rows Excel is capable of handling. As a result, further cases were simply left off once the template reached its limit [106461].
(b) On the other hand, the limitations of Microsoft Excel in handling large datasets, such as the million-row limit, can be seen as a permanent contributing factor to software failures. The incident where a lab's daily test report in CSV format was loaded into Excel, causing the bottom rows to get cut off once the file exceeded Excel's row limit, showcases how the software's inherent limitations can lead to failures in data processing and reporting [106096]. |
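The CSV scenario in this Duration entry suggests a simple pre-check before a file is opened in a spreadsheet. The following is a minimal Python sketch under stated assumptions: `rows_lost_if_opened` is a hypothetical helper, the CSV is passed in as text, and the default limit is the .xlsx worksheet capacity of 1,048,576 rows. It is not a description of how the lab or PHE processed their reports.

```python
import csv
import io

XLSX_MAX_ROWS = 1_048_576  # worksheet row capacity of the .xlsx format


def rows_lost_if_opened(csv_text, max_rows=XLSX_MAX_ROWS):
    """Return how many CSV records would fall beyond the sheet's last row.

    Every record counts, including a header row if one is present.
    """
    total = sum(1 for _ in csv.reader(io.StringIO(csv_text)))
    return max(0, total - max_rows)
```

A non-zero result means the bottom rows would be cut off, so the file should be processed with a tool that has no row cap or split before loading.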
Behaviour |
crash, omission, value, other |
(a) crash: The software failure incident in the articles can be categorized as a crash in the sense that the system stopped performing its intended function for the affected records, although the articles do not report a program halt. Positive Covid-19 test results were left off the official daily figures because of a row limit in Excel, so potentially infectious people were not informed to self-isolate [106096, 106461].
(b) omission: The software failure incident can also be categorized as an omission. The error in the Excel files caused nearly 16,000 Covid-19 cases to go unreported in England, leading to the omission of these cases from the UK daily case figures. This omission meant that close contacts of those who tested positive were not traced, putting lives at risk [106461].
(c) timing: The software failure incident does not align with a timing failure. The issue was not related to the system performing its intended functions too late or too early; rather, it was a result of the system failing to handle the data correctly due to a limitation in the software [106096, 106461].
(d) value: The software failure incident can be associated with a value failure. The incident involved the system incorrectly handling the data due to the limitation of the Excel software, which led to positive test results being left out of the official daily figures. This incorrect handling of data resulted in a significant impact on contact tracing and public health measures [106096, 106461].
(e) byzantine: The software failure incident does not align with a byzantine failure. There were no indications of inconsistent responses or interactions from the system; instead, the failure was primarily due to the system reaching its limitations and not being able to process the data correctly [106096, 106461].
(f) other: The other behavior of the software failure incident could be described as a limitation failure. The incident was caused by the limitation of the Excel software in handling a large volume of data, leading to crucial Covid-19 test results being omitted from official reports. This limitation in the software's capacity resulted in a significant impact on public health data reporting and contact tracing efforts [106096, 106461]. |
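The omission behaviour classified in this entry is the kind of fault a record reconciliation between sender and receiver can surface. The sketch below is a hypothetical Python illustration: `find_omitted` and the idea that each test result carries a unique ID are assumptions for the example, not details taken from the articles.

```python
def find_omitted(sent_ids, received_ids):
    """Return IDs present in the sender's log but absent downstream, sorted."""
    return sorted(set(sent_ids) - set(received_ids))


# Example: four results were sent, but only two arrived in the report.
missing = find_omitted(["t1", "t2", "t3", "t4"], ["t1", "t2"])
```

Comparing identifiers rather than only row counts shows which records were dropped, not just how many.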