Incident: New Year London Ambulance Delays Due to CAD System Failure

Published Date: 2017-06-27

Postmortem Analysis
Timeline 1. The article [60158] reports that the software failure occurred on New Year's Eve. 2. The article was published on 2017-06-27. 3. The most recent New Year's Eve before publication was December 31, 2016, so that is the inferred incident date.
System 1. Computer Aided Dispatch (CAD) system [60158] 2. Real-time, web-based mapping of ambulance crews [60158]
Responsible Organization 1. London Ambulance Service (LAS), which operated the Computer Aided Dispatch (CAD) system that failed and had not regularly tested or maintained it [60158].
Impacted Organization 1. Patients in need of emergency medical assistance in London [60158] 2. London Ambulance Service (LAS) [60158]
Software Causes 1. Lack of regular maintenance on the Computer Aided Dispatch (CAD) system: no system maintenance occurred between June 2015 and February 2017, although six software updates were applied during that period [60158]. 2. Insufficient awareness and action by the LAS board regarding technical resilience risks to the CAD system, including the failure to carry out scheduled testing and maintenance before the New Year's Eve breakdown [60158]. 3. Inadequate preparation for IT failures: the review found little evidence that a full-scale simulation of an IT failure had been run in the nearly two years before the incident [60158].
Non-software Causes 1. Lack of system maintenance on the CAD system between June 2015 and February 2017 despite six software updates during that time period [60158]. 2. Insufficient awareness of technical resilience risks to the CAD system by the LAS board [60158]. 3. Failure to conduct scheduled testing and maintenance regularly on the CAD system [60158]. 4. Lack of full-scale simulation of an IT failure for nearly two years [60158].
Impacts 1. Some people in London had to wait up to seven hours for an ambulance on New Year's Eve due to IT systems failure, leading to delays in medical treatment [60158]. 2. Emergency calls were diverted to Scotland as London call handlers became overwhelmed, with one call not answered for nearly 25 minutes, potentially affecting the timely response to critical situations [60158]. 3. Calls had to be recorded by pen and paper for nearly five hours, causing potential delays and inefficiencies in communication and dispatching of ambulances [60158]. 4. One person died, possibly as a result of treatment being delayed due to the software failure incident [60158]. 5. The real-time, web-based mapping of ambulance crews failed, forcing staff to rely on radio communications only, which could have impacted the efficiency of dispatching and locating ambulances [60158].
Preventions 1. Regular scheduled testing and maintenance of the Computer Aided Dispatch (CAD) system could have prevented the software failure incident [60158]. 2. Performing system maintenance on the CAD system between June 2015 and February 2017, a period in which six software updates were applied, could have helped prevent the incident [60158]. 3. Running full-scale simulations of IT failures more often, rather than leaving a gap of nearly two years, could have exposed vulnerabilities before the incident [60158].
Fixes 1. Carry out regular scheduled testing and maintenance of the Computer Aided Dispatch (CAD) system to ensure technical resilience [60158]. 2. Perform system maintenance on the CAD system alongside its software updates, so that updates are not applied to an unmaintained system [60158]. 3. Conduct full-scale simulations of IT failures to identify weaknesses and improve response strategies [60158].
References 1. Internal investigation by the London Ambulance Service (LAS) [60158] 2. Clinical safety review conducted by the LAS [60158]
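
The date reasoning in the Timeline entry, and the length of the maintenance gap reported in the causes, can be checked with a short sketch. It is illustrative only: the function names are hypothetical, and because the article gives the maintenance gap only to the month, the first day of each month is assumed.

```python
from datetime import date


def preceding_new_years_eve(published: date) -> date:
    """Return the most recent December 31 strictly before `published`."""
    return date(published.year - 1, 12, 31)


def months_between(start: date, end: date) -> int:
    """Count whole calendar months from `start` to `end`, ignoring the day."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Publication date taken from the record.
published = date(2017, 6, 27)
incident = preceding_new_years_eve(published)

# Maintenance gap: June 2015 to February 2017 (day of month is an assumption).
gap_months = months_between(date(2015, 6, 1), date(2017, 2, 1))

print(incident.isoformat())
print(gap_months)
```

Under these assumptions the inferred incident date is 2016-12-31 and the period without maintenance spans 20 calendar months.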

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident at the London Ambulance Service (LAS) was not an isolated incident. The internal investigation found that there had been two other recorded Computer Aided Dispatch (CAD) failures in the preceding 12 months, raising questions about the quality of the overall resilience of the CAD system [60158]. (b) The review of the software failure incident at the LAS also identified that no system maintenance occurred on the CAD system between June 2015 and February 2017, during which time there had been six software updates. Additionally, the review found little evidence that a full-scale simulation of an IT failure had been carried out for nearly two years, indicating a lack of proactive measures to prevent such incidents [60158].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase is evident in the article. The internal investigation into the New Year's Eve ambulance delays in London found that the Computer Aided Dispatch (CAD) system failure was not an isolated incident. The review identified that there had been two other recorded CAD failures in the preceding 12 months, raising questions about the quality of the overall resilience of the CAD system [60158]. The review also highlighted that the LAS board was not sufficiently aware of the technical resilience risks to the CAD system, and scheduled testing and maintenance were not regularly performed. It was noted that no system maintenance occurred on the CAD system between June 2015 and February 2017, despite there being six software updates during that time. Additionally, there was little evidence that a full-scale simulation of an IT failure had been carried out for nearly two years, indicating a lack of proactive measures in the system's design phase [60158]. (b) The software failure incident related to the operation phase is also evident in the article. During the New Year's Eve incident, some emergency calls had to be diverted to call centers outside of London, including Scotland, as London call handlers became overwhelmed. This diversion of calls due to the system failure impacted the operation of the ambulance service, leading to delays in response times and potential consequences for patients [60158].
Boundary (Internal/External) within_system (a) within_system: The software failure incident involving the London Ambulance Service on New Year's Eve was primarily attributed to internal factors within the system. The internal investigation found that the Computer Aided Dispatch (CAD) system failed, leading to significant delays in ambulance responses and emergency call handling [60158]. The review highlighted issues such as lack of system maintenance, inadequate awareness of technical resilience risks by the LAS board, and the absence of scheduled testing and maintenance for the CAD system. Additionally, the report mentioned that there were previous CAD failures in the preceding 12 months, indicating a potential lack of overall system resilience [60158]. These factors point to deficiencies within the system itself that contributed to the software failure incident.
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident was primarily due to non-human_actions, specifically an IT fault in the Computer Aided Dispatch (CAD) system. The internal investigation found that the CAD system failed, leading to delays in ambulance responses on New Year's Eve. The review highlighted that the CAD system failure was not an isolated incident and had occurred previously, indicating underlying technical resilience risks and lack of system maintenance [60158]. (b) However, human_actions also played a role in the failure. The review pointed out that the LAS board was not sufficiently aware of the technical resilience risks to the CAD system, and scheduled testing and maintenance were not regularly performed. Additionally, despite identifying risks to the CAD system, no actions were taken to address them before the New Year IT failure. Lack of system maintenance and updates, as well as the absence of full-scale simulations of IT failures, were also attributed to human negligence [60158].
Dimension (Hardware/Software) hardware, software (a) hardware: The article does not describe a specific hardware fault; the findings cited for this option concern system upkeep: - The review found that there was no system maintenance on the CAD system between June 2015 and February 2017, during which time there had been six software updates [60158]. - The review also identified seven risks to the CAD system, but no actions were taken to address them before the New Year IT failure [60158]. (b) software: - The Computer Aided Dispatch (CAD) system failure was a key factor in the incident, with calls having to be recorded by pen and paper for nearly five hours [60158]. - The review highlighted that there had been two other recorded CAD failures in the preceding 12 months, raising questions about the overall resilience of the CAD system [60158].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident described in the article was non-malicious. The failure was attributed to an IT fault within the London Ambulance Service's Computer Aided Dispatch (CAD) system, which led to delays in ambulance responses on New Year's Eve. The internal investigation found that the failure was not due to malicious intent but rather to technical issues and lack of system maintenance [60158].
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident involving the London Ambulance Service on New Year's Eve was primarily attributed to poor decisions and lack of proper maintenance practices. The internal investigation revealed that there were multiple failures of the Computer Aided Dispatch (CAD) system in the preceding months, indicating a lack of attention to the system's resilience and technical risks by the LAS board [60158]. The review highlighted that scheduled testing and maintenance were not regularly performed, and no system maintenance occurred on the CAD system for a significant period despite multiple software updates [60158]. Additionally, there was little evidence of full-scale simulations of IT failures being conducted for nearly two years, further indicating a lack of proactive measures to address potential system vulnerabilities [60158]. These findings suggest that the software failure incident was largely a result of poor decisions and inadequate maintenance practices within the organization.
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident was partly attributed to development incompetence. The internal investigation by the London Ambulance Service found that there were two other recorded CAD failures in the preceding 12 months, indicating a lack of quality in the overall resilience of the CAD system [60158]. The review also highlighted that the LAS board was not sufficiently aware of the technical resilience risks to the CAD system, and scheduled testing and maintenance were not regularly performed. Additionally, no system maintenance occurred on the CAD system for a significant period, despite multiple software updates during that time [60158]. (b) The software failure incident was also influenced by accidental factors. The review revealed that there were seven risks to the CAD system that were identified but not addressed before the New Year IT failure occurred [60158]. Furthermore, there was little evidence that a full-scale simulation of an IT failure had been conducted for nearly two years, indicating a lack of proactive measures to prevent such incidents [60158].
Duration temporary The software failure incident at the London Ambulance Service on New Year's Eve was temporary. The incident was caused by IT systems failing, leading to delays in ambulance responses and necessitating calls to be recorded by pen and paper for nearly five hours [60158]. The failure was not permanent as it was eventually fixed, and measures were taken to prevent a similar incident from happening again.
Behaviour crash, omission, other (a) crash: The software failure incident in the London Ambulance Service on New Year's Eve was a crash as the Computer Aided Dispatch (CAD) system failed, leading to a situation where emergency calls had to be recorded by pen and paper for nearly five hours, calls were diverted to Scotland, and the real-time, web-based mapping of ambulance crews failed [60158]. (b) omission: The software failure incident also involved omission as the CAD system failed to perform its intended functions, resulting in delays in dispatching ambulances and handling emergency calls, with some calls not being answered for nearly 25 minutes [60158]. (c) timing: The timing of the software failure incident was also a factor as it occurred on New Year's Eve, one of the busiest nights of the year for the London Ambulance Service, leading to delays in providing medical help to patients who had to wait longer than usual for assistance [60158]. (d) value: The software failure incident did not involve the system performing its intended functions incorrectly in terms of providing medical treatment or dispatching ambulances. However, the delay caused by the CAD system failure may have had an impact on the treatment of a patient who died, possibly as a result of treatment being delayed [60158]. (e) byzantine: The software failure incident did not exhibit behavior indicative of a byzantine failure where the system behaves erroneously with inconsistent responses and interactions. The primary issue was the failure of the CAD system leading to delays and disruptions in the ambulance service operations [60158]. (f) other: The other behavior observed in this software failure incident was the lack of regular system maintenance and testing, as well as the failure to address identified risks to the CAD system despite previous incidents of CAD failures in the preceding 12 months. This lack of proactive maintenance and risk mitigation contributed to the crash and omission aspects of the failure incident [60158].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
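Consequence death, harm, delay (a) death: One person died, possibly as a result of treatment being delayed due to the IT systems failure on New Year's Eve [60158]. (b) delay: Some people in London waited up to seven hours for an ambulance, which delayed their medical treatment [60158].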
Domain transportation, health (a) The failed system was intended to support the health industry, specifically the London Ambulance Service. The Computer Aided Dispatch (CAD) system failure impacted the ambulance service's ability to respond to emergency calls during the New Year's Eve event, leading to delays in patient treatment and potentially contributing to a patient's death [60158].

Sources
