Incident: Smart Motorway Software Failure Impacts Lane Closure Functionality and Safety

Published Date: 2022-12-12

Postmortem Analysis
Timeline 1. The software failure incident happened on 26 October [136656].
System 1. Smart motorway network software controlling the ability to set red X overhead signs [136656]
Responsible Organization 1. National Highways [136656]
Impacted Organization 1. Drivers using the smart motorway network in England were impacted by the software failure incident [136656].
Software Causes 1. The software failure incident on England's smart motorway network on 26 October was caused by the inability to set red X (overhead signs) due to unexpected faults during system reactivation [136656].
Non-software Causes 1. The article does not identify a non-software cause. It attributes the loss of lane-closure capability, which lasted almost four hours across most of England's smart motorway network, to the software fault [136656].
Impacts 1. The software failure incident left most of England's smart motorway network without the ability to close lanes for almost four hours, potentially putting drivers who had to stop on the roads at risk of being hit [136656].
Preventions 1. Implementing more robust testing procedures to catch unexpected faults during system upgrades could have prevented the software failure incident [136656]. 2. Regularly conducting maintenance and upgrades during off-peak hours to minimize the impact on operations and reduce the likelihood of unexpected faults during critical times could have helped prevent the incident [136656].
Fixes 1. Implementing thorough testing procedures to identify and address software bugs and faults before deployment [136656]. 2. Conducting regular maintenance and upgrades to ensure the software controlling the smart motorways is functioning properly [136656]. 3. Enhancing the redundancy and failover mechanisms in the software system to prevent widespread failures like the one experienced on 26 October [136656].
References 1. National Highways operations director Duncan Smith - provided information on the software failure incident and the measures taken to address it [136656].
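
Fix 3 above mentions redundancy and failover. The following is a minimal sketch of that idea, not a description of how National Highways' system works: the `Controller` type, the `close_lane` function, the lane identifier, and the "manual" fallback value are all hypothetical names introduced for illustration. It tries each controller in order and, if all fail, signals that operators should fall back to non-software mitigation such as the extra traffic officer patrols the article describes.

```python
from dataclasses import dataclass
from typing import Callable, List


class SignControlError(Exception):
    """Raised when a controller cannot set the requested sign state."""


@dataclass
class Controller:
    # Hypothetical abstraction: a named backend that can set a red X for a lane.
    name: str
    set_red_x: Callable[[str], None]


def close_lane(lane_id: str, controllers: List[Controller]) -> str:
    """Try each controller in order and return the name of the one that worked.

    Returns "manual" if every controller fails, so that operators know to
    use non-software mitigation instead.
    """
    for controller in controllers:
        try:
            controller.set_red_x(lane_id)
            return controller.name
        except SignControlError:
            continue  # this controller is unavailable; try the next one
    return "manual"


def failing(lane_id: str) -> None:
    # Stand-in for a controller that is down.
    raise SignControlError(f"cannot set red X for {lane_id}")


def working(lane_id: str) -> None:
    # Stand-in for a controller that sets the sign successfully.
    pass


result_with_standby = close_lane(
    "lane-1", [Controller("primary", failing), Controller("standby", working)]
)
result_all_down = close_lane(
    "lane-1", [Controller("primary", failing), Controller("standby", failing)]
)
print(result_with_standby)
print(result_all_down)
```

In this sketch the ordering of the list is the failover policy. A real deployment would also need health checks and alerting, which are outside what the article covers.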

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident related to smart motorways in England, specifically the inability to close lanes due to a software failure, has happened before within the same organization, National Highways. The article mentions that system failures have occurred in the past, causing issues with the software controlling the signs on smart motorways. National Highways admitted that the system has had to be switched off for essential upgrades, indicating previous instances of software-related problems within the organization [136656]. (b) The software failure incident related to smart motorways in England is not explicitly mentioned to have happened at other organizations or with their products and services in the provided article. Therefore, there is no information to suggest similar incidents occurring at multiple organizations.
Phase (Design/Operation) design, operation (a) The software failure incident in the smart motorway network was attributed to system development and updates. National Highways admitted that system failures have occurred due to the software not working properly to control the signs, requiring the system to be switched off for essential upgrades. The operations director mentioned that unexpected faults were experienced when bringing the system back up after the initial failure, leading to a longer downtime of about four hours [136656]. (b) The software failure incident also had implications related to operation. During the period of the failure, the ability to set red X (overhead signs) was switched off, posing potential risks to drivers who had to stop on the roads. To mitigate the impact, extra traffic officer patrols were deployed on the network. The operations director mentioned that the system failures led to the need for other mitigating measures during the incident [136656].
Boundary (Internal/External) within_system (a) The software failure incident mentioned in the article was within the system. The article states that the software failure affected the smart motorway network's ability to close lanes for almost four hours, specifically mentioning that the ability to set red X overhead signs was switched off during the incident [136656]. The operations director also mentioned that unexpected faults were experienced when bringing the system back up, indicating an internal system issue. (b) The article does not attribute the software failure incident to factors originating from outside the system.
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the smart motorway network was primarily due to non-human actions. The fault that occurred between 17:45 and 21:30 GMT on 26 October affected the ability to close lanes on most of England's smart motorway network. This issue was attributed to unexpected faults during the process of bringing the system back up after the initial problem, leading to a longer downtime of about four hours [136656]. (b) Human actions were also involved in managing the software failure incident. To mitigate the impact of the software failure, extra traffic officer patrols were deployed on the network during the period when the ability to set red X overhead signs was switched off. Additionally, the operations director mentioned that the system had to be switched off for essential upgrades, which are done during quiet times without informing drivers [136656].
Dimension (Hardware/Software) software (a) The software failure incident mentioned in Article 136656 was primarily due to software issues. The article specifically states that the fault in the smart motorway network, which prevented the ability to close lanes for almost four hours, was a "software failure" [136656]. The operations director mentioned that during the incident, their ability to set red X overhead signs was switched off, indicating a software-related issue. Additionally, when bringing the system back up, unexpected faults were experienced, extending the downtime to four hours, further highlighting software-related problems. The article does not mention any hardware-related contributing factors to the software failure incident.
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident described in Article 136656 was non-malicious. The incident was attributed to a fault in the software that affected the smart motorway network's ability to close lanes for almost four hours. National Highways' operations director mentioned that the system failures were unexpected and that the software had to be switched off for essential upgrades. There is no indication in the article that the failure was caused by malicious intent or any deliberate actions to harm the system [136656].
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident related to the smart motorways was not explicitly attributed to poor decisions. However, it was mentioned that the system failures have meant the software to control the signs has not worked properly, and the system also has to be switched off for essential upgrades. This indicates that there may have been decisions made regarding the software implementation or maintenance that contributed to the failure [136656]. (b) The software failure incident on the smart motorways was described as an unexpected fault that occurred when bringing the system back up after the initial issue. This suggests that the failure was more of an accidental occurrence rather than a deliberate decision [136656].
Capability (Incompetence/Accidental) accidental (a) The software failure incident mentioned in the article was not explicitly attributed to development incompetence. The article primarily focused on a software failure that occurred on England's smart motorway network, affecting the ability to close lanes for almost four hours. The operations director mentioned unexpected faults during the process of bringing the system back up, leading to a longer downtime than anticipated. However, there was no direct indication of development incompetence as the cause of the software failure [136656]. (b) The software failure incident described in the article was more aligned with an accidental failure. The article highlighted that the software failure, which lasted for about four hours, was not an ideal situation and affected the smart motorway network's operations. The operations director mentioned that unexpected faults occurred during the process of bringing the system back up, leading to the extended downtime. This suggests that the software failure was more accidental in nature rather than due to development incompetence [136656].
Duration temporary (a) The software failure incident described in Article 136656 was temporary. The incident lasted for almost four hours, from 17:45 to 21:30 GMT on 26 October. During this time, the ability to set red X overhead signs was switched off, leading to potential risks for drivers on the smart motorways. Additionally, when the system was being brought back up, unexpected faults were experienced, extending the duration of the incident [136656].
Behaviour crash, omission, value, other (a) crash: The software failure incident in the smart motorway network resulted in the system losing the ability to set red X overhead signs, which was a critical function for lane closures. This loss of functionality led to potential risks for drivers on the roads [136656]. (b) omission: The software failure incident involved the system omitting to perform its intended function of setting red X overhead signs to close lanes for almost four hours, impacting the safety and operations of the smart motorway network [136656]. (c) timing: The software failure incident caused delays in restoring the system functionality, extending the expected downtime from two hours to more like four hours. This timing issue was due to unexpected faults encountered during the system restoration process [136656]. (d) value: During the failure the signs could not be set to the red X when a lane closure was needed, so the state shown to drivers could differ from the intended one, affecting safety and traffic management on the smart motorway network [136656]. (e) byzantine: The article does not mention inconsistent responses or interactions that would classify the incident as byzantine behaviour. (f) other: The system also has to be switched off for essential upgrades, which adds a planned-maintenance aspect to the failure scenario. This could be categorized as planned downtime or maintenance-related behaviour [136656].
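
The Duration and Behaviour rows above quote an outage window of 17:45 to 21:30 GMT and an expected downtime of two hours. As a quick check of the "almost four hours" figure, the sketch below computes the length of that window and the overrun against the two-hour expectation. The variable names are illustrative, and only times of day are used because the rationale gives the day and month but not the year.

```python
from datetime import datetime, timedelta

FMT = "%H:%M"

# Outage window quoted in the rationale (same day, GMT).
start = datetime.strptime("17:45", FMT)
end = datetime.strptime("21:30", FMT)

# Expected downtime quoted in the rationale.
expected = timedelta(hours=2)

outage = end - start
overrun = outage - expected

outage_minutes = int(outage.total_seconds() // 60)
overrun_minutes = int(overrun.total_seconds() // 60)

hours, minutes = divmod(outage_minutes, 60)
print(f"Outage: {hours} h {minutes} min ({outage_minutes} minutes)")
print(f"Overrun versus expected two hours: {overrun_minutes} minutes")
```

The window works out to 3 hours 45 minutes, consistent with the "almost four hours" wording used throughout the record.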

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence delay, other (a) death: People lost their lives due to the software failure - There is no mention of any deaths resulting from the software failure incident on the smart motorways [136656]. (b) harm: People were physically harmed due to the software failure - The article does not mention any physical harm caused to individuals as a direct result of the software failure incident on the smart motorways [136656]. (c) basic: People's access to food or shelter was impacted because of the software failure - The software failure incident on the smart motorways did not impact people's access to food or shelter as per the information provided [136656]. (d) property: People's material goods, money, or data was impacted due to the software failure - There is no indication in the article that people's material goods, money, or data were impacted by the software failure incident on the smart motorways [136656]. (e) delay: People had to postpone an activity due to the software failure - The software failure incident on the smart motorways did lead to delays for drivers who had to stop on the roads due to the inability to close lanes, potentially putting them at risk of being hit [136656]. (f) non-human: Non-human entities were impacted due to the software failure - The software failure incident primarily affected the operation of the smart motorways and the ability to close lanes, with no mention of non-human entities being impacted [136656]. (g) no_consequence: There were no real observed consequences of the software failure - The software failure incident did have consequences, such as the inability to close lanes on the smart motorways, leading to potential risks for drivers who had to stop on the roads [136656]. (h) theoretical_consequence: There were potential consequences discussed of the software failure that did not occur - The article does not mention any potential consequences discussed that did not actually occur as a result of the software failure incident on the smart motorways [136656]. (i) other: Was there consequence(s) of the software failure not described in the (a to h) options? What is the other consequence(s)? - The primary consequence of the software failure incident on the smart motorways was the potential risk to drivers who had to stop on the roads due to the inability to close lanes, as mentioned in the article [136656].
Domain transportation The software failure incident reported in Article 136656 is related to the transportation industry. The smart motorway network in England, which experienced the software failure, is a part of the transportation infrastructure designed to improve traffic flow and safety on motorways by utilizing technology to manage lanes and incidents effectively. The incident affected the smart motorway network's ability to close lanes for almost four hours, potentially putting drivers at risk [136656]. The National Highways operations director mentioned that the smart motorways provide a level of assurance that other roads don't, indicating the importance of this technology in the transportation sector [136656]. The system failures in the smart motorway software have led to concerns about safety and the need for essential upgrades to ensure the proper functioning of the transportation infrastructure [136656].

Sources
