Incident: Smart Motorway System Technology Failure Impacts Operations and Safety

Published Date: 2022-11-16

Postmortem Analysis
Timeline 1. The software failure incident with the smart motorway systems occurred in September and October [135160].
System 1. Cameras and radar alert system 2. Software to control the signs 3. IT system meant to alert drivers to disasters
Responsible Organization 1. National Highways [135160]
Impacted Organization 1. Drivers using smart motorways [135160] 2. National Highways (responsible for the smart motorway system) [135160]
Software Causes 1. Computer failures affecting the software to control the signs, leading to improper functioning [135160] 2. System technology failures resulting in the smart motorway technology not working for significant periods of time [135160]
Non-software Causes 1. Cameras and radar not functioning properly, so the control room was not alerted to vehicles stopped in live lanes [135160]. 2. Issues with the software controlling the signs used to inform drivers about incidents on smart motorways [135160].
Impacts 1. The software failure incident led to the smart motorway technology not working for 21 hours in September and experiencing outages during operational hours in both September and October [135160]. 2. The failure of the software controlling the signs meant that lanes could not be closed using the signs upstream of incidents, such as live-lane breakdowns, potentially increasing the risk of accidents and disruptions on the smart motorways [135160]. 3. The failure caused concerns among drivers and campaigners, such as Claire Mercer, who received messages outlining system failures, leading to feelings of terror and helplessness [135160]. 4. The MP for Rotherham Central, Sarah Champion, expressed concerns that drivers were essentially playing "roulette" every time they drove on the smart motorways due to the IT failures, highlighting the potential dangers and risks associated with the software failure incident [135160].
Preventions 1. Regular maintenance and testing of the smart motorway technology to ensure its proper functioning [135160]. 2. Implementing redundancy measures in the software system to prevent complete failures in case of glitches or bugs [135160]. 3. Enhancing the monitoring and alert systems for detecting software failures promptly to minimize downtime and potential risks [135160].
Fixes 1. Implementing a thorough and comprehensive independent investigation to identify the root causes of the software failures in the smart motorway system [135160]. 2. Enhancing the software controlling the signs on smart motorways to ensure proper functionality and reliability, especially during critical situations like live-lane breakdowns [135160]. 3. Strengthening the IT infrastructure supporting the smart motorway technology to prevent future outages and ensure continuous monitoring and alerting capabilities [135160].
References 1. National Highways 2. Claire Mercer 3. Sarah Champion, Labour MP for Rotherham Central 4. Duncan Smith, Executive Director for Operations at National Highways [135160]
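Prevention item 3 above (detecting software failures promptly) can be illustrated with a heartbeat staleness check. This is only a minimal sketch under stated assumptions: the component names, the `Heartbeat` type, the `stale_components` function, and the 60-second threshold are all hypothetical and are not taken from National Highways' systems or from the article [135160].

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Heartbeat:
    component: str    # hypothetical component name, e.g. "sign_control"
    last_seen: float  # time of the last heartbeat, in seconds


def stale_components(heartbeats: List[Heartbeat], now: float,
                     max_age_seconds: float) -> List[str]:
    """Return the components whose last heartbeat is older than max_age_seconds."""
    return [hb.component for hb in heartbeats
            if now - hb.last_seen > max_age_seconds]


# Illustrative data only: one component silent for 300 s, one for 10 s.
beats = [Heartbeat("sign_control", 1000.0), Heartbeat("radar_alert", 1290.0)]
stale = stale_components(beats, now=1300.0, max_age_seconds=60.0)
print(stale)  # prints ['sign_control']
```

A check of this kind would only shorten the time to notice an outage; it would not by itself restore the ability to close lanes upstream of an incident.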

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident related to smart motorways has happened again within the same organization, National Highways. The incident involved the technology behind smart motorway systems not working for extended periods, with one outage lasting nearly 4 hours in October. Claire Mercer, a campaigner against smart motorways, received messages from National Highways staff outlining system failures, indicating a recurring issue within the organization [135160]. (b) The software failure incident related to smart motorways has also occurred at other organizations or with their products and services. Sarah Champion, Labour MP for Rotherham Central, mentioned that the IT systems on smart motorways were not working as intended, leading to potential safety risks for drivers. This suggests that similar issues may exist in the technology systems of other organizations responsible for smart motorways or similar infrastructure [135160].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase is evident in the article. It mentions that computer failures have meant the software to control the signs on smart motorways has not worked properly. Specifically, there were instances where the system technology behind smart motorways did not work for significant durations, such as a 21-hour outage in September and another outage lasting nearly 4 hours in October. These failures were attributed to issues with the software controlling the signs, indicating a design-related failure introduced during system development or updates [135160]. (b) The software failure incident related to the operation phase is also highlighted in the article. Concerns were raised about the operational reliability of the smart motorway system, with reports of the technology not functioning for a significant percentage of operational hours in September. This operational failure was evident in messages received by Claire Mercer from staff working for National Highways, outlining system failures and indicating that the equipment meant to alert drivers to disasters was not working as intended. These operational challenges point to issues introduced during the operation or maintenance of the system [135160].
Boundary (Internal/External) within_system (a) The software failure incident related to the smart motorway system can be categorized as within_system. The article mentions that computer failures have meant the software to control the signs on smart motorways has not worked properly, leading to issues such as the system not functioning for a significant percentage of operational hours in September and experiencing outages in October [135160]. These failures are internal to the system and are related to the software's functionality and performance.
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the smart motorway system was primarily due to non-human actions. The failure was attributed to computer failures that affected the software controlling the signs on the smart motorways, leading to issues with cameras and radar alerts not functioning properly [135160]. The outage in October lasting nearly 4 hours was a result of these non-human factors impacting the system's functionality. (b) Human actions also played a role in the incident as highlighted by Sarah Champion, Labour MP for Rotherham Central, who mentioned that drivers were playing "roulette" every time they drove on the smart motorways due to the IT failures. She emphasized the importance of the IT systems to alert drivers to disasters, indicating that human decisions to implement and rely on these systems were contributing factors to the potential risks faced by drivers [135160].
Dimension (Hardware/Software) hardware, software (a) The software failure incident occurring due to hardware: - The article mentions that computer failures have meant the software to control the signs on smart motorways has not worked properly, indicating a hardware-related issue [135160]. (b) The software failure incident occurring due to software: - The article specifically states that the software to control the signs on smart motorways has not worked properly due to computer failures, highlighting a software-related issue [135160].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident related to the smart motorway system was non-malicious. The failure was attributed to computer failures that affected the software controlling the signs used to manage the smart motorways. National Highways acknowledged the failures and mentioned that safety remained their number one priority, indicating that the incident was not caused by malicious intent [135160].
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident related to the smart motorway system technology can be attributed to poor decisions made in the design and implementation of the system. The incident involved failures in the software controlling the signs used to alert drivers about issues on the smart motorways. The system was not functioning for a significant percentage of operational hours, with one outage lasting nearly 4 hours in October [135160]. This indicates that there were shortcomings in the decision-making process regarding the technology and software used in the smart motorway system, leading to operational failures and potential safety risks for drivers.
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident related to development incompetence is evident in the article. The article mentions that computer failures have meant the software to control the signs on smart motorways has not worked properly, leading to incidents where the system was not functioning for a significant amount of time [135160]. (b) The software failure incident related to accidental factors is also highlighted in the article. It is reported that the technology behind smart motorway systems did not work for 21 hours during September, with an outage in October lasting nearly 4 hours. These incidents indicate failures that were not intentional but occurred due to technical issues or glitches [135160].
Duration temporary From the provided article [135160], it is evident that the software failure incident related to the smart motorway system was temporary rather than permanent. The article mentions that the technology behind smart motorway systems did not work for 21 hours during September, with an outage in October lasting nearly 4 hours. These specific timeframes indicate that the software failure was temporary and not a permanent issue. Temporary failures are those that occur due to contributing factors introduced by certain circumstances but not all, and in this case, the system was able to function for the majority of the time, indicating a temporary nature of the failure.
Behaviour crash (a) crash: The software failure incident in the smart motorway system can be categorized as a crash. The article mentions that "Computer failures have meant the software to control the signs has not worked properly" and there were instances where the system was not functioning for a significant amount of time, such as a 21-hour outage in September and a nearly 4-hour outage in October [135160].

IoT System Layer

Layer Option Rationale
Perception sensor (a) The failure at the perception layer of the cyber-physical system was primarily a sensor failure. The article mentions that cameras and radar alert a control room to vehicles which have stopped in live lanes, but the system was not functioning for 3% of operational hours in September and had an outage in October lasting nearly 4 hours due to computer failures affecting the sensors [135160].
Communication unknown The article does not say whether the smart motorway failure involved the communication layer of the cyber-physical system, and it gives no detailed technical information about which layer failed. Therefore, it is unknown whether the failure was at the link_level or connectivity_level.
Application FALSE The software failure incident related to the smart motorway system technology failure, as reported in Article 135160, was not explicitly attributed to the application layer of the cyber-physical system. The article primarily mentions failures in the technology controlling the signs and cameras, which are essential components of the smart motorway system. The failures were related to the software controlling the signs not working properly, leading to issues in alerting drivers about incidents on the road. However, the specific mention of the failure being related to bugs, operating system errors, unhandled exceptions, or incorrect usage, which are typical of application layer failures, is not provided in the article. Therefore, it is unknown whether the failure was specifically related to the application layer based on the information available in the article.
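The two September figures quoted in this report, a 21-hour outage and 3% of operational hours, can be cross-checked with simple arithmetic. The sketch below assumes the system is meant to be operational around the clock for all 30 days of September; the article [135160] does not state how operational hours are defined, so that assumption and the helper name `outage_share` are illustrative only.

```python
# Assumption: operational hours = every hour of a 30-day September.
HOURS_IN_SEPTEMBER = 30 * 24  # 720 hours


def outage_share(outage_hours: float, operational_hours: float) -> float:
    """Return outage time as a percentage of operational time."""
    if operational_hours <= 0:
        raise ValueError("operational_hours must be positive")
    return 100.0 * outage_hours / operational_hours


september_share = outage_share(21, HOURS_IN_SEPTEMBER)
print(f"{september_share:.1f}% of operational hours")  # prints 2.9% of operational hours
```

Under that assumption, 21 hours is about 2.9% of the month, which is consistent with the reported 3%.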

Other Details

Category Option Rationale
Consequence death, delay, theoretical_consequence, other
(a) death: People lost their lives due to the software failure - Claire Mercer's husband died on a stretch of motorway near Sheffield in 2019, which led her to campaign against the use of smart motorways [135160].
(b) harm: People were physically harmed due to the software failure - There is no direct mention of physical harm caused by the software failure incident in the provided article [135160].
(c) basic: People's access to food or shelter was impacted because of the software failure - There is no mention of people's access to food or shelter being impacted by the software failure incident in the provided article [135160].
(d) property: People's material goods, money, or data was impacted due to the software failure - There is no direct mention of people's material goods, money, or data being impacted by the software failure incident in the provided article [135160].
(e) delay: People had to postpone an activity due to the software failure - Drivers on smart motorways may have experienced delays or disruptions due to the software failure incident, as the system was not functioning for a significant amount of time [135160].
(f) non-human: Non-human entities were impacted due to the software failure - The software failure incident primarily affected the functionality of smart motorway systems and the ability to manage traffic incidents effectively, but there is no specific mention of non-human entities being impacted [135160].
(g) no_consequence: There were no real observed consequences of the software failure - The software failure incident led to disruptions in the smart motorway systems, including the failure of cameras and radar to alert the control room to vehicles in live lanes, which could potentially lead to safety risks for drivers [135160].
(h) theoretical_consequence: There were potential consequences discussed of the software failure that did not occur - The potential consequences discussed include the risks associated with the smart motorway system not functioning properly, such as the inability to alert drivers to incidents and the safety implications of all-lane running smart motorways [135160].
(i) other: Was there consequence(s) of the software failure not described in the (a to h) options? What is the other consequence(s)? - The emotional impact on individuals, such as Claire Mercer, who received concerning messages about system failures and expressed fear and helplessness, could be considered as another consequence of the software failure incident [135160].
Domain transportation (a) The failed system was intended to support the transportation industry. The smart motorway system, which includes cameras, radar, and overhead signs to manage traffic flow and alert drivers to incidents, is a part of the transportation infrastructure on motorways in England [135160].
