Incident: Hitachi Class 800 Trains Experience Technical Faults and Delays

Published Date: 2017-10-19

Postmortem Analysis
Timeline 1. The software failure incident with the Hitachi Class 800 trains occurred on a Monday, according to the article [64649]. 2. The article was published on Thursday, October 19, 2017. 3. Estimating the timeline: the most recent Monday on or before the publication date is October 16, 2017. Therefore, the software failure incident with the Hitachi Class 800 trains most likely happened on October 16, 2017.
System The software failure incident described in the article involved technical problems with the Hitachi Class 800 trains operated by Great Western Railway. The specific systems and components that failed include: 1. IT router 2. Air valve 3. Computer system software on the trains. These faults, which were addressed in part by a software update, led to the temporary withdrawal of the trains from passenger service [64649].
Responsible Organization 1. Hitachi, the manufacturer of the trains, was responsible for causing the software failure incident [64649].
Impacted Organization 1. Great Western Railway [64649] 2. Hitachi, the manufacturer of the trains [64649] 3. Passengers using the intercity express trains [64649]
Software Causes 1. The software causes of the failure incident were related to technical faults that emerged on the maiden journey of the Hitachi Class 800 trains, including problems with an IT router and with the trains' computer system software, which had to be updated [64649].
Non-software Causes 1. A rock fall near Bristol, which caused delays or cancellations for commuters traveling from the south-west of England to London [64649].
Impacts 1. Because of the software failure incident, the first two trains in the £5.7bn fleet did not carry passengers while technical faults were being worked on, causing delays and inconvenience to passengers [64649]. 2. Passengers experienced problems such as having to stand because of a lack of seating, a leaking air-conditioning unit, a faulty IT router, and an air valve malfunction [64649]. 3. The ongoing faults undermined confidence in the ability of the train operator, GWR, to deliver a reliable service, and dissatisfaction among customers grew [64649].
Preventions 1. More thorough software testing before the trains entered passenger service might have prevented the software failure incident [64649]. 2. Comprehensive software quality assurance checks, aimed at identifying and addressing technical issues before the trains carried passengers, may also have helped prevent the incident [64649]. 3. Rigorously testing and validating updates to the trains' computer system software before rolling them out across the fleet could have reduced the technical faults and software failures [64649].
Fixes 1. Updating the computer system software on the trains to address the initial issues observed [64649].
References 1. Great Western Railway (GWR) spokesperson [64649] 2. Hitachi spokesperson [64649]
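The Timeline entry above infers the incident date from two facts: the article's mention of Monday and its publication date of October 19, 2017. The sketch below reproduces that arithmetic in Python. It assumes that "Monday" means the most recent Monday on or before the publication date. The helper name `most_recent_weekday_on_or_before` is hypothetical and is used only for this illustration.

```python
from datetime import date, timedelta

def most_recent_weekday_on_or_before(ref: date, weekday: int) -> date:
    """Return the latest date <= ref falling on `weekday` (Monday=0 ... Sunday=6).

    Hypothetical helper: assumes the article's "Monday" is in the same week
    as, or is, the publication date.
    """
    # Number of days to step back from ref to reach the requested weekday.
    offset = (ref.weekday() - weekday) % 7
    return ref - timedelta(days=offset)

published = date(2017, 10, 19)  # article publication date (a Thursday)
incident = most_recent_weekday_on_or_before(published, 0)  # 0 = Monday
print(incident.isoformat())  # prints: 2017-10-16
```

The modulo step makes the helper return the reference date itself when that date already falls on the requested weekday. The estimate therefore never moves past the publication date.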

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident related to the intercity express train has happened again within the same organization. The article mentions that on Thursday, passengers were unamused to find more problems with the IETs, and a tweet from @GWRHelp explained that there had been a technical issue and Hitachi engineers were working to resolve it [64649]. This indicates that the software failure incident occurred again within the organization responsible for the train service. (b) There is no specific mention in the article of a similar software failure incident happening at other organizations or with their products and services.
Phase (Design/Operation) design, operation (a) The design-phase aspect of the software failure incident can be seen where the article mentions technical faults that emerged on the maiden journey of the Hitachi Class 800 trains. Problems with an IT router and an air valve point to possible design flaws or issues introduced during system development [64649]. (b) The operation-phase aspect is evident in the faults that surfaced while the trains were in service. The article mentions technical issues on Tuesday and Wednesday that did not stop the trains from carrying passengers, followed by further problems on Thursday [64649]. This indicates that some failures manifested during the operation or use of the system, in addition to any flaws introduced during development.
Boundary (Internal/External) within_system (a) within_system: The software failure incident with the Hitachi Class 800 trains was primarily due to technical faults within the system. The article mentions problems with an IT router, air valve, and computer system software on the trains [64649]. These issues were being addressed by Hitachi engineers to resolve the matter and improve the passenger experience. The software updates and improvements made overnight were aimed at addressing the internal technical issues within the system.
Nature (Human/Non-human) non-human_actions (a) The software failure incident in the article seems to be related to non-human actions. The article mentions technical faults that emerged on the maiden journey of the Hitachi Class 800 trains, including problems with an IT router and an air valve [64649]. Additionally, the article states that Hitachi engineers were working to resolve the technical issue, indicating a technical glitch rather than a human error as the cause of the failure.
Dimension (Hardware/Software) hardware, software (a) The software failure incident related to hardware: - The article mentions technical faults on the Hitachi Class 800 trains, including problems with an IT router and an air valve [64649]. - Engineers were working on fixing the technical problems that emerged on the trains, indicating hardware-related issues [64649]. (b) The software failure incident related to software: - The article mentions that the computer system software on the trains has been updated at the depot to fix some initial issues seen on Monday [64649]. - Hitachi engineers were working on resolving a technical issue related to the software on the trains [64649].
Objective (Malicious/Non-malicious) non-malicious (a) The article does not mention any malicious intent or actions related to the software failure incident. The issues with the new Hitachi Class 800 trains were attributed to technical faults, such as problems with an IT router, an air valve, and the computer system software [64649]. Hitachi engineers were addressing these issues to improve the passenger experience and return the trains to service as soon as possible. There is no indication in the article that the software failures were caused by malicious actions.
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident related to the intercity express train can be attributed to poor decisions. The incident was a result of technical faults that emerged on the maiden journey of the new trains. The article mentions that the government had claimed investment in the new trains showed it was putting passengers at the heart of its rail policy, but the problems with the trains, including issues with an IT router and an air valve, led to delays and dissatisfaction among passengers [64649].
Capability (Incompetence/Accidental) development_incompetence (a) The software failure incident in the article seems to be more related to development incompetence rather than accidental factors. The article mentions technical faults that emerged on the maiden journey of the new trains, including problems with an IT router and an air valve [64649]. Additionally, there were ongoing faults with the trains despite efforts to resolve them, leading to dissatisfaction among passengers and concerns about the reliability of the service [64649]. These issues point towards failures that could be attributed to development incompetence rather than accidental factors.
Duration temporary (a) permanent: The incident does not appear to be permanent. Engineers were working on the technical faults that emerged on the maiden journey of the new trains, the computer system software on the trains was updated to address the initial issues, and improvements were being tested overnight [64649]. (b) temporary: The incident is temporary. The new trains were taken out of passenger service only while the technical faults were being addressed, and they were expected to run again as soon as the improvements had been tested [64649].
Behaviour value, other (a) crash: The software failure incident did not involve a crash, in which the system loses state and performs none of its intended functions. The article mentions technical faults, problems with an IT router and an air valve, and a computer system software update to address the issues [64649]. (b) omission: The article does not specifically mention the system omitting to perform its intended functions at one or more instances [64649]. (c) timing: The incident did not involve timing issues, in which the system performs its intended functions too late or too early. The article focuses on technical faults and ongoing problems with the new trains [64649]. (d) value: The incident did involve the system performing its intended functions incorrectly, as shown by the technical issues with the new trains, including problems with an air-conditioning unit, an IT router, and an air valve [64649]. (e) byzantine: The incident did not exhibit byzantine behavior, in which the system behaves erroneously with inconsistent responses and interactions. The reported issues were technical faults and ongoing problems with the new trains [64649]. (f) other: The incident also involved ongoing technical issues that required software updates, overnight testing of improvements, and repairs to various faults before the trains could operate effectively and provide a reliable service [64649].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
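Consequence delay (e) delay: People had to postpone an activity due to the software failure. The withdrawal of the first two Hitachi Class 800 trains from passenger service while technical faults were fixed caused delays and inconvenience to passengers [64649]. Separately, the delays and cancellations on Tuesday for commuters traveling from the south-west of England to London were attributed to a rock fall near Bristol, which was a non-software cause [64649].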
Domain transportation (a) The failed system in this incident was related to the transportation industry. The software failure incident occurred on the Hitachi Class 800 trains operated by Great Western Railway [64649].

Sources
