Incident: British Airways' New Check-In System Failure Causing Chaos and Delays

Published Date: 2016-07-18

Postmortem Analysis
Timeline
1. The software failure incident involving British Airways' new check-in system happened in July 2016 [45678].
System
1. British Airways' new check-in computer system 'FLY' [45678]
2. The new worldwide check-in system introduced by British Airways [45678]
Responsible Organization
1. British Airways introduced its new IT system, FLY, which was reported to have crashed 'all the time' and caused multiple system failures [45678].
2. British Airways had outsourcing deals for IT systems, including one with Tata Consultancy Services (TCS) [45678].
Impacted Organization
1. British Airways [45678]
Software Causes
1. The new British Airways check-in system 'FLY' crashed frequently, causing delays and system failures [45678].
2. The software had a 'doughnut of doom' issue that appeared when staff tried to work through the system, leading to queues and potential crashes [45678].
3. The system failed to allocate seats properly for a couple heading to Japan for their wedding due to a glitch [45678].
4. The new check-in software caused stress and health issues for staff, with some reduced to tears by regular glitches [45678].
5. The system was not equipped to handle the tasks it was meant for, leading to long delays; staff also reported that training was insufficient [45678].
Non-software Causes
1. Insufficient training for staff, leading to poor performance and stress [45678]
2. Outsourcing deals for IT systems, including one with Tata Consultancy Services (TCS) [45678]
3. Cost-cutting measures, including job cuts in the IT department [45678]
Impacts
1. Long queues and chaos at Heathrow and Gatwick airports, causing delays and missed flights for passengers, including TV star Phillip Schofield [45678].
2. Check-in staff reduced to tears by regular glitches and stress caused by the new check-in system, affecting their health and well-being [45678].
3. Passengers expressing frustration and anger on social media platforms like Twitter due to the poor service and disorganization caused by the IT failures [45678].
4. BA had to rebook people who missed flights and provide accommodation for stranded passengers [45678].
Preventions
1. Proper testing and quality assurance of the new check-in system before its full implementation could have potentially prevented the software failure incident [45678].
2. Adequate training for staff on how to use the new system effectively could have helped in avoiding glitches and delays [45678].
3. Regular monitoring and maintenance of the software to address any issues promptly could have mitigated the risk of system failures [45678].
4. Ensuring that the new IT system is robust enough to handle the volume of operations at busy airports like Heathrow could have prevented the chaos caused by the software failure incident [45678].
Fixes
1. Improve system robustness and stability to prevent crashes and glitches, especially during peak times like the summer holiday rush [45678].
2. Enhance training for staff to effectively use the new check-in system and handle any issues that may arise [45678].
3. Address staff concerns and feedback regarding the new system to ensure it meets their needs and reduces stress levels [45678].
4. Implement better communication strategies for passengers during system failures to provide updates and assistance [45678].
5. Conduct a thorough review and potential redesign of the check-in software to eliminate the 'doughnut of doom' issue and other problematic features [45678].
References
1. Phillip Schofield's social media posts
2. British Airways staff complaints
3. Union survey of 700 staff
4. The Sun article
5. British Airways spokesperson statement
6. Passengers' complaints on Twitter
7. GMB union claims
8. Information from LinkedIn profiles of Adrian Steel and Steve Harding
9. The Register article

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident has happened again at British Airways. The article mentions that this incident is the fourth system failure in the past month for British Airways [45678]. The new check-in system introduced by British Airways has faced multiple glitches and crashes, causing significant disruptions for passengers and staff. (b) The software failure incident has also happened at JFK Airport's Terminal 7 in New York, which is owned by British Airways. In May, chaos ensued at the terminal when the internet server went down, leading to long queues and staff having to write boarding passes by hand [45678]. This incident at JFK Airport's Terminal 7 is another example of a software failure affecting British Airways operations.
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase: British Airways' new check-in system, named FLY, was introduced earlier in the year as an 'intuitive, legacy-replacing' system. However, it has been reported that the system crashes 'all the time' and has caused significant issues for both passengers and staff. The system is supposed to handle various aspects of passenger check-in, including seat allocation, baggage allowances, and document checks. Staff have complained about a 'doughnut of doom' appearing in the system, leading to queues and potential crashes. The introduction of the new system has been blamed for increased stress among staff, with many feeling that the system is not equipped to perform its intended tasks effectively. A survey of 700 staff showed that 91% believed the system was not up to the task, 94% experienced long delays, and 89% felt training was insufficient [45678]. (b) The software failure incident related to the operation phase: The operational issues with the new check-in system have led to significant disruptions for British Airways passengers. The system failures have resulted in long queues, missed flights, and frustrated customers. Passengers have reported chaos at airports, with delays, lack of staff assistance, and overall disorganization. The system failures have occurred on multiple occasions, causing inconvenience and stress for both passengers and staff. British Airways has acknowledged short-lived issues with the new system during its busiest summer period and said that its IT teams are working to enhance the system's performance [45678].
Boundary (Internal/External) within_system (a) The software failure incident related to British Airways' new check-in system, known as 'FLY', was primarily within the system. The article mentions that the new IT system crashes 'all the time' and has caused multiple system failures in the past month [45678]. The system is believed to handle various aspects of passenger check-in, such as seat allocation, baggage allowance, and document verification. Desk staff have complained about issues like the 'doughnut of doom' appearing in the system, leading to queues and potential crashes. Additionally, the introduction of the new system has been linked to increased stress among staff, with many reporting insufficient training and health issues due to the software problems. The incident also resulted in passengers missing flights, facing delays, and experiencing chaos at the airport terminals [45678].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident occurring due to non-human actions: The software failure incident involving British Airways' new check-in system, known as 'FLY,' was primarily attributed to technical issues and glitches within the system itself. The system was reported to crash frequently, causing delays, long queues, and operational disruptions at airports such as Heathrow and Gatwick [45678]. (b) The software failure incident occurring due to human actions: While the software failure was mainly due to technical issues within the system, there were also mentions of human-related factors contributing to the incident. For example, staff members expressed concerns about insufficient training, stress, and health issues resulting from the challenges posed by the new check-in system [45678]. Additionally, there were reports of outsourcing deals with Tata Consultancy Services (TCS) and cost-cutting measures involving job cuts within the IT department, which could have potentially impacted the system's performance [45678].
Dimension (Hardware/Software) software (a) The software failure incident occurring due to hardware: The article does not specifically mention any hardware-related issues contributing to the software failure incident. Therefore, it is unknown if hardware played a role in the failure incident. (b) The software failure incident occurring due to software: The software failure incident with British Airways' new check-in system, known as 'FLY,' was primarily attributed to software issues. The system was described as crashing 'all the time,' causing long delays, glitches, and stress among staff. The software was reported to have a 'doughnut of doom' issue, leading to queues and potential crashes. Additionally, the new system was blamed for bumping passengers off flights due to seat allocation glitches caused by the software. The software was also criticized for not being equipped to handle its intended tasks, leading to stress, delays, and health issues among staff [45678].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident related to British Airways' new check-in system, known as 'FLY', does not appear to be malicious. The failure was attributed to technical issues and system glitches rather than any intentional harm caused by individuals. The incident resulted in long queues, missed flights, and frustration among passengers and staff [45678]. (b) The non-malicious factors contributing to the software failure incident include issues with the new check-in system crashing frequently, inadequate training for staff, system delays, and overall system instability. The failure was primarily due to technical shortcomings and poor system performance rather than any deliberate actions to harm the system [45678].
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident related to the British Airways check-in system can be attributed to poor decisions. The new check-in system, named FLY, was introduced earlier in the year and has been plagued with issues since then. The system crashes frequently, causing delays, long queues, and frustration among both passengers and staff. Reports indicate that the system is not robust enough for an airport like Heathrow, leading to a "nightmare" scenario for both employees and travelers [45678]. Additionally, a union survey of 700 staff revealed that 91% of them believed the system was not equipped to perform its intended tasks, 94% experienced long delays, and 89% felt that training was insufficient. Furthermore, 94% thought management had not listened to staff concerns, and 76% reported that their health had suffered due to stress and anger from passengers [45678]. These factors point to poor decisions made in the implementation and management of the new check-in system, contributing to its failure.
Capability (Incompetence/Accidental) development_incompetence (a) The software failure incident related to development incompetence is evident in the case of British Airways' new check-in system 'FLY'. The system was introduced earlier in the year as an 'intuitive, legacy-replacing' system, but it has been plagued with issues. Reports indicate that the system crashes frequently, causing long delays, stress among staff, and even tears. A survey of 700 staff members revealed that 89% felt the training was poor, 94% experienced delays or system failures, and 76% reported health issues due to stress or anger from frustrated passengers [45678]. (b) The software failure incident related to accidental factors is seen in the chaos and disruptions caused by the new check-in system at British Airways. The system failures, crashes, and glitches were not intentional but rather accidental consequences of the system's inadequacies. Passengers faced long queues, missed flights, and overall disorganization due to the issues with the check-in software. The problems were not deliberate but arose from the system's inability to handle the workload and the stress it caused to both staff and passengers [45678].
Duration temporary (a) The software failure incident related to British Airways' new check-in system can be categorized as a temporary failure. The incident was described as a glitch that caused delays and system failures, leading to long queues and frustrated passengers [45678]. The new check-in system was introduced earlier in the year and experienced multiple system failures in the past month, so the failure was recurring rather than permanent. The IT teams were also reported to be working hard to enhance the system's overall performance, which suggests a setback the airline was actively addressing.
Behaviour crash, omission, timing, value, other (a) crash: The software failure incident involving British Airways' new check-in system was characterized by crashes. The system was reported to crash 'all the time', causing delays and long queues, and leading to passengers missing their flights [45678]. (b) omission: The incident also involved omissions, where the system at times failed to perform its intended functions. Passengers reported that the check-in system stopped working, causing chaos at terminals, with long queues and delays as staff struggled to deal with the issues [45678]. (c) timing: There were instances of timing-related failures in the software incident. Passengers experienced delays at check-in, missed flights, and faced chaos due to the system's performance issues, indicating that the system was not functioning correctly in terms of timing [45678]. (d) value: The software incident also involved the system performing its intended functions incorrectly. For example, the new check-in system failed to allocate seats properly for a couple heading to Japan for their wedding due to a glitch in the software [45678]. (e) byzantine: The article did not mention behaviors indicative of a byzantine failure. (f) other: The software incident also led to staff being reduced to tears by regular glitches, with some even crying on their way to work due to the stress caused by the check-in software. Additionally, the system was described as causing a 'summer of holiday chaos' and being a 'nightmare' at an airport like Heathrow, indicating a significant negative impact beyond just crashes or omissions [45678].
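For readers who want to filter or compare incidents by these categories, the classification above can be captured as a small record. The following is only a sketch under stated assumptions, not the database's actual schema: the `TaxonomyEntry` class, the `TAXONOMY` constant, and the `options_for` helper are hypothetical names introduced for illustration. The category names and options are transcribed from the table above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaxonomyEntry:
    """One row of the fault taxonomy table (hypothetical structure)."""
    category: str
    options: Tuple[str, ...]


# Options transcribed from the table above for incident [45678].
TAXONOMY = (
    TaxonomyEntry("Recurring", ("one_organization", "multiple_organization")),
    TaxonomyEntry("Phase", ("design", "operation")),
    TaxonomyEntry("Boundary", ("within_system",)),
    TaxonomyEntry("Nature", ("non-human_actions", "human_actions")),
    TaxonomyEntry("Dimension", ("software",)),
    TaxonomyEntry("Objective", ("non-malicious",)),
    TaxonomyEntry("Intent", ("poor_decisions",)),
    TaxonomyEntry("Capability", ("development_incompetence",)),
    TaxonomyEntry("Duration", ("temporary",)),
    TaxonomyEntry("Behaviour", ("crash", "omission", "timing", "value", "other")),
)


def options_for(category: str) -> Tuple[str, ...]:
    """Return the options recorded for a category; raise KeyError if absent."""
    for entry in TAXONOMY:
        if entry.category == category:
            return entry.options
    raise KeyError(category)
```

For example, `options_for("Behaviour")` returns the five behaviour options listed in the table, and a category the table does not contain raises `KeyError`.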

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence delay The main consequence of the software failure incident described in the article was delay for passengers caused by the failure of British Airways' new check-in system. Passengers faced long queues, missed flights, and had to wait for hours at the airport due to the IT glitches [45678]. The delays caused frustration among passengers, with some expressing their dissatisfaction on social media platforms [45678]. Additionally, the software failure incident impacted the operational efficiency of British Airways, leading to disruptions in the check-in process and affecting the overall travel experience of customers [45678].
Domain transportation (a) The failed system was intended to support the transportation industry, specifically the airline industry. The software in question was British Airways' new check-in system 'FLY' which handles various aspects of passenger check-in, including seat allocation, baggage allowance, and document verification such as passports and visas [45678]. The incident caused chaos at airports, leading to long queues, missed flights, and frustrated passengers [45678].
