Incident: Watchkeeper Surveillance Drone Project: Costly Software Glitches and Delays

Published Date: 2015-10-02

Postmortem Analysis
Timeline 1. The delays and software glitches affecting the Watchkeeper surveillance drones built up over several years, beginning with the project's initial announcement in 2005 [52353]. 2. The drones were expected to be fully operational by 2013, but software glitches and army staff shortages pushed that date back to 2017 at the earliest [52353]. 3. The incident can therefore be placed between 2005 (initial announcement) and 2017 (earliest expected full operational capability).
System 1. Watchkeeper surveillance drones [52353]
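Responsible Organization 1. The consortium led by the French defence firm Thales, which was awarded the contract to develop the Watchkeeper drones [52353]. 2. The Ministry of Defence (MoD), whose procurement and project management decisions contributed to the delays and cost overruns [52353]. The article attributes the delays mainly to software glitches and army staff shortages rather than to a single named party [52353].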
Impacted Organization 1. British army 2. Ministry of Defence (MoD) [52353]
Software Causes 1. Software glitches [52353]
Non-software Causes 1. Delays in development and production, which left the project four years behind schedule and well over its planned cost [52353]. 2. Army staff shortages that slowed the project [52353]. 3. Technical equipment problems with the Phoenix, the earlier unmanned aircraft that Watchkeeper was commissioned to replace; this is background to the project rather than a direct cause of the Watchkeeper delays [52353]. 4. Certification and regulatory challenges that added to the delays [52353]. 5. Reliance on radio signals for piloting, unlike other drone models that are operated by satellite [52353].
Impacts 1. Full operational capability of the Watchkeeper drones, originally expected by 2013, was delayed to 2017 at the earliest because of software glitches and army staff shortages [52353]. 2. The estimated cost of achieving full operation rose to £1.2bn, against an original planned cost of £800m [52353]. 3. Operational use was limited: only three of the 54 Watchkeeper drones saw active duty in Afghanistan, flying a total of 146 hours before British forces left the country [52353]. 4. Considerable further development was still needed to meet the 2017 target, and trained pilots were scarce: only six were available at the time of the article, against the 100 expected once the drone is fully operational [52353].
Preventions 1. Conducting thorough software testing and quality assurance throughout the development process to identify and address software glitches early on [52353]. 2. Ensuring an adequate number of trained pilots for the drone to operate effectively, as the lack of trained pilots was identified as a problem contributing to the delays [52353]. 3. Considering off-the-shelf solutions for drone technology, such as the US army's acquisition of the Gray Eagle drone, to potentially avoid the delays and issues faced with developing bespoke technology [52353].
Fixes 1. Implement rigorous testing and certification processes to ensure the software meets regulatory standards and is airworthy [52353]. 2. Increase the number of trained pilots to operate the drone effectively when it becomes fully operational [52353]. 3. Consider purchasing proven off-the-shelf solutions, like the US army did with the Gray Eagle drone, to avoid costly delays and issues with bespoke technology [52353].
References 1. The non-profit Bureau of Investigative Journalism 2. The Guardian [52353]
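The figures quoted in the Impacts row can be put in relative terms with a few lines of arithmetic. The sketch below is illustrative only: the variable names and the percent helper are assumptions introduced here, and the inputs are simply the numbers reported in the article [52353].

```python
# Figures as reported in the article [52353]; names are illustrative assumptions.
PLANNED_COST_M = 800       # original planned cost, in millions of pounds
ESTIMATED_COST_M = 1200    # estimated cost of full operation (GBP 1.2bn)
PLANNED_YEAR = 2013        # original date for full operational capability
REVISED_YEAR = 2017        # revised date, "at the earliest"
DRONES_TOTAL = 54          # size of the Watchkeeper fleet
DRONES_DEPLOYED = 3        # drones that saw active duty in Afghanistan
FLIGHT_HOURS = 146         # total hours flown by the deployed drones
PILOTS_TRAINED = 6         # trained pilots at the time of the article
PILOTS_TARGET = 100        # pilots expected at full operation


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


overrun_m = ESTIMATED_COST_M - PLANNED_COST_M          # absolute cost overrun
overrun_pct = percent(overrun_m, PLANNED_COST_M)       # overrun relative to plan
slip_years = REVISED_YEAR - PLANNED_YEAR               # minimum schedule slip
deployed_pct = percent(DRONES_DEPLOYED, DRONES_TOTAL)  # share of fleet deployed
hours_per_deployed_drone = FLIGHT_HOURS / DRONES_DEPLOYED
pilots_trained_pct = percent(PILOTS_TRAINED, PILOTS_TARGET)

print(f"Cost overrun: GBP {overrun_m}m ({overrun_pct:.0f}% over plan)")
print(f"Schedule slip: at least {slip_years} years")
print(f"Fleet deployed: {deployed_pct:.1f}%")
print(f"Average hours per deployed drone: {hours_per_deployed_drone:.1f}")
print(f"Pilots trained so far: {pilots_trained_pct:.0f}% of target")
```

Because the 2017 date is given as the earliest possible, the schedule slip computed here is a lower bound rather than a final figure.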

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) one_organization: The article mentions an earlier British-designed unmanned aircraft, the Phoenix, which had problems of its own, including being forced to land on its back because of technical equipment problems. Troops nicknamed the Phoenix the "Bugger Off" owing to its frequent failure to return from missions. This could be considered an earlier failure incident within the same organization, although the article attributes it to technical equipment problems rather than explicitly to software [52353]. (b) multiple_organization: The article compares the Watchkeeper project with the US army's project to acquire the Gray Eagle drone from General Atomics. The US army opted for a proven off-the-shelf solution, while the UK project suffered significant delays and cost overruns. The comparison is therefore mainly a contrast between two procurement approaches, and the article offers only indirect support for similar failures at other organizations [52353].
Phase (Design/Operation) design (a) The software failure incident related to the development phase: The delay in the development of the Watchkeeper surveillance drones was primarily attributed to software glitches and army staff shortages. The project, which was expected to be fully operational by 2013, faced significant delays, pushing the operational date to 2017 at the earliest [52353]. The Ministry of Defence (MoD) acknowledged that achieving full operational capability by 2017 would require considerable further development, indicating challenges in the software development phase [52353]. (b) The software failure incident related to the operational phase: The article does not provide specific information about software failure incidents related to the operational phase.
Boundary (Internal/External) within_system, outside_system (a) within_system: The incident stemmed largely from factors inside the system, chiefly the software glitches that left the drones needing considerable further development [52353]. (b) outside_system: Factors outside the system also contributed, including army staff shortages and changes to aircraft safety regulation and certification in the UK, which added to the delays and to the complexity of developing the Watchkeeper drones [52353].
Nature (Human/Non-human) non-human_actions, human_actions (a) non-human_actions: The incident was driven in part by non-human factors, namely the software glitches that left the project significantly behind schedule and over budget [52353]. (b) human_actions: Human actions also played a role. Army staff shortages, procurement choices, and the management of the project by the Ministry of Defence and the consortium led by Thales contributed to the delays and cost overruns. For example, the decision to commission bespoke technology instead of buying a proven off-the-shelf system, as the US army did with the Gray Eagle drone, was highlighted as a lesson learned from the Watchkeeper project [52353].
Dimension (Hardware/Software) hardware, software (a) hardware: The rationale recorded for hardware is weak. The causes cited for the Watchkeeper delays are software glitches and army staff shortages, neither of which originates in hardware; the only hardware-related detail noted is the technical equipment problems of the Phoenix, the aircraft Watchkeeper was commissioned to replace [52353]. (b) software: The software origin is clearer. The software glitches, the delay in becoming fully operational, and the need for considerable further development all point to problems originating in the software side of the project [52353].
Objective (Malicious/Non-malicious) non-malicious (a) malicious: The incident does not appear to be malicious. The delays and software problems are attributed to software glitches, army staff shortages, a lack of trained pilots, and regulatory challenges, not to any intent to harm the system [52353]. (b) non-malicious: The incident is categorized as non-malicious because the delays and problems resulted from technical challenges, certification requirements, and project management issues rather than from any deliberate attempt to cause harm [52353].
Intent (Poor/Accidental Decisions) poor_decisions, accidental_decisions (a) The intent of the software failure incident related to poor decisions can be inferred from the article. The project to develop the Watchkeeper surveillance drones faced significant delays and cost overruns due to poor decisions made during the procurement process. The initial decision to award the contract to a consortium led by the French defense firm Thales, the delays in development, and the failure to meet operational deadlines all point to poor decisions contributing to the software failure incident [52353]. (b) The software failure incident can also be attributed to accidental decisions or unintended consequences. For example, the delays in the project were partly due to software glitches and army staff shortages, which could be considered accidental decisions or unintended consequences that contributed to the failure of the software development process [52353].
Capability (Incompetence/Accidental) development_incompetence, unknown (a) The software failure incident related to development incompetence is evident in the case of the British army's Watchkeeper surveillance drones. The project faced significant delays and cost overruns due to software glitches and army staff shortages. The initial plan to have the drones fully operational by 2013 was pushed back to at least 2017, with only three out of 54 drones seeing active duty after 10 years of development [52353]. (b) The software failure incident related to accidental factors is not explicitly mentioned in the provided article.
Duration temporary The incident is categorized as a temporary failure. The software glitches and army staff shortages delayed the drones' development and operational readiness rather than ending the project, and full operational capability was still expected, in 2017 at the earliest [52353].
Behaviour crash, omission, timing, other (a) crash: The drones experienced software glitches that contributed to delays in deployment and operational readiness. Full operational capability, expected in 2013, was pushed back to 2017 at the earliest [52353]. (b) omission: Because of the software glitches and other challenges, the system did not perform its intended functions on the planned schedule, which resulted in delays and operational setbacks for the project [52353]. (c) timing: The system delivered its intended functions too late, missing the original deadlines set for operational readiness [52353]. (d) value: The project has not delivered the value expected of it. Costs have escalated beyond the initial estimates and the system has not been used as originally intended [52353]. (e) byzantine: The incident does not show the characteristics of a byzantine failure. The issues concern software glitches, delays, and operational challenges rather than inconsistent responses or interactions within the system [52353]. (f) other: The incident also falls under "other". The combination of software glitches, delays, and operational setbacks does not fit neatly into the predefined categories of crash, omission, timing, or value [52353].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, delay, non-human, theoretical_consequence, other (a) death: The article does not mention any loss of life caused by the incident [52353]. (b) harm: The article does not mention anyone being physically harmed by the incident [52353]. (c) basic: The article does not mention any impact on people's access to food or shelter [52353]. (d) property: The incident increased costs significantly. The project was initially expected to cost £800m, but the estimated cost of achieving full operation is now £1.2bn because of software glitches and other issues [52353]. (e) delay: Software glitches and army staff shortages delayed the project significantly. The drones were expected to be fully operational by 2013, but the date was pushed back to 2017 at the earliest [52353]. (f) non-human: The incident affected the deployment and operational capability of the Watchkeeper drones themselves, which are unmanned aerial systems [52353]. (g) no_consequence: Not applicable. The incident had observed consequences, including delays and increased costs [52353]. (h) theoretical_consequence: The article discusses potential consequences, such as the effect on operations in Afghanistan and the number of trained pilots needed once the drone is fully operational [52353]. (i) other: The Ministry of Defence had to lease nine Hermes 450 drones from the Israeli aerospace firm Elbit to fill the gap left by the Watchkeeper delays [52353].
Domain government The failed system in the article is related to the defense industry, specifically the military surveillance sector. The software failure incident pertains to the delays and glitches in the development and deployment of the Watchkeeper surveillance drones for the British army [52353]. The project faced software glitches, staff shortages, and delays, leading to significant cost overruns and operational setbacks.
