Incident: Failure of Federal Protective Service Inspection Software System.

Published Date: 2012-07-25

Postmortem Analysis
Timeline 1. The article was published on July 25, 2012, and places the incident in the previous month [13058]. The incident therefore most likely occurred in June 2012.
System 1. Risk Assessment and Management Program (Ramp) [13058] 2. Modified Infrastructure Survey Tool (Mist) [13058]
Responsible Organization 1. The Federal Protective Service (FPS), the Department of Homeland Security's police and security agency, was responsible for the software failure incident [13058].
Impacted Organization 1. The Department of Homeland Security's police and security agency (FPS) [13058] 2. Federal facilities across the country [13058]
Software Causes 1. The Risk Assessment and Management Program (Ramp) failed because it was unreliable: recorded inspections disappeared from the database, the program could not connect in remote areas, and inspectors could not verify contractor training and certification information [13058]. 2. Its replacement, the Modified Infrastructure Survey Tool (Mist), was found to have a major limitation: it could not compare security risks between federal facilities, because it assumed that all facilities at the same security level carried the same risk regardless of location. This limitation hindered the prioritization and mitigation of critical risks across facilities [13058].
Non-software Causes 1. Lack of minimum security standards and of an inspection regime for federal facilities [13058] 2. Unreliable data on the inspection backlog and the extent of inspections [13058] 3. Challenges in overseeing contract guards [13058] 4. Inability to compare security risks between federal facilities [13058] 5. Failure to prioritize and mitigate critical risks at federal facilities [13058] 6. Failure to factor in the potential consequences of adverse events such as terrorist attacks [13058] 7. Security lapses, including explosives placed in a lost and found, stolen guns, and the delayed discovery of a dead body [13058]
Impacts 1. The software failure incident led to the inability of the Federal Protective Service (FPS) to accurately measure security risks in federal facilities, as the new software tool, Mist, was unable to compare security risks between different federal facilities [13058]. 2. The failure of the software tools, including the previous system Ramp and the interim system used before Mist, resulted in security lapses such as explosives being mistakenly placed in the lost and found, guns being stolen from a federal building, and a dead body being discovered months after the person died at a facility [13058]. 3. The software failure incident caused challenges in overseeing approximately 12,500 contract guards, as the data was unreliable and inspections were not being conducted effectively [13058].
Preventions 1. Thorough testing before deploying the software might have prevented the incident [13058]. 2. A comprehensive evaluation of the software's capabilities and limitations, including its ability to compare security risks between federal facilities, could have exposed problems before deployment [13058]. 3. Data validation and verification mechanisms that catch discrepancies and errors, such as the inspection records that disappeared from Ramp, could have reduced the risk of failure [13058]. 4. Ongoing maintenance and updates to address identified vulnerabilities and shortcomings could have improved the software's effectiveness and reliability [13058].
Fixes 1. Implement a software tool that accurately measures security risks and can prioritize vulnerabilities across federal facilities [13058]. 2. Ensure the new tool factors in the potential consequences of a vulnerability being exploited in an attack, so that resource allocation decisions are informed [13058]. 3. Develop a permanent solution that addresses the shortcomings of Ramp and the interim tools, including Mist, by providing reliable data storage, connectivity, and verification mechanisms for inspections [13058].
References 1. Government Accountability Office [13058]
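Software Causes item 2 and Fixes items 1 and 2 both turn on how risk is scored. The sketch below contrasts a score that depends only on security level (the limitation reported for Mist) with one that multiplies threat, vulnerability, and consequence. The threat x vulnerability x consequence formula is a common risk-assessment convention assumed here for illustration only; the article does not say how Mist or any replacement tool computes risk, and the facility names, fields, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    # Hypothetical illustration inputs, not actual Mist data fields.
    name: str
    security_level: int   # facilities at the same level score identically below
    threat: float         # assumed location-dependent likelihood of attack (0-1)
    vulnerability: float  # assumed likelihood that an attack succeeds (0-1)
    consequence: float    # assumed impact if an attack succeeds (arbitrary units)


def level_only_risk(f: Facility) -> float:
    """Risk as the reported limitation describes it: security level only."""
    return float(f.security_level)


def tvc_risk(f: Facility) -> float:
    """Assumed alternative: threat x vulnerability x consequence."""
    return f.threat * f.vulnerability * f.consequence


def rank(facilities: list[Facility], score) -> list[str]:
    """Facility names ordered from highest to lowest score (stable on ties)."""
    return [f.name for f in sorted(facilities, key=score, reverse=True)]


facilities = [
    Facility("Facility A", 4, threat=0.2, vulnerability=0.5, consequence=40.0),
    Facility("Facility B", 4, threat=0.9, vulnerability=0.6, consequence=100.0),
]

# Level-only scoring cannot tell the two facilities apart.
print({f.name: level_only_risk(f) for f in facilities})
# {'Facility A': 4.0, 'Facility B': 4.0}

# A threat/vulnerability/consequence score puts Facility B first.
print(rank(facilities, tvc_risk))
# ['Facility B', 'Facility A']
```

Under the level-only score, the ranking simply preserves input order, so it carries no information about which facility to protect first; that is the prioritization gap the Fixes describe.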

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) one_organization: The incident recurred within the same organization. The Federal Protective Service (FPS) had previously experienced a software failure with the Risk Assessment and Management Program (Ramp), which was unreliable: data disappeared from its database, it had connectivity problems in remote areas, and contractors' training and certification information could not be verified [13058]. (b) multiple_organization: The article reports that the Department of Homeland Security's police and security agency is preparing to adopt a new inspection tool that also has issues and may not accurately measure security risks. It also describes security lapses at federal buildings in several cities: explosives mistakenly placed in the lost and found in Detroit, guns stolen in Atlanta, and a dead body discovered in Kansas City months after the person died [13058].
Phase (Design/Operation) design, operation (a) design: The design-phase failure is evident in the software tools used to inspect federal buildings. The article attributes the failure of the previous system, the Risk Assessment and Management Program (Ramp), to design flaws. Ramp, which was supposed to be simple, lost recorded inspections from its database, had connectivity problems in remote areas, and could not verify training and certification information from contractors [13058]. (b) operation: The operation-phase failure appears in the interim system used until a new software tool is adopted. The current interim system cannot generate post-inspection reports and gives inspectors no way to check guard training and certification data, which makes it difficult to verify compliance with inspection requirements [13058].
Boundary (Internal/External) within_system, outside_system (a) within_system: The incident lies primarily within the system. The failures of the Risk Assessment and Management Program (Ramp) and its replacement, the Modified Infrastructure Survey Tool (Mist), stemmed from internal factors such as design limitations, unreliability, and the inability to compare security risks between federal facilities [13058]. These issues originated in the software itself, indicating failures in how the tools were developed and implemented. (b) outside_system: External factors also contributed. One is the lack of minimum security standards and of an inspection regime for federal facilities before the tools were developed [13058]. Others are the Federal Protective Service's difficulties in overseeing contract guards and the security lapses at several federal buildings, which shaped the overall security situation in which the tools were used [13058].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the article was primarily due to non-human actions, such as the software programs like Mist and Ramp failing to accurately measure security risks and prioritize vulnerabilities across federal facilities [13058]. These failures were attributed to design flaws and limitations in the software tools themselves, rather than human actions. (b) However, human actions also played a role in the software failure incident. For example, there were issues with inspectors duplicating paperwork to spoof inspections, guards mishandling security procedures leading to incidents like a bag of explosives being misplaced, and guns being stolen from a federal building by a contract guard [13058]. These human actions contributed to the overall failure of the security inspection systems.
Dimension (Hardware/Software) hardware, software (a) hardware: The article reports that Ramp, the failed system for inspecting federal buildings and sites, could not connect to servers in remote areas, lost data from its database, and could not verify training and certification information from contractors [13058]. (b) software: The inspection tools themselves had inherent software problems. Ramp was described as unreliable, with data disappearing from the database and connectivity problems, while Mist was criticized for being unable to accurately compare security risks between federal facilities and to prioritize risks effectively [13058].
Objective (Malicious/Non-malicious) non-malicious (a) The incident described in the article is non-malicious: no contributing factor was introduced by humans with the intent to harm the system. The failures stemmed from the Risk Assessment and Management Program (Ramp) and the Modified Infrastructure Survey Tool (Mist) being unreliable, ineffective, and unable to fulfill their intended purposes [13058]. Their causes were design flaws, missing functionality, and the inability to accurately measure security risks, not malicious intent.
Intent (Poor/Accidental Decisions) poor_decisions (a) poor_decisions: The incident resulted from poor decisions in the development and implementation of software tools for inspecting federal buildings and sites. The government spent millions on programs such as the Risk Assessment and Management Program (Ramp) and the Modified Infrastructure Survey Tool (Mist), which failed to measure security risks and prioritize vulnerabilities effectively [13058]. Adopting these tools without proper testing or consideration of their limitations meant that critical security risks were not prioritized and mitigated, leaving federal facilities exposed.
Capability (Incompetence/Accidental) development_incompetence (a) The software failure incident related to development incompetence is evident in the case of the software tools used for inspecting federal buildings. The article mentions that the previous tool, the Risk Assessment and Management Program (Ramp), failed to work as intended despite its high cost and was deemed unreliable. Inspectors faced issues such as data disappearing from the database, connectivity problems in remote areas, and challenges in verifying contractor information. Additionally, the new software tool, the Modified Infrastructure Survey Tool (Mist), was developed at a cost of $5 million but has significant vulnerabilities, such as not being able to compare security risks between federal facilities effectively. These instances point to failures in software development and implementation due to incompetence [13058]. (b) The software failure incident related to accidental factors is highlighted in the challenges faced by the Federal Protective Service (FPS) in overseeing its contract guards and conducting inspections. The article mentions security lapses, including incidents like explosives being mistakenly placed in the lost and found, guns being stolen from a federal building, and a dead body being discovered months after the person died at a facility. These incidents indicate failures in security protocols and procedures that may have occurred accidentally, leading to vulnerabilities and risks within federal facilities [13058].
Duration temporary (a) The incident is temporary rather than permanent. The article describes how the government's software tools, the Risk Assessment and Management Program (Ramp) and the Modified Infrastructure Survey Tool (Mist), were introduced as interim solutions after previous failures. Ramp proved unreliable and was abandoned, leading to the adoption of Mist, which also has significant vulnerabilities and limitations. Because these tools were meant as stopgaps until a better system could be developed, the failures reflect specific circumstances and shortcomings in the tools rather than a permanent condition [13058].
Behaviour omission, value, other (a) crash: The article does not describe a crash in which the system loses state and performs none of its intended functions [13058]. (b) omission: The system failed to perform some of its intended functions. In the previous system, Ramp, recorded inspections of guard posts disappeared from the database without explanation, and inspectors had no way to verify training and certification information from contractors [13058]. (c) timing: The article does not describe a timing failure, in which the system performs its intended functions correctly but too early or too late [13058]. (d) value: The system performed its intended functions incorrectly. The new tool, Mist, cannot accurately compare security risks between federal facilities because it assumes that all facilities at the same security level carry the same risk regardless of location; this incorrect assessment undermines the prioritization and mitigation of risks [13058]. (e) byzantine: The incident does not show byzantine behavior, in which the system responds erroneously and inconsistently [13058]. (f) other: The system was also unreliable, could not connect to servers in remote areas, and could not verify contractors' training and certification information. In addition, it did not factor in the potential consequences of adverse events such as a terrorist attack, which made it hard to determine security risks and appropriate responses [13058].
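The omission failures above (inspection records that vanished from Ramp's database, and no way to verify contractor training and certification) are the kind of gap a routine reconciliation check can surface, in line with Preventions item 3. The following is a minimal sketch under stated assumptions: the record IDs, guard IDs, and certification table are hypothetical, and nothing here reflects how Ramp or Mist actually stored data.

```python
from datetime import date


def missing_inspections(submitted: list[str], stored: list[str]) -> list[str]:
    """IDs an inspector submitted that are absent from the database."""
    return sorted(set(submitted) - set(stored))


def unverified_guards(
    guard_ids: list[str], cert_expiry: dict[str, date], today: date
) -> list[str]:
    """Guards with no certification on file, or with an expired one."""
    return sorted(
        g for g in guard_ids
        if g not in cert_expiry or cert_expiry[g] < today
    )


# Hypothetical data: INSP-002 was submitted but never reached storage.
submitted = ["INSP-001", "INSP-002", "INSP-003"]
stored = ["INSP-001", "INSP-003"]
print(missing_inspections(submitted, stored))
# ['INSP-002']

# Hypothetical certification table: G-11 has expired, G-12 has no record.
certs = {"G-10": date(2012, 12, 31), "G-11": date(2012, 1, 15)}
print(unverified_guards(["G-10", "G-11", "G-12"], certs, today=date(2012, 7, 1)))
# ['G-11', 'G-12']
```

Running both checks after every upload turns a silent omission into an explicit list that an inspector or supervisor can act on.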

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, theoretical_consequence (a) death: The article does not report any deaths caused by the software failure incident [13058]. (b) harm: The article does not report any physical harm to individuals caused by the incident [13058]. (c) basic: The article does not report any impact on people's access to food or shelter [13058]. (d) property: The article describes a bag of explosives mistakenly placed in the lost and found at a federal building in Detroit, 22 guns stolen from a federal building in Atlanta by a contract guard, and a dead body discovered at a facility in Kansas City months after the person died. These incidents could be considered impacts on property or security associated with the incident [13058]. (e) delay: The article does not report any delays caused by the incident [13058]. (f) non-human: The article does not report any impact on non-human entities [13058]. (g) no_consequence: The article does not state that the incident had no observed consequences [13058]. (h) theoretical_consequence: The article discusses potential consequences, such as the inability to prioritize security risks and mitigate vulnerabilities in federal facilities, which could leave security measures inadequate [13058]. (i) other: The article does not report any consequences beyond those covered above [13058].
Domain government The failed system belongs to the government sector [13058]. The software, first the Risk Assessment and Management Program (Ramp) and later the Modified Infrastructure Survey Tool (Mist), was developed and deployed by the Federal Protective Service (FPS), the Department of Homeland Security's police and security agency, to test federal buildings and sites for potential vulnerabilities. FPS turned to these tools to address security risks while struggling to oversee its contract guards and inspect federal facilities.
