Incident: Social Security Administration's Disability Claims System Failure: Costly and Incomplete

Published Date: 2014-07-24

Postmortem Analysis
Timeline
1. The article, published in July 2014, describes the project as having been in development for about six years [28168].
2. The software failure incident therefore likely began around July 2008.
System
The software failure incident reported in article 28168 involved the Social Security Administration's Disability Case Processing System (DCPS).
Responsible Organization
1. Social Security Administration
2. Lockheed Martin
3. McKinsey and Co. [28168]
Impacted Organization
1. Social Security Administration [28168]
Software Causes
1. Lack of clear leadership and accountability in the project, with no single person responsible for completing the software system [28168]
2. Poor execution and ambiguous scope of the project, leading to delays and mismanagement [28168]
3. Inability of the system to process all new claims and accurately track them, with over 380 outstanding problems reported [28168]
Non-software Causes
1. Lack of clear leadership and accountability in the project [28168]
2. Mismanagement and delays in the project [28168]
3. Ambiguity in the scope of the project [28168]
4. Poor execution of the project [28168]
Impacts
1. Delays in disability claims processing: Claimants continued to face long delays at nearly every step of the disability claims process, delays that the new processing system was supposed to reduce [28168].
2. Financial losses: The Social Security Administration spent nearly $300 million on the failed IT project [28168].
3. Stakeholder concerns: The project faced increasing stakeholder concerns due to limited functionality, schedule delays, and overall mismanagement [28168].
4. Risk of insolvency: The failure of the software project added pressure to the Social Security disability program, which was already edging towards insolvency [28168].
Preventions
1. Proper project management with clear accountability and oversight could have prevented the software failure incident. Having a single person responsible for completing the project and ensuring effective leadership could have helped avoid delays and mismanagement [28168].
2. Thorough testing and validation of the new software system before full implementation could have prevented the failure incident. Verifying that the system could process all necessary functions and handle applications accurately would have identified issues earlier [28168].
3. Regular communication and transparency regarding the progress and challenges of the project could have prevented the failure incident. Keeping stakeholders informed and addressing concerns promptly could have enabled course correction before the project went adrift [28168].
Fixes
1. Assign a single person as the project lead responsible for completing the software project [28168].
2. Implement the recommendations provided by the independent consultants from McKinsey to improve project management and oversight [28168].
3. Consider changing vendors if necessary to better manage the project [28168].
References
1. Internal report commissioned by the Social Security Administration
2. McKinsey and Co., a management consulting firm
3. House Oversight Committee
4. Rep. Darrell Issa, R-Calif.
5. Reps. Jim Jordan, R-Ohio, and James Lankford, R-Okla.

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident having happened again at one_organization: The Social Security Administration faced a software failure incident with the Disability Case Processing System (DCPS) project, which was aimed at replacing outdated computer systems for handling disability claims. The project, led by Lockheed Martin, experienced significant delays, mismanagement, and cost overruns, with the new system still not ready after six years of development [28168]. (b) The software failure incident having happened again at multiple_organization: There is no specific mention in the provided article about the same software failure incident happening at other organizations or with their products and services.
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase is evident in the article. The Social Security Administration's new computer system for handling disability claims faced delays and mismanagement during the development phase. The project, led by Lockheed Martin, was plagued by schedule delays, increasing stakeholder concerns, and poor execution. The project lacked leadership, and the scope was ambiguous, leading to a situation where the system couldn't even process all new claims accurately [28168]. (b) The software failure incident related to the operation phase is also highlighted in the article. Users faced long delays at nearly every step of the disability claims process due to the new processing system's inefficiencies. The system was unable to accurately track claims as they progressed through the system, with over 380 problems still outstanding. This operational failure impacted the ability of workers across the country to process claims efficiently and track their progress [28168].
Boundary (Internal/External) within_system (a) The software failure incident related to the Social Security Administration's disability claims system can be categorized as within_system. The failure was primarily due to contributing factors that originated from within the system itself, such as mismanagement, delays, lack of leadership, poor execution, and ambiguous project scope [28168]. The project faced issues like being two to three years behind schedule, having no one in charge of completing the project, facing schedule delays, and increasing stakeholder concerns. The audit report highlighted problems with the project's development, lack of leadership, and poor execution, indicating internal factors leading to the failure. Additionally, the project, known as the Disability Case Processing System (DCPS), was supposed to replace outdated computer systems used by state Social Security offices but faced challenges in processing claims and tracking them accurately within the system [28168].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the Social Security Administration's disability claims system seems to be primarily due to non-human actions, such as technical issues, mismanagement, and lack of clear leadership. The project faced delays, mismanagement, and technical challenges, leading to a situation where the system was not ready even after six years of development [28168]. (b) On the other hand, human actions also played a significant role in the software failure incident. The project was described as having no one in charge, indicating a lack of clear responsibility and oversight. The appointment of Terrie Gruber to oversee the project and the involvement of outside consultants from McKinsey suggest that human actions, such as decision-making and project management, were contributing factors to the failure [28168].
Dimension (Hardware/Software) software (a) The software failure incident in the Social Security Administration's disability claims system does not seem to be directly attributed to hardware issues. The articles primarily highlight mismanagement, delays, lack of leadership, and poor execution as key contributing factors to the failure of the new computer system [28168]. (b) The software failure incident in the Social Security Administration's disability claims system is primarily attributed to contributing factors that originate in software. The project faced delays, mismanagement, and a lack of clear leadership, leading to the new system not being ready even after six years of development. The system was unable to process all new claims, had numerous outstanding problems, and lacked the ability to accurately track claims as intended. The software project, known as the Disability Case Processing System (DCPS), was described as adrift, poorly executed, and lacking leadership in its development [28168].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident related to the Social Security Administration's disability claims system does not seem to be malicious. The failure was primarily attributed to delays, mismanagement, lack of clear leadership, poor execution, and ambiguous project scope [28168]. There is no indication in the articles that the failure was due to intentional harm caused by individuals. (b) The software failure incident can be categorized as non-malicious as it was mainly caused by factors such as delays, mismanagement, lack of clear leadership, poor execution, and ambiguous project scope [28168]. The failure was not attributed to any intentional malicious actions by individuals to harm the system.
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident related to the Social Security Administration's new computer system for handling disability claims can be attributed to poor decisions. The project faced delays, mismanagement, and a lack of clear leadership, with no single person responsible for completing the project [28168]. The audit report highlighted that the project was adrift, poorly executed, and lacked leadership, indicating poor decision-making processes [28168]. Additionally, the project was described as an "IT boondoggle" by committee leaders, suggesting that it was a result of poor decisions rather than accidental mistakes [28168].
Capability (Incompetence/Accidental) development_incompetence, unknown (a) The software failure incident related to development incompetence is evident in the Social Security Administration's failed project to create a new computer system to handle disability claims. The project, led by Lockheed Martin, faced delays, mismanagement, and a lack of clear leadership, with no single person responsible for completing the project [28168]. (b) The software failure incident related to accidental factors is not explicitly mentioned in the articles provided.
Duration permanent, temporary (a) The software failure incident in the Social Security Administration's disability claims processing system can be considered as a permanent failure. The project to replace the outdated computer systems overwhelmed by disability claims has been ongoing for six years and is still not completed [28168]. The project faced delays, mismanagement, and a lack of clear leadership, leading to significant issues in functionality and schedule [28168]. The system was unable to process all new claims and had over 380 outstanding problems as of April, indicating a long-standing and persistent failure [28168]. (b) The software failure incident can also be seen as a temporary failure in the sense that efforts are being made to salvage the project and reset it in an attempt to save it [28168]. The agency brought in outside consultants to analyze the situation and appointed a new overseer for the project to implement recommendations and improve the prospects of completing the initiative successfully [28168]. Despite the long duration of the failure, there are ongoing efforts to address the issues and move towards completion.
Behaviour crash, omission, value, other (a) crash: The software failure incident in the Social Security Administration's new computer system for handling disability claims can be categorized as a crash. The system was described as not being ready even after six years of development, with the project still in the testing phase and no clear timeline for completion [28168]. (b) omission: The software failure incident can also be categorized as an omission. The new system was supposed to reduce delays in processing disability claims, but instead, people filing for claims faced long delays at nearly every step of the process, indicating a failure of the system to perform its intended functions [28168]. (c) timing: The software failure incident can be related to timing as well. The project was significantly behind schedule, with initial estimates suggesting it was two to three years from completion in 2008, but even five years later, it was still two to three years away from being done. This delay in delivering the system on time can be considered a timing-related failure [28168]. (d) value: The software failure incident can also be attributed to a value-related failure. Despite investing $288 million over six years, the system only delivered limited functionality, faced schedule delays, and increased stakeholder concerns. This indicates that the system was not performing its intended functions correctly despite the significant investment [28168]. (e) byzantine: The software failure incident does not directly align with a byzantine failure, which involves inconsistent responses and interactions. The primary issues in this case were related to delays, mismanagement, lack of clear leadership, and poor execution rather than erratic or inconsistent behavior [28168]. (f) other: The software failure incident can be categorized under the "other" behavior as well. The project was described as adrift, with an ambiguous scope, poor execution, and a lack of leadership. These factors contribute to a broader failure scenario that goes beyond a single type of behavior, encompassing various aspects of mismanagement and inefficiency in the project [28168].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, theoretical_consequence (d) property: People's material goods, money, or data were impacted due to the software failure. The software failure incident related to the Social Security Administration's new computer system for handling disability claims resulted in significant financial implications. The agency had spent nearly $300 million on the project, which still did not work after six years [28168]. The project faced schedule delays and cost overruns, and had delivered limited functionality despite the substantial investment. Additionally, the project was described as a "money-wasting project" that was rubbing salt in the wound as the disability program was edging towards insolvency [28168]. The failure of the software system impacted the agency's ability to efficiently process disability claims, leading to financial losses and inefficiencies.
Domain government (a) The failed system was intended to support the government sector. The Social Security Administration spent nearly $300 million on a new computer system to handle disability claims, developed under the Disability Case Processing System (DCPS) project [28168]. The project aimed to replace outdated computer systems used by state Social Security offices to process disability claims [28168]. (b) through (m) Not mentioned in the articles.
