Incident: U.S. Census Bureau Switches Online Response Software Last Minute

Published Date: 2020-02-13

Postmortem Analysis
Timeline
1. The incident occurred just weeks before the decennial survey was due to go live, as mentioned in the article [95821].
2. The article was published on 2020-02-13 08:00:00+00:00.
3. Estimated timeline: the incident likely occurred in late January or early February 2020.
System
1. Pegasystems Inc online response software [95821]
2. The Pega platform's ability to scale to the target 600,000 users [95821]
Responsible Organization
1. The U.S. Census Bureau decided to shelve the online response software from Pegasystems Inc in favor of an in-house alternative because of scalability issues [95821].
2. Pegasystems Inc provided the software whose scalability challenges led to the decision to switch to an in-house alternative [95821].
Impacted Organization
1. U.S. Census Bureau [95821]
2. IT experts
3. Government Accountability Office
Software Causes
1. An issue with the Pega platform's ability to scale to the target 600,000 users raised concerns about its capacity [95821].
Non-software Causes
1. The decision to switch software came late, just weeks before the decennial survey was due to go live [95821].
2. The switch raised concerns about the cost and security of the online census [95821].
3. The backup system, Primus, had not been used extensively in earlier operational testing [95821].
4. IT experts expressed frustration and concern about wasted funds [95821].
Impacts
1. The U.S. Census Bureau shelved the online response software purchased from Pegasystems Inc just weeks before the decennial survey was set to go live [95821].
2. The switch to an in-house alternative raised concerns among IT experts about the cost and security of America's first online census [95821].
3. The last-minute change to the backup system, called Primus, created new challenges because it had not been used extensively in earlier operational testing [95821].
4. IT experts expressed frustration about wasted funds and raised concerns about the impact on the budget [95821].
5. A hack or technology glitch arising from the incident could have hurt the accuracy of the census count and increased costs by expanding the door-to-door follow-up operation [95821].
Preventions
1. Thorough scalability testing of the Pega platform before implementation could have prevented the software failure incident [95821].
2. Extensive operational testing of the backup system, Primus, before the last-minute switch could have helped prevent the challenges arising from the change [95821].
3. Early and continuous integration testing of Primus with the bureau's IT infrastructure could have mitigated the risks associated with the software switch [95821].
4. A more gradual transition plan, rather than a last-minute change, could have reduced the impact of the software failure incident [95821].
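The scalability testing mentioned above can be illustrated with a toy capacity check. This is only a sketch under stated assumptions, not a description of how the bureau or Pegasystems tested anything: the batch-style latency model, the function names, the worker count, the service time, and the latency limit are all hypothetical. Only the 600,000-user target comes from the article [95821].

```python
import math


def worst_case_latency_ms(users: int, workers: int, service_ms: float) -> float:
    """Latency of the last request when `users` requests arrive at once and
    `workers` of them can be served in parallel (simplified batch model)."""
    if users <= 0:
        return 0.0
    batches = math.ceil(users / workers)
    return batches * service_ms


def max_users_within_limit(levels, workers, service_ms, limit_ms):
    """Step through load levels in increasing order and return the highest
    level whose worst-case latency stays within `limit_ms`. Stops at the
    first level that exceeds the limit; returns 0 if none pass."""
    best = 0
    for users in sorted(levels):
        if worst_case_latency_ms(users, workers, service_ms) > limit_ms:
            break
        best = users
    return best


if __name__ == "__main__":
    # Only the 600,000-user target is taken from the article; every other
    # number here is a made-up value for illustration.
    TARGET_USERS = 600_000
    levels = [10_000, 50_000, 100_000, 300_000, 600_000]
    supported = max_users_within_limit(
        levels, workers=2_000, service_ms=50, limit_ms=5_000
    )
    print(f"highest passing load level: {supported}")
    print("meets target" if supported >= TARGET_USERS else "falls short of target")
```

A real load test would drive actual traffic against the system rather than a formula, but the structure is the same: raise the load in steps, record the first level that breaches the limit, and compare the last passing level against the target well before launch.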
Fixes
1. Conduct thorough testing of the in-house alternative software to ensure it can handle the required traffic load and functions effectively [95821].
2. Implement risk mitigation strategies to address any potential issues with the new software and ensure a smooth transition [95821].
3. Enhance communication and coordination between the U.S. Census Bureau and the software providers to address concerns about cost, security, and functionality [95821].
References
1. U.S. Census Bureau
2. Government Accountability Office
3. Michael Thieme, the bureau’s assistant director for decennial census programs
4. Lisa Pintchman, Pega’s vice president of corporate communications
5. Kane Baccigalupi, former member of the federal digital services agency 18F who worked on the Primus project
6. Reuters
7. Nick Brown, the reporter
8. Richard Valdmanis, the writer
9. Leslie Adler, the editor
10. Thomson Reuters Trust Principles

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident related to the U.S. Census Bureau switching from the online response software bought from Pegasystems Inc to an in-house alternative can be seen as a case of a similar incident happening again within the same organization. The article mentions that Census officials had been considering scrapping the Pega software for the in-house alternative, indicating a previous issue or dissatisfaction with the Pega software within the organization [95821]. (b) The software failure incident at the U.S. Census Bureau can also be viewed in the context of multiple organizations facing similar challenges with software implementation. The article highlights concerns raised by IT experts about the cost and security of the online response software, indicating a broader issue that organizations may face when implementing third-party software solutions [95821].
Phase (Design/Operation) design, operation (a) The software failure incident in the U.S. Census Bureau's online response software from Pegasystems Inc can be attributed to the design phase. The decision to shelve the Pega platform in favor of an in-house alternative was made after discovering an issue with the Pega platform's ability to scale to the target 600,000 users. This indicates a failure in the design phase of the software, as it was unable to meet the scalability requirements despite being intended for America's first online census [95821]. (b) Additionally, the software failure incident can also be linked to the operation phase. The last-minute change to switch to the backup system, Primus, raised concerns as it was not extensively used in earlier operational testing. This indicates that the failure was partly due to issues introduced during the operation or deployment of the system, as the backup system was not fully tested in operational scenarios before the decision was made to switch to it [95821].
Boundary (Internal/External) within_system (a) within_system: The software failure incident in the U.S. Census Bureau's online response software from Pegasystems Inc was primarily within the system. The failure was related to the Pega platform's ability to scale to the target 600,000 users, prompting the bureau to switch to an in-house alternative called Primus for the decennial survey [95821]. The decision to shelve the Pega software was made after discovering issues with its scalability, indicating an internal system limitation.
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the U.S. Census Bureau's online response software was primarily due to non-human actions. The failure was attributed to an issue with the Pega platform's ability to scale to the target 600,000 users, leading to the decision to shelve the software in favor of an in-house alternative for risk mitigation [95821]. (b) Human actions also played a role in the software failure incident as the decision to switch from the Pega platform to an in-house alternative was made by the U.S. Census Bureau after discovering the scalability issue. Additionally, concerns were raised by IT experts about the cost and security implications of the last-minute change [95821].
Dimension (Hardware/Software) software (a) The software failure incident in the U.S. Census Bureau's online response software was not attributed to hardware issues but rather to software-related factors. The decision to shelve the software from Pegasystems Inc was made due to an issue with the platform's ability to scale to the target 600,000 users, indicating a software scalability problem [95821]. (b) The software failure incident was primarily due to contributing factors originating in the software itself. The U.S. Census Bureau decided to switch to an in-house alternative software as a risk mitigation strategy, highlighting concerns about the scalability and performance of the Pega platform for handling the expected user traffic during the census [95821].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident in this case does not appear to be malicious. The failure was related to the Pega platform's ability to scale to the target 600,000 users for the online census, leading to the decision to switch to an in-house alternative for risk mitigation [95821]. There is no indication in the article that the failure was due to any malicious intent to harm the system.
Intent (Poor/Accidental Decisions) poor_decisions The incident can be attributed to poor decisions. The U.S. Census Bureau decided to shelve the online response software from Pegasystems Inc just weeks before the decennial survey was due to go live, because of concerns about the Pega platform's ability to scale to the target 600,000 users [95821]. This last-minute change created new challenges, since the backup system, called Primus, had not been used extensively in earlier operational testing. The decision to switch software so close to the launch date raised concerns among IT experts about the cost and security of the online census [95821].
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident in the U.S. Census Bureau's online response software from Pegasystems Inc can be attributed to development incompetence. The Government Accountability Office reported that the decision to switch software was made after an issue was discovered with the Pega platform's ability to scale to the target 600,000 users. In addition, the backup system, called Primus, was not used extensively in earlier operational testing, indicating a lack of thorough testing and preparation [95821]. (b) The incident can also be seen as accidental: the switch came just weeks before the decennial survey was due to go live and raised concerns about wasted funds, and the decision to shelve the Pega software in favor of an in-house alternative was made for risk mitigation purposes, indicating a reactive approach to potential scalability issues [95821].
Duration temporary (a) The software failure incident in this case appears to be temporary. The U.S. Census Bureau decided to shelve the online response software from Pegasystems Inc just weeks before the decennial survey goes live due to an issue with the Pega platform's ability to scale to the target 600,000 users. The switch to an in-house alternative, Primus, was made for risk mitigation purposes, and the Pega software will still be used as a backup and for other census functions [95821]. This indicates that the failure was due to specific circumstances related to scalability and not a permanent issue with the software itself.
Behaviour other (a) crash: The software failure incident in the article is not described as a crash where the system loses state and does not perform any of its intended functions [95821]. (b) omission: The software failure incident does not involve the system omitting to perform its intended functions at an instance(s) [95821]. (c) timing: The software failure incident is not related to the system performing its intended functions correctly but too late or too early [95821]. (d) value: The software failure incident is not attributed to the system performing its intended functions incorrectly [95821]. (e) byzantine: The software failure incident is not characterized by the system behaving erroneously with inconsistent responses and interactions [95821]. (f) other: The behavior of the software failure incident in the article is related to the decision to shelve the online response software bought from Pegasystems Inc just weeks before the decennial survey goes live due to concerns about the platform's ability to scale to the target number of users. This last-minute change has raised new challenges and concerns about cost and security, with the backup system not extensively tested in earlier operational testing [95821].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, theoretical_consequence (d) property: People's material goods, money, or data were impacted due to the software failure. The U.S. Census Bureau's decision to shelve the online response software from Pegasystems Inc in favor of an in-house alternative had consequences related to property. The switch to the in-house alternative raised concerns about the cost and security of the online census. The last-minute change to the Primus system was noted to create new challenges, and IT experts expressed frustration about wasted funds due to the late change [95821].
Domain government (a) The failed system was intended to support the government sector, specifically the U.S. Census Bureau's population count [95821]. The software failure incident involved the online response software purchased from Pegasystems Inc for the decennial census, a critical government operation that determines congressional seats, federal spending allocation, and the overall accuracy of demographic data.

Sources
