Recurring |
one_organization |
(a) The software failure incident happened again at one_organization:
The eCensus website shutdown in Australia in 2016 was attributed to a distributed denial-of-service (DDoS) attack and issues with geo-blocking protocols. IBM, the company responsible for developing and running the eCensus website, saw the problem recur within the same event: the site was hit by repeated DDoS attacks and then stayed offline for over 40 hours. Despite IBM's claims that it had anticipated and planned for DDoS attacks using geo-blocking, the incident still occurred, causing significant inconvenience and financial losses for the Australian government and taxpayers [48494, 49158].
(b) The software failure incident happened again at multiple_organization:
There is no specific mention in the provided articles about the software failure incident happening again at other organizations or with their products and services. |
Phase (Design/Operation) |
design, operation |
(a) The design-phase failure stemmed primarily from geo-blocking protocols not being properly applied by the internet service provider (ISP) as part of the system development. IBM, the contractor responsible for the eCensus website, said that an ISP had not applied a geo-blocking protocol, which left the site exposed to distributed denial-of-service (DDoS) attacks [48494].
(b) The software failure incident related to the operation phase was highlighted by IBM's managing senior engineer Michael Shallcross, who suggested that a simple solution of turning the router's power 'off and on again' could have solved the problem earlier. This indicates that operational issues, such as router configuration problems, contributed to the system failure during operation [49158]. |
Boundary (Internal/External) |
within_system, outside_system |
(a) within_system: The eCensus website shutdown in Australia was primarily due to factors originating from within the system. IBM, the company responsible for developing and running the eCensus website, faced issues such as misconfiguration of routers, failure in geo-blocking protocols, and inadequate testing of the system's resilience to distributed denial-of-service (DDoS) attacks [48494, 49158]. These internal factors contributed to the website going offline for over 40 hours, causing inconvenience to the Australian public and the government.
(b) outside_system: On the other hand, external factors also played a role in the software failure incident. The incident involved DDoS attacks that originated externally, causing disruptions to the eCensus website [48494, 49158]. Additionally, there were issues with the implementation of geo-blocking by internet sub-contractors, which led to foreign traffic bypassing the intended restrictions and affecting the website's performance [48494]. |
Nature (Human/Non-human) |
non-human_actions, human_actions |
(a) The software failure incident occurring due to non-human actions:
- The software failure incident was attributed to a distributed denial-of-service (DDoS) attack on the eCensus website, causing it to go offline for over 40 hours [48494].
- IBM had implemented protection measures such as geo-blocking, known as 'Island Australia,' to defend against DDoS attacks [48494].
- The incident involved a failure in the geo-blocking service during the fourth DDoS attack, leading to the website becoming unresponsive [49158].
- The DDoS attack traffic peaked at 563Mbps and lasted 14 minutes, which was considered significant in the industry [49158].
(b) The software failure incident occurring due to human actions:
- IBM managing director Kerry Purcell took full responsibility for the Census website meltdown but insisted the website was not hacked [48494].
- There was a blame-game between the contractor and its sub-contractors over the DDoS attacks, with issues related to the geo-blocking protocol not being applied by an internet service provider [48494].
- IBM engineer Michael Shallcross mentioned that greater certainty from sub-contractors regarding the implementation of geo-blocking directions and more testing on the routers could have been beneficial [48494].
- The ABS stated that the risk of DDoS attacks was not adequately addressed by IBM, indicating a failure in managing risk related to the incident [49158]. |
Dimension (Hardware/Software) |
hardware, software |
(a) The software failure incident occurring due to hardware:
- IBM Australia's managing senior engineer Michael Shallcross said that the problem could have been solved simply by turning the router's power 'off and on again', indicating a hardware-related issue [49158].
(b) The software failure incident occurring due to software:
- The software failure incident in both articles primarily stemmed from software-related issues such as the failure to properly implement geo-blocking protocols, configuration problems, and the misidentification of normal traffic patterns as data exfiltration [48494, 49158]. |
Objective (Malicious/Non-malicious) |
non-malicious |
(a) The software failure incident was classified as non-malicious. Although a distributed denial-of-service (DDoS) attack took the eCensus website offline for over 40 hours, IBM, the contractor responsible for developing and running the eCensus, stated that the failure was not due to a hack and that no personal information of participants had been compromised [48494, 49158].
The incident involved issues with geo-blocking protocols not being properly applied by internet service providers, leading to the website being overwhelmed by foreign traffic. IBM had anticipated and planned for DDoS attacks using geo-blocking protection, known as 'Island Australia,' but there were failures in its implementation. The incident also highlighted miscommunication and lack of coordination between IBM and its sub-contractors regarding security measures [48494, 49158]. |
Intent (Poor/Accidental Decisions) |
poor_decisions, accidental_decisions |
(a) The software failure incident was related to poor_decisions. IBM's handling of the eCensus website shutdown was attributed to poor decisions, such as not properly implementing geo-blocking protocols and failing to address risks adequately. The incident involved a distributed denial-of-service (DDoS) attack that overwhelmed the website, leading to its shutdown for over 40 hours. Despite IBM's claims of anticipating and planning for DDoS attacks, the failure to effectively implement geo-blocking and address configuration issues ultimately led to the costly shutdown [48494, 49158].
(b) The software failure incident also involved accidental_decisions. For example, IBM's senior engineer mentioned that a simple solution like power cycling the router could have potentially resolved the issue earlier, indicating that the failure could have been due to an unintended mistake in not trying this approach sooner. Additionally, there were instances where sub-contractors failed to properly implement geo-blocking despite being instructed to do so, leading to continued vulnerabilities in the system [49158]. |
Capability (Incompetence/Accidental) |
development_incompetence, accidental |
(a) The software failure incident occurred due to development incompetence. IBM, the company responsible for the Census website, faced criticism for the failure, with the managing director taking full responsibility for the meltdown. The incident was attributed to a distributed denial-of-service (DDoS) attack that overwhelmed the website, leading to it being offline for over 40 hours. Despite IBM's claims of anticipating and planning for DDoS attacks using geo-blocking, the incident highlighted failures in implementing proper security measures and protocols [48494, 49158].
(b) The software failure incident also had accidental contributing factors. For example, IBM's senior engineer mentioned that a simple solution like power cycling the router could have potentially solved the problem earlier, indicating that the issue might have been overlooked or not tested thoroughly during development. Additionally, there were instances where normal traffic patterns were falsely identified as data exfiltration, leading to misinterpretations and disruptions in the system [49158]. |
Duration |
temporary |
(a) The software failure incident was temporary. The eCensus website shutdown lasted over 40 hours as thousands tried to input their data [48494]. IBM was prepared to relaunch the website three hours after it failed, but the Australian Bureau of Statistics insisted it be kept offline for a further 40 hours [48494]. The incident was attributed to distributed denial-of-service (DDoS) attacks and issues with geo-blocking protocols not being applied by an internet service provider [48494]. IBM claimed to have successfully defended against further DDoS attacks on the site [48494].
(b) The failure was temporary because it arose from specific circumstances rather than from a fault present under all conditions. IBM Australia managing senior engineer Michael Shallcross mentioned that turning the router's power 'off and on again' could have solved the problem earlier, indicating a specific technical issue that could have been addressed [49158]. |
Behaviour |
crash, omission, other |
(a) crash: The software failure incident in the articles can be categorized as a crash. The eCensus website experienced a meltdown on August 9, leading to it going offline for over 40 hours, preventing users from inputting their data [48494, 49158].
(b) omission: The incident can also be classified as an omission failure. Despite IBM's preparations for DDoS attacks using geo-blocking, the system did not block the attack traffic effectively, leaving the website unresponsive [48494, 49158].
(c) timing: There is no specific indication in the articles that the software failure incident was related to timing issues.
(d) value: The incident does not align with a value failure where the system performs its intended functions incorrectly.
(e) byzantine: The incident does not exhibit characteristics of a byzantine failure where the system behaves erroneously with inconsistent responses and interactions.
(f) other: The system also falsely identified normal traffic patterns as data exfiltration, a misinterpretation that contributed to the disruption [49158]. |