Recurring |
one_organization |
(a) The software failure incident has happened again at one_organization:
Article [8231] reports that Research in Motion (RIM) suffered a major outage of its BlackBerry services. One of RIM's core switches failed, and the redundant systems then failed as well, causing an outage that affected customers in Europe and the Americas. Although RIM regularly tested its failover systems, the failover did not function as expected, producing a significant backlog of mail and service interruptions for customers. At the time of the article, RIM was working to restore service and clear the backlog of email messages.
(b) The software failure incident has happened again at multiple_organization:
There is no information in the provided article to suggest that a similar software failure incident has happened at other organizations or with their products and services. |
Phase (Design/Operation) |
design, operation |
(a) The software failure incident in the article was related to the design phase. The root cause of the initial European BlackBerry e-mail service outage was described as a failure of one of RIM's core switches, which then cascaded into a more significant issue when the redundant systems also failed to function as expected despite regular testing of failover systems [8231].
(b) The software failure incident in the article was also related to the operation phase. RIM responded to the outage by throttling service in the impacted area to stabilize service, which resulted in a backup of mail in other regions trying to reach RIM's European customers. This operational decision led to delays and service interruptions for many customers [8231]. |
Boundary (Internal/External) |
within_system |
(a) within_system: The software failure incident involving BlackBerry services was primarily within the system. The root cause was identified as a failure of one of RIM's core switches, which then cascaded into a more significant issue when the redundant systems also failed to function as expected. This internal failure led to a backlog of mail and service interruptions for customers [8231].
(b) outside_system: There is no evidence in the article to suggest that the software failure incident was caused by contributing factors originating from outside the system. The focus of the incident was on internal system failures and the efforts to restore service within the company's infrastructure [8231]. |
Nature (Human/Non-human) |
non-human_actions |
(a) The software failure incident in the article was primarily due to non-human actions. The root cause of the initial European BlackBerry e-mail service outage was described as a failure of one of RIM's core switches, followed by the failure of redundant systems despite regular testing of failover systems [8231].
(b) Human actions were not mentioned as contributing factors to the software failure incident reported in the article. |
Dimension (Hardware/Software) |
hardware |
(a) The software failure incident was primarily attributed to hardware issues. The initial outage was described as a failure of one of RIM's core switches, indicating a hardware-related problem. Additionally, the redundant systems also failed to function as expected, further emphasizing the hardware aspect of the failure [8231].
(b) While the incident involved software components such as failover systems, the root cause and primary contributing factors were hardware-related, specifically the failure of a core switch and of the redundant systems [8231]. |
Objective (Malicious/Non-malicious) |
non-malicious |
(a) The software failure incident described in the article does not indicate any malicious intent. There is no evidence of a hack or security breach involved in the outage. The root cause was identified as a failure of one of RIM's core switches, followed by a failure of redundant systems, leading to a significant backlog of mail. RIM's CTO for Software mentioned that the failover did not function as expected, despite regular testing of failover systems. The company is focused on restoring service and clearing out the backlog of email messages without dropping any messages [8231].
(b) The software failure incident is categorized as non-malicious. |
Intent (Poor/Accidental Decisions) |
unknown |
(a) The software failure incident described in the article does not explicitly mention poor decisions as a contributing factor. The root cause of the initial European BlackBerry e-mail service outage was identified as a failure of one of RIM's core switches, followed by the failure of redundant systems. The article highlights that the failover systems did not function as expected, despite regular testing. This indicates a technical failure rather than poor decisions as the primary cause of the incident [8231].
(b) The article does not provide specific details indicating that the failure was due to accidental decisions or mistakes. Its focus is on technical issues such as the failure of a core switch and of the redundant systems, as well as the challenges faced in restoring service and clearing the backlog of emails. Therefore, the software failure incident appears to be related to technical failures rather than accidental decisions [8231]. |
Capability (Incompetence/Accidental) |
accidental |
(a) The software failure incident in the article was not explicitly attributed to development incompetence. The root cause of the initial European BlackBerry e-mail service outage was described as a failure of one of RIM's core switches, with subsequent issues arising from the failure of redundant systems. The CTO mentioned that the failover did not function as expected, despite regular testing of failover systems. This indicates a technical failure rather than incompetence in development [8231].
(b) The software failure incident in the article was more consistent with an accidental failure than an intentional one. The article mentions that there was no evidence of a hack or security breach involved in the outage. The issues appear to have stemmed from technical failures within RIM's systems, namely the core switch failure and the subsequent problems with redundant systems. The CTO mentioned that the failover did not work as expected, leading to a significant backlog of mail. This points towards accidental technical failures rather than intentional actions [8231]. |
Duration |
temporary |
(a) The software failure incident described in the article was temporary rather than permanent. The article mentions that RIM was working "around the clock" to restore service and was focusing on clearing out the backlog of email, indicating active efforts to resolve the issue and restore normal service [8231]. |
Behaviour |
crash, other |
(a) crash: The software failure incident described in the article can be categorized as a crash. The article states that the initial outage was caused by a failure of one of RIM's core switches, which cascaded when the redundant systems also failed to function as expected, resulting in a significant backup of mail and service interruptions for customers [8231].
(b) omission: The article does not specifically mention a failure due to the system omitting to perform its intended functions at one or more instances.
(c) timing: The incident does not describe a failure due to the system performing its intended functions correctly, but too late or too early.
(d) value: The incident does not indicate a failure due to the system performing its intended functions incorrectly.
(e) byzantine: The incident does not suggest a failure due to the system behaving erroneously with inconsistent responses and interactions.
(f) other: The behavior of the software failure incident can be categorized as a cascading effect where the initial failure of a core switch led to the failure of redundant systems, causing a significant backlog of mail and service interruptions for customers. This cascading effect is a notable aspect of the incident [8231]. |