Published Date: 2022-07-09
Postmortem Analysis | |
---|---|
Timeline | 1. The software failure incident at Rogers Communications happened early Friday morning [130161]. 2. The article was published on July 9, 2022, a Saturday. 3. Estimate: the incident occurred on Friday, July 8, 2022, the day before publication. |
System | 1. Network system failure following a maintenance update in the core network, causing some routers to malfunction [130161]. |
Responsible Organization | 1. Rogers Communications was responsible for causing the software failure incident as stated by their Chief Executive Officer Tony Staffieri [130161]. |
Impacted Organization | 1. Millions of Canadians, including those relying on banking, transport, and government services [130161] 2. Toronto-Dominion Bank, Bank Of Montreal, and Royal Bank of Canada, as their services were disrupted [130161] 3. Canada's border services agency, as their mobile app for incoming travelers was affected [130161] 4. Police across Canada, as some callers could not reach emergency services via 911 calls [130161] |
Software Causes | 1. The software cause of the failure incident was a network system failure following a maintenance update in Rogers Communications' core network, which caused some routers to malfunction [130161]. |
Non-software Causes | 1. The outage was attributed to a router malfunction after maintenance work conducted by Rogers Communications on its core network; the article does not identify any cause independent of that maintenance update [130161]. |
Impacts | 1. The software failure incident at Rogers Communications caused a massive outage that shut down banking, transport, and government access for millions of people in Canada [130161]. 2. Canadians resorted to crowded cafes, public libraries, and hotels to access the internet during the outage [130161]. 3. The outage affected the border services agency's mobile app for incoming travelers, and cashless payment systems stopped working [130161]. 4. Police across Canada reported that some callers were unable to reach emergency services via 911 calls during the outage [130161]. 5. Financial institutions and banks such as Toronto-Dominion Bank, Bank Of Montreal, and Royal Bank of Canada experienced service disruptions due to the outage [130161]. |
Preventions | 1. Implementing thorough testing procedures before deploying maintenance updates to the core network could have potentially prevented the software failure incident [130161]. 2. Investing in more robust and redundant network infrastructure to mitigate the impact of router malfunctions or failures could have helped prevent such widespread outages [130161]. 3. Diversifying the telecommunication market in Canada by promoting more competition among providers could lead to better service reliability and resilience against major outages [130161]. |
Fixes | 1. Implementing more rigorous testing procedures before deploying maintenance updates to prevent network system failures like the one experienced by Rogers Communications [130161]. 2. Enhancing redundancy and failover mechanisms in the network infrastructure to mitigate the impact of router malfunctions or other similar issues [130161]. 3. Investing in network and technology upgrades to ensure a more stable and resilient system that can handle maintenance updates without causing widespread outages [130161]. |
References | 1. Rogers Communications (RCIb.TO) [Article 130161] 2. Rogers' Chief Executive Officer Tony Staffieri [Article 130161] 3. Canadians affected by the outage [Article 130161] 4. Financial institutions and banks, including Toronto-Dominion Bank (TD.TO) and Bank Of Montreal (BMO.TO) [Article 130161] 5. Royal Bank of Canada (RY.TO) [Article 130161] 6. Ericsson (ERICb.ST) [Article 130161] 7. Canada's competition bureau [Article 130161] |
Category | Option | Rationale |
---|---|---|
Recurring | one_organization | (a) The software failure incident having happened again at one_organization: The article mentions that last year, Rogers attributed a major outage to a glitch linked to an Ericsson software upgrade. This indicates that a similar incident has happened before within the same organization, Rogers Communications [130161]. (b) The software failure incident having happened again at multiple_organization: There is no specific mention in the article about similar incidents happening at other organizations. Therefore, it is unknown if the software failure incident has occurred again at multiple organizations. |
Phase (Design/Operation) | design, operation | (a) The software failure incident in the article was attributed to a network system failure following a maintenance update in the core network, which caused some routers to malfunction early Friday morning. This points to a failure due to contributing factors introduced by system development or system updates [130161]. (b) The article mentions that the outage affected various services such as banking, transport, government access, mobile apps, and emergency services like 911 calls. This indicates a failure due to contributing factors introduced by the operation or misuse of the system [130161]. |
Boundary (Internal/External) | within_system | (a) within_system: The software failure incident at Rogers Communications was attributed to a network system failure following a maintenance update in their core network, causing some routers to malfunction [130161]. This indicates that the contributing factors leading to the failure originated from within the system itself. |
Nature (Human/Non-human) | non-human_actions | (a) The software failure incident was attributed to a router malfunction following a maintenance update in the core network [130161]. This indicates a non-human action as the contributing factor to the failure. (b) The article does not provide specific information about human actions contributing to the software failure incident. |
Dimension (Hardware/Software) | hardware, software | (a) The software failure incident was attributed to a hardware issue, specifically a router malfunction after maintenance work in the core network at Rogers Communications. The CEO mentioned that the cause was a network system failure following a maintenance update, leading to router malfunctions [130161]. (b) The previous year, Rogers experienced a major outage attributed to a glitch linked to an Ericsson software upgrade. Ericsson, the software provider, acknowledged the outage and was working with Rogers to restore service [130161]. |
Objective (Malicious/Non-malicious) | non-malicious | (a) The software failure incident reported in the article is categorized as non-malicious. The outage at Rogers Communications was attributed to a router malfunction following a maintenance update in the core network early Friday morning. Rogers' Chief Executive Officer described the cause as a network system failure, and the article gives no indication of malicious intent behind the incident [130161]. |
Intent (Poor/Accidental Decisions) | poor_decisions | (a) The software failure incident at Rogers Communications was attributed to a network system failure following a maintenance update in the core network, which caused some routers to malfunction [130161]. This indicates that the incident could be related to poor_decisions, as the failure was a result of decisions made during the maintenance work that led to the malfunction. |
Capability (Incompetence/Accidental) | development_incompetence, accidental | (a) The software failure incident reported in the article was attributed to a network system failure following a maintenance update in the core network, which caused some routers to malfunction. This suggests contributing factors introduced by a lack of professional competence on the part of humans or the development organization [130161]. (b) The outage at Rogers Communications was blamed on a router malfunction after maintenance work, indicating that the incident was accidental in nature [130161]. |
Duration | temporary | The software failure incident reported in Article 130161 was temporary. The outage experienced by Rogers Communications was attributed to a network system failure following a maintenance update in the core network, causing some routers to malfunction early Friday morning. This indicates that the failure was due to specific circumstances related to the maintenance work and not a permanent issue [130161]. |
Behaviour | crash, omission, other | (a) crash: The software failure incident in the article was characterized by a crash as it caused a massive outage that shut down banking, transport, and government access for millions of people [130161]. (b) omission: The software failure incident also involved omission as the outage disrupted services of financial institutions and banks, such as Toronto-Dominion Bank and Bank Of Montreal, affecting their operations [130161]. (c) timing: The timing of the software failure incident was not explicitly mentioned in the article. (d) value: The software failure incident did not involve the system performing its intended functions incorrectly. (e) byzantine: The software failure incident did not exhibit behaviors of inconsistency or erratic responses. (f) other: The other behavior observed in the software failure incident was a network system failure following a maintenance update in the core network, causing some routers to malfunction, leading to the outage [130161]. |
Layer | Option | Rationale |
---|---|---|
Perception | None | None |
Communication | None | None |
Application | None | None |
Category | Option | Rationale |
---|---|---|
Consequence | property, delay, non-human, theoretical_consequence | (a) death: There is no mention of any deaths resulting from the software failure incident in the provided article [130161]. (b) harm: The article does not mention any physical harm caused to individuals due to the software failure incident [130161]. (c) basic: The software failure impacted people's access to services such as banking, transport, government access, and emergency services, but there is no specific mention of people's access to food or shelter being impacted [130161]. (d) property: Financial institutions and banks, including Toronto-Dominion Bank and Bank Of Montreal, mentioned that their services were disrupted due to the outage, and Royal Bank of Canada stated that its ATMs and online banking services were affected, indicating an impact on people's material goods and financial transactions [130161]. (e) delay: The outage caused delays and disruptions in various services such as banking, transport, government access, and emergency services, as mentioned in the article [130161]. (f) non-human: Non-human entities such as the country's border services agency's mobile app for incoming travelers and cashless payment systems were affected by the outage, indicating an impact on non-human entities [130161]. (g) no_consequence: The outage had significant consequences on various services and entities, so it does not fall under the category of no real observed consequences [130161]. (h) theoretical_consequence: The article discusses potential consequences such as the need for more competition in the telecom sector, criticism over the company's industrial dominance, and the impact on the merger deal between Rogers and Shaw Communications, which are theoretical consequences that were discussed but did not directly result from the software failure incident [130161]. (i) other: There are no other specific consequences mentioned in the article that do not fall under the options provided [130161]. |
Domain | information, finance, government | (a) The software failure incident affected the production and distribution of information as it caused a massive outage at Rogers Communications, a major telecom operator in Canada. The outage disrupted services for millions of people, including banking, transport, government access, and mobile apps for incoming travelers [Article 130161]. (h) Financial institutions and banks, such as Toronto-Dominion Bank, Bank Of Montreal, and Royal Bank of Canada, reported that their services were disrupted by the outage. ATMs and online banking services were affected, highlighting the impact on the finance industry [Article 130161]. (l) The outage also impacted government services, with the country's border services agency mentioning that its mobile app for incoming travelers was affected, and police across Canada reported issues with some callers not being able to reach emergency services via 911 calls [Article 130161]. |
Article ID: 130161