Incident: Massive Swisscom Telecom Outage Due to Software Malfunction.

Published Date: 2021-07-13

Postmortem Analysis
Timeline 1. The software failure incident happened in the early hours of Friday morning, in July 2021 [116655].
System 1. Swisscom's fixed-line network system [116655]
Responsible Organization 1. Swisscom (SCMN.S) [116655]
Impacted Organization 1. Swisscom (SCMN.S) [116655]
Software Causes 1. The software failure incident was caused by a software malfunction during maintenance work on Swisscom's telephony platform for business customers, leading to a nationwide failure of the fixed-line network in Switzerland [116655].
Non-software Causes 1. The outage occurred during maintenance work on the telephony platform for business customers, which points to a possible operational cause [116655]. The resulting fixed-line outage, which prevented Swiss callers from contacting emergency services for eight hours, is an effect of the failure rather than a separate cause [116655].
Impacts 1. The software failure incident led to a massive outage of Swisscom's services, preventing Swiss callers from contacting emergency services for eight hours [116655]. 2. Part-time firefighters were called up to their stations to help in emergencies due to the failure of the entire fixed-line phone network in Switzerland [116655].
Preventions 1. Thorough testing of software updates before deployment could have prevented the software failure incident [116655].
Fixes 1. Implement more rigorous testing procedures for software updates to catch potential malfunctions before deployment [116655]. 2. Enhance redundancy and failover mechanisms within the telephony platform to mitigate the impact of individual failures [116655].
References 1. Swisscom Chief Executive Urs Schaeppi - provided information about the software malfunction and issued an apology [116655].

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident happened again at one_organization: The article [116655] reports that Swisscom experienced a nationwide failure of its fixed-line network due to a software malfunction during maintenance work on its telephony platform for business customers. This incident was not the first for Swisscom, as it followed three similar problems in 2020. Swisscom's Chief Executive Urs Schaeppi apologized for the outage and acknowledged that this failure was not the first of its kind for the company. (b) The software failure incident happened again at multiple_organization: There is no information in the provided article to suggest that similar software failure incidents have occurred at other organizations.
Phase (Design/Operation) design (a) The software failure incident in the article was attributed to a software malfunction during maintenance work on Swisscom's telephony platform for business customers. The CEO mentioned that a software update led to a malfunction that triggered a domino effect, causing the collapse of the fixed-line phone network for Switzerland [116655]. This aligns with the design phase failure due to contributing factors introduced by system development or updates. (b) The article does not provide specific information indicating that the software failure incident was due to factors introduced by the operation or misuse of the system.
Boundary (Internal/External) within_system (a) The software failure incident described in the article is categorized as within_system. The CEO of Swisscom, Urs Schaeppi, mentioned that the collapse was caused by a software malfunction during maintenance work on the telephony platform for business customers. He specifically stated, "A software update led to a malfunction that triggered a domino effect" [116655]. This indicates that the failure originated from within the system during the maintenance process.
Nature (Human/Non-human) non-human_actions (a) The software failure incident in the article was attributed to a non-human action, specifically a software malfunction during maintenance work on the telephony platform for business customers. Swisscom's Chief Executive mentioned that a software update caused a malfunction that triggered a domino effect, leading to the collapse of the fixed-line phone network [116655]. (b) The article does not mention any specific human actions contributing to the software failure incident.
Dimension (Hardware/Software) software (a) The software failure incident was attributed to a software malfunction caused by a software update during maintenance work on Swisscom's telephony platform for business customers. Swisscom's Chief Executive Urs Schaeppi mentioned that a software update led to a malfunction, triggering a domino effect that resulted in the collapse of the fixed-line phone network for Switzerland [116655].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident described in the article was non-malicious. The CEO of Swisscom, Urs Schaeppi, apologized for the massive outage of the telecoms company's services, stating that the collapse was caused by a software malfunction during maintenance work on its telephony platform for business customers. Schaeppi said that a software update led to a malfunction that triggered a domino effect, and added that stability is the company's top priority but that individual failures can occur and are unfortunately not preventable [116655].
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident mentioned in the article was caused by a software malfunction during maintenance work on Swisscom's telephony platform for business customers. Swisscom's Chief Executive Urs Schaeppi mentioned that a software update led to a malfunction that triggered a domino effect, resulting in the nationwide failure of the fixed-line network [116655]. This suggests that the failure may have stemmed from poor decisions related to the software update process, although the article does not state this explicitly.
Capability (Incompetence/Accidental) accidental (a) The software failure incident was not explicitly attributed to development incompetence in the provided article. (b) The software failure incident was described as being caused by a software malfunction during maintenance work on the telephony platform for business customers. Swisscom's CEO mentioned that a software update led to a malfunction that triggered a domino effect, resulting in the nationwide failure of the fixed-line phone network. This indicates that the failure was more likely accidental rather than due to development incompetence [116655].
Duration temporary The software failure incident reported in Article 116655 was temporary. The failure was caused by a software malfunction during maintenance work on the telephony platform for business customers, leading to a nationwide outage of Swisscom's fixed-line network for eight hours. The CEO mentioned that a software update triggered a domino effect, indicating that the failure was due to specific circumstances related to the maintenance work and not a permanent issue inherent in the system [116655].
Behaviour crash, omission, value, other (a) crash: The software failure incident was described as a crash because it caused a nationwide failure of Swisscom's fixed-line network, leaving emergency services unreachable for eight hours [116655]. (b) omission: The software failure incident could be considered an omission because the telephony platform failed to perform its intended function while maintenance work was under way, leading to a massive outage [116655]. (c) timing: The software failure incident does not align with a timing failure, as there is no indication that the system performed its intended functions too late or too early [116655]. (d) value: The software failure incident could be categorized as a value failure because the software update led to a malfunction that caused the system to perform its intended functions incorrectly, triggering a domino effect and resulting in the outage [116655]. (e) byzantine: The software failure incident does not exhibit characteristics of a byzantine failure, as there is no mention of inconsistent responses or interactions by the system [116655]. (f) other: The incident also behaved as a cascading failure: a software malfunction during maintenance work spread into a nationwide outage of the fixed-line phone network in Switzerland [116655].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence harm, delay, theoretical_consequence The consequence of the software failure incident reported in Article 116655 was a potential harm to people. The article mentions that due to the massive outage of Swisscom's services, emergency services could not be contacted by Swiss callers for eight hours. Part-time firefighters were called up to their stations to help in emergencies during the outage, indicating a potential risk to people's safety [116655].
Domain information (a) The failed system supported the information industry: it was part of the telecoms company's services, specifically the fixed-line phone network in Switzerland [116655].
