Incident: Emergency Service Calls Disrupted by Rare Software Failure at Orange

Published Date: 2021-06-03

Postmortem Analysis
Timeline:
1. The software failure occurred on June 2, 2021 [115478].

System:
1. Critical network equipment that routes all incoming calls from landlines and cell phones to emergency numbers failed [115478].

Responsible Organization:
1. Orange, the operator of the failed equipment. Its chief executive, Stephane Richard, attributed the incident to a "genuine, extremely rare technical glitch" and ruled out an attack, a malicious act, human error, or a maintenance issue [115478].

Impacted Organization:
1. Emergency phone services
2. Orange (telecoms firm)
3. French authorities
4. Samu-Urgences (central organization for emergency medical services) [115478]

Software Causes:
1. A software failure, described as a "genuine, extremely rare technical glitch", in the critical network equipment that routes all incoming calls to emergency numbers. All six redundant sites on which the infrastructure relies broke down at the same time for an unknown reason [115478].

Non-software Causes:
1. The outage was not caused by an attack or a malicious act [115478].
2. The outage could not be linked to human error or a maintenance issue [115478].

Impacts:
1. The outage at Orange, France's biggest telecoms firm, disrupted calls to emergency services for several hours. Three to four deaths were recorded during the period, although it was too soon to link them definitively to the outage [115478].
2. On average, about 20% of calls to emergency services failed, prompting the authorities to set up alternative emergency numbers and publicize them on social media [115478].
3. Calls to fixed lines other than emergency numbers were also affected, as French rivals SFR and Bouygues Telecom reported on Twitter [115478].
4. A third of the calls made to emergency medical services failed, according to François Braun, president of Samu-Urgences, the central organization for emergency medical services [115478].

Preventions:
1. Strengthening redundancy for critical network equipment so that redundant sites cannot all fail simultaneously [115478].
2. Testing the software and infrastructure regularly and thoroughly to find and fix technical glitches before they affect services [115478].
3. Enhancing monitoring so that anomalies or failures in the equipment that routes emergency calls are detected and handled quickly [115478].

Fixes:
1. Conducting a thorough investigation into the causes of the outage [115478]
2. Addressing the software failure in the critical network equipment that routes calls to emergency numbers
3. Ensuring the redundancy and reliability of the infrastructure so that the redundant sites do not all break down simultaneously
4. Reviewing and, where needed, enhancing the safeguards in place against future software failures
5. Preparing alternative emergency numbers and communication strategies for similar incidents in the future

References:
1. Orange boss Stephane Richard
2. French President Emmanuel Macron
3. Health minister Olivier Véran
4. Interior Minister Gerald Darmanin
5. Minister for digital affairs, Cedric O
6. Orange France chief Fabienne Dulac
7. French rivals SFR and Bouygues Telecom
8. François Braun, president of Samu-Urgences [115478]
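
The monitoring measure listed under Preventions can be illustrated with a minimal Python sketch of a sliding-window failure-rate alarm. Nothing here comes from the article or from Orange's systems: the class name `CallFailureMonitor`, the window size, and the 5% alert threshold are all hypothetical choices made for illustration.

```python
from collections import deque


class CallFailureMonitor:
    """Hypothetical helper: tracks the failure rate of recent call attempts."""

    def __init__(self, window_size: int = 100, alert_threshold: float = 0.05):
        # Both defaults are illustrative assumptions, not values from the article.
        self.outcomes = deque(maxlen=window_size)  # True means the call failed
        self.alert_threshold = alert_threshold

    def record(self, failed: bool) -> None:
        # The deque drops the oldest outcome once the window is full.
        self.outcomes.append(failed)

    def failure_rate(self) -> float:
        # Fraction of failed calls in the current window (0.0 when empty).
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        # Alert when the windowed failure rate exceeds the threshold.
        return self.failure_rate() > self.alert_threshold


monitor = CallFailureMonitor(window_size=10, alert_threshold=0.05)
for failed in [False] * 8 + [True] * 2:
    monitor.record(failed)
print(monitor.failure_rate(), monitor.should_alert())
```

With two failures in a window of ten calls, the rate matches the roughly 20% average failure rate reported for emergency calls and exceeds the assumed threshold, so the sketch would raise an alert.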

Software Taxonomy of Faults

Category | Option | Rationale
Recurring | unknown | (a) one_organization: Orange's boss, Stephane Richard, described the outage as a software failure and a "genuine, extremely rare technical glitch" [115478]. The article gives no information about earlier occurrences of the same failure at Orange. (b) multiple_organization: The article gives no indication that a similar incident has happened at other organizations or with their products and services.
Phase (Design/Operation) | design | (a) The incident was attributed to a design issue rather than an operational one. Stephane Richard said the outage was caused by a "software failure" in the critical network equipment that routes emergency calls, calling it a "genuine, extremely rare technical glitch" and ruling out an attack, a malicious act, human error, or a maintenance issue [115478].
Boundary (Internal/External) | within_system | Stephane Richard said the failure originated in the critical network equipment that routes all incoming calls to emergency numbers, and that all six redundant sites on which the infrastructure relies broke down at the same time for an unknown reason. He ruled out an attack, a malicious act, human error, or a maintenance issue [115478]. The failure therefore originated within the system itself.
Nature (Human/Non-human) | non-human_actions | (a) The incident was attributed to a non-human action. Stephane Richard ruled out an attack, a malicious act, human error, or a maintenance issue, and described a "genuine, extremely rare technical glitch" in the critical network equipment that routes emergency calls [115478].
Dimension (Hardware/Software) | software | Stephane Richard explicitly said the outage was caused by a "software failure", a "genuine, extremely rare technical glitch", and not by an attack, a malicious act, human error, or a maintenance issue [115478]. The failure occurred in the critical network equipment that routes calls to emergency numbers, and all six redundant sites on which the infrastructure relies broke down simultaneously for an unknown reason [115478].
Objective (Malicious/Non-malicious) | non-malicious | Stephane Richard explicitly stated that the outage was not caused by an attack or a malicious act, and he also ruled out human error or a maintenance issue. He attributed the incident to a "software failure", a "genuine, extremely rare technical glitch" in the critical network equipment that routes emergency calls [115478].
Intent (Poor/Accidental Decisions) | accidental_decisions | Stephane Richard said the failure was not caused by an attack, a malicious act, human error, or a maintenance issue, stating: "It's a software failure (...) a genuine, extremely rare technical glitch" [115478]. This points to accidental_decisions, where the contributing factors were introduced by mistakes or unintended decisions rather than by poor decisions.
Capability (Incompetence/Accidental) | accidental | (a) The incident was not attributed to development incompetence. Stephane Richard ruled out an attack, a malicious act, human error, or a maintenance issue, and described a "genuine, extremely rare technical glitch" in the critical network equipment that routes emergency calls [115478]. (b) The incident was described as accidental: a software failure that occurred for an unknown reason and affected all six redundant sites simultaneously [115478].
Duration | temporary | The outage began on Wednesday afternoon, eased overnight, and was entirely fixed in the course of the day [115478]. According to Stephane Richard, all six redundant sites on which the infrastructure relies broke down at the same time for an unknown reason; because service was restored, the failure was temporary [115478].
Behaviour | value, other | (a) crash: The incident was not described as a crash in which the system loses state and performs none of its intended functions [115478]. (b) omission: The incident was not described as the system omitting to perform its intended functions at particular instances [115478]. (c) timing: The incident did not involve the system performing its intended functions correctly but too late or too early [115478]. (d) value: The system performed its intended function incorrectly: emergency phone services were disrupted, with 20% of calls to emergency services failing on average [115478]. (e) byzantine: The incident did not involve erroneous behaviour with inconsistent responses and interactions [115478]. (f) other: The incident was described as a "software failure" in the critical network equipment that routes all incoming calls from landlines and cell phones to emergency numbers, a genuine, extremely rare technical glitch in which all six redundant sites broke down at the same time for an unknown reason [115478].

IoT System Layer

Layer | Option | Rationale
Perception | None | None
Communication | None | None
Application | None | None

Other Details

Category | Option | Rationale
Consequence | death, theoretical_consequence | (a) death: Three to four deaths were recorded during the outage, although it was too soon to link them definitively to the incident [115478]. (h) theoretical_consequence: The health minister said it was too soon to determine whether there was any link between the outage and the deaths recorded during the period, so a causal link remains a possible but unconfirmed consequence [115478].
Domain | health | The incident occurred in the telecommunications network of Orange, France's biggest telecoms firm, and disrupted calls to emergency services for several hours [115478]. The failed equipment routes all incoming calls from landlines and cell phones to emergency numbers, and some calls to fixed lines other than emergency numbers were also affected, as French rivals SFR and Bouygues Telecom reported on Twitter [115478]. The health classification reflects the impact on emergency medical services: François Braun, president of Samu-Urgences, said a third of the calls made to emergency medical services failed during the outage [115478].
