Incident: HPE's User Data Repository System Failure at Megafon, Russia.

Published Date: 2018-11-06

Postmortem Analysis
Timeline 1. According to the lawsuit filed by Megafon, the major network outages caused by failures of HPE's User Data Repository system occurred in November 2016, April 2017, and May 2017 [Article 77771].
System 1. HPE's User Data Repository (UDR) system [Article 77771]
Responsible Organization 1. Hewlett Packard Enterprise's Russia division was responsible for causing the software failure incident at Megafon, as alleged in the lawsuit filed by Megafon [Article 77771].
Impacted Organization 1. Megafon, Russia's No. 2 mobile phone operator, was impacted by the software failure incident [Article 77771].
Software Causes 1. The software cause of the failure incident was the User Data Repository (UDR) system built by HPE's Russia division, which suffered "numerous cascading catastrophic failures" leading to major network outages [Article 77771].
Non-software Causes 1. Contractual issues between Megafon and Hewlett Packard Enterprise (HPE) regarding the upgrade of Megafon's wireless network [Article 77771]. 2. Faulty system built by HPE to store user data and services, leading to repeated failures and major network outages [Article 77771]. 3. Cascading catastrophic failures in HPE's User Data Repository (UDR) system, causing near shutdowns of Megafon's cellular network [Article 77771]. 4. Decision by Megafon to upgrade its wireless network for LTE communications and unite its regional networks into a single federal network [Article 77771].
Impacts 1. Major network outages affecting Megafon's cellular network with over 80 million subscribers [Article 77771]. 2. Financial damages incurred by Megafon, including funds paid to HPE, repair costs, and lost customers [Article 77771]. 3. The need for Megafon to completely rebuild its User Data Repository (UDR) system at an estimated cost of more than $28 million [Article 77771].
Preventions 1. Thorough testing and quality assurance by HPE's Russia division during development and deployment of the User Data Repository (UDR) system might have prevented the failures [Article 77771]. 2. Regular monitoring and maintenance of the UDR system by HPE might have identified and resolved issues before they caused major network outages [Article 77771]. 3. Clear communication and collaboration between Megafon and HPE throughout the project, so that emerging issues were addressed promptly, might have reduced the risk of catastrophic failures in the UDR system [Article 77771].
Fixes 1. Implementing a robust and reliable User Data Repository (UDR) system that does not suffer from cascading catastrophic failures [77771].
References 1. Megafon - The article gathers information about the software failure incident from Megafon, Russia's No. 2 mobile phone operator, which is suing Hewlett Packard Enterprise over major outages on its cellular network [Article 77771].

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident at Megafon involving the User Data Repository (UDR) system built by Hewlett Packard Enterprise (HPE) led to major network outages in November 2016, April 2017, and May 2017 [Article 77771]. This indicates that the software failure incident happened again within the same organization, Megafon. (b) There is no specific information in the provided article about the software failure incident happening at other organizations or with their products and services.
Phase (Design/Operation) design (a) The software failure incident in the article is related to the design phase. Megafon is suing Hewlett Packard Enterprise (HPE) over major outages on its cellular network, alleging that the system HPE built to store user data and services repeatedly failed, leading to network outages. The lawsuit mentions that HPE's User Data Repository (UDR) system suffered "numerous cascading catastrophic failures" which resulted in near shutdowns of Megafon's cellular network of more than 80 million subscribers [77771]. This indicates that the failure was due to contributing factors introduced during the design and development of the system by HPE. (b) The article does not provide information about the software failure incident being related to the operation phase.
Boundary (Internal/External) within_system (a) The software failure incident involving Megafon's cellular network outages was primarily within the system. The lawsuit alleges that HPE's User Data Repository (UDR) system, built to store user data and services, suffered "numerous cascading catastrophic failures" leading to major network outages [Article 77771]. This indicates that the failure originated from within the system itself, specifically from the faulty UDR system developed by HPE.
Nature (Human/Non-human) non-human_actions (a) The software failure incident was primarily attributed to non-human actions. Megafon alleged that the User Data Repository (UDR) system built by HPE suffered "numerous cascading catastrophic failures," leading to major network outages in November 2016, April 2017, and May 2017 and to near shutdowns of Megafon's cellular network of more than 80 million subscribers. The repeated failure of the system points to a technical failure rather than one caused by human actions [Article 77771]. (b) The article does not mention contributing factors introduced by human actions; the lawsuit's allegations center on the failures of the system built by HPE [Article 77771].
Dimension (Hardware/Software) hardware, software (a) The software failure incident in the article is related to hardware. Megafon, a mobile phone operator, sued Hewlett Packard Enterprise (HPE) over major outages on its cellular network. The lawsuit alleges that HPE's User Data Repository (UDR) system, built to store user data and services, suffered "numerous cascading catastrophic failures," leading to network outages. These failures resulted in near shutdowns of Megafon's cellular network of more than 80 million subscribers. Megafon claimed damages including repair costs and the need to completely rebuild its UDR system at an estimated cost of more than $28 million [77771]. (b) The software failure incident is also related to software. The lawsuit alleges that the system HPE built to store user data and services repeatedly failed, leading to major network outages. Megafon mentioned that the failures were due to the faulty system built by HPE, which failed to allow Megafon to unite its regional networks as intended. This indicates that the software component of the system was a contributing factor to the failure incident [77771].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident reported in Article 77771 was non-malicious. The failure was attributed to the system built by HPE's Russia division to store user data and services, which repeatedly failed, leading to major network outages for Megafon's cellular network. The lawsuit filed by Megafon against HPE highlighted that the User Data Repository (UDR) system suffered "numerous cascading catastrophic failures" in November 2016, April 2017, and May 2017, which resulted in near shutdowns of Megafon's cellular network with more than 80 million subscribers. The damages claimed by Megafon included funds paid to HPE, repair costs, lost customers, and the need to completely rebuild its UDR system at an estimated cost of more than $28 million [77771].
Intent (Poor/Accidental Decisions) poor_decisions (a) poor_decisions: Megafon sued Hewlett Packard Enterprise (HPE) in a California court over major outages on its cellular network, alleging that HPE's Russia division signed a deal to upgrade Megafon's wireless network but that the system HPE built repeatedly failed, leading to catastrophic failures and major network outages [Article 77771]. (b) accidental_decisions: The article does not specifically mention that the software failure incident was due to accidental decisions or unintended mistakes.
Capability (Incompetence/Accidental) development_incompetence (a) The software failure incident in the article is related to development incompetence. Megafon is suing Hewlett Packard Enterprise (HPE) over major outages on its cellular network, alleging that HPE's Russia division signed a deal in 2013 to upgrade Megafon's wireless network, but the system HPE built to store user data and services repeatedly failed, leading to catastrophic failures and major network outages [Article 77771]. This indicates that the failure was due to contributing factors introduced by HPE's development incompetence in building the User Data Repository (UDR) system.
Duration temporary The software failure incident reported in Article 77771 was temporary. The failure occurred due to the User Data Repository (UDR) system built by Hewlett Packard Enterprise (HPE) suffering "numerous cascading catastrophic failures," leading to major network outages in November 2016, April 2017, and May 2017 [77771]. This indicates that the failure was not permanent but occurred intermittently due to specific circumstances related to the UDR system.
Behaviour crash, omission, value, other (a) crash: The software failure incident in the article involved major network outages on Megafon's cellular network due to the User Data Repository (UDR) system built by HPE suffering "numerous cascading catastrophic failures" [Article 77771]. (b) omission: The system built by HPE to store user data and services for Megafon repeatedly failed, leading to major network outages and near shutdowns of Megafon's cellular network with more than 80 million subscribers [Article 77771]. (c) timing: There is no specific mention of timing-related failures in the article. (d) value: The software failure incident involved the system built by HPE performing its intended functions incorrectly, resulting in significant damages for Megafon, including funds paid to HPE, repair costs, lost customers, and the need to completely rebuild the UDR system at an estimated cost of more than $28 million [Article 77771]. (e) byzantine: There is no indication of the software failure incident exhibiting byzantine behavior in the article. (f) other: The software failure incident also involved the system built by HPE failing to allow Megafon to unite its regional networks as intended, leading to the need for a complete rebuild of the faulty system [Article 77771].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, non-human, theoretical_consequence (a) unknown (b) unknown (c) unknown (d) Megafon alleged damages included funds paid to HPE, repair costs, lost customers, and the need to completely rebuild its User Data Repository (UDR) system at an estimated cost of more than $28 million [77771]. (e) unknown (f) Megafon's cellular network of more than 80 million subscribers was impacted by major network outages due to the failures in HPE's User Data Repository (UDR) system [77771]. (g) unknown (h) The potential consequences discussed in the article include the lawsuit filed by Megafon against Hewlett Packard Enterprise (HPE) for the major outages on its cellular network, with the damages claimed to total more than $200 million [77771]. (i) unknown
Domain information (a) The failed system was intended to support the information industry as it was related to Megafon's wireless network, which is crucial for communication and data services for its subscribers [77771].

