Incident: Submarine Cable Fault Causes Major Internet Disruption for TPG.

Published Date: 2016-02-07

Postmortem Analysis
Timeline 1. The incident occurred in early February 2016. Explanation: The article was published on 2016-02-07 and states that the fault was flagged on the Friday night before publication. It also states that the outage was originally expected to take more than a month to fix, with an estimated restoration date of March 7.
System The incident described in the article [40987] was a fault in the submarine cable system, specifically the PPC-1 cable, which runs between Guam and Sydney. The following component failed: 1. PPC-1 submarine cable. The Network Operations Centre did not fail; it detected the fault after receiving multiple alarms indicating that the cable had "lost its payload". The failure resulted in a significant outage affecting TPG Telecom's international communications lifelines with Hong Kong, Japan, and the US.
Responsible Organization 1. TPG Telecom [40987]
Impacted Organization 1. TPG Telecom [40987]
Software Causes 1. unknown
Non-software Causes 1. A fault hit the 6,900-kilometre long PPC-1 submarine cable running between Guam and Sydney [40987]. 2. The fault was a "fibre fault" located 4,652 kilometres from Guam and more than 2 kilometres beneath the water's surface [40987]. 3. Repair crews were held up on another job fixing the Basslink cable system between Victoria and Tasmania, which further delayed the repair of the PPC-1 cable [40987].
Impacts 1. TPG Telecom customers could face a significant slowdown in their internet service due to the fault in the PPC-1 cable, impacting their connectivity to Hong Kong, Japan, and the US [40987]. 2. The entire cable system was knocked offline, affecting the transmission of IP traffic and potentially causing an increase in latency for specific Asian destinations [40987]. 3. Gamers may experience major speed bumps when trying to connect to overseas servers, disrupting their online gaming experience [40987]. 4. The incident highlighted the dependency of Australia's internet infrastructure on a limited number of communication lines, emphasizing the vulnerability of the system to such faults [40987].
Preventions 1. Implementing regular maintenance and monitoring of the submarine cables to detect and address potential faults before they escalate [40987]. 2. Having a more robust redundancy system in place to quickly reroute traffic in case of a cable fault [40987]. 3. Ensuring timely repairs by having backup plans or alternative vessels ready for maintenance work on critical submarine cables [40987].
Fixes 1. Repairing the fault in the 6,900-kilometre long PPC-1 cable between Guam and Sydney [40987]. 2. Rerouting traffic via two other international cables - the Southern Cross and the Australian Japan Cable system [40987]. 3. Considering other options for a faster repair using an alternative vessel due to delays in the maintenance ship contracted by TPG [40987].
References 1. TPG Telecom 2. Network Operations Centre 3. CNET 4. TPG spokesperson

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The incident concerns a fault in TPG Telecom's own PPC-1 submarine cable. The article does not report an earlier similar incident within TPG Telecom [40987]. (b) The article does not describe a similar submarine cable fault affecting other organizations' products or services, although it notes that the repair ship was occupied with repairs to the Basslink cable system between Victoria and Tasmania [40987].
Phase (Design/Operation) unknown The article does not mention any software failure incident related to the development phases, whether design or operation. Therefore, it is unknown whether the reported incident was caused by factors introduced during system development, system updates, or procedures to operate or maintain the system, or if it was due to factors introduced by the operation or misuse of the system.
Boundary (Internal/External) within_system (a) within_system: The software failure incident related to the submarine cable fault affecting TPG Telecom's international communication was primarily within the system. The fault was detected within the cable system itself, leading to the loss of payload and knocking the entire cable offline [40987]. The issue was specifically related to a "fibre fault" located 4,652 kilometres from Guam and more than 2 kilometres beneath the water's surface, indicating an internal system failure rather than an external factor.
Nature (Human/Non-human) non-human_actions (a) The incident is attributed to non-human actions: a fault in the 6,900-kilometre long PPC-1 submarine cable, owned by TPG Telecom, which runs between Guam and Sydney. The Network Operations Centre detected the fault after receiving multiple alarms indicating that the cable had "lost its payload" [40987]. (b) The article does not attribute the fault to human actions. The repair, however, was delayed because the maintenance ship contracted by TPG to fix submarine cable faults was held up repairing the Basslink cable system between Victoria and Tasmania, and TPG was considering a faster repair using an alternative vessel [40987].
Dimension (Hardware/Software) hardware (a) The incident is primarily a hardware failure. The article reports a major fault in the 6,900-kilometre long PPC-1 cable, which runs between Guam and Sydney [40987]. The fault was described as a "fibre fault" located 4,652 kilometres from Guam and more than 2 kilometres beneath the water's surface, which places the failure in the physical hardware of the submarine cable system. (b) The article does not mention software as a contributing factor. It focuses on the physical fault in the submarine cable system and the challenges of repairing it, so software does not appear to have played a significant role in this incident.
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident described in the article is non-malicious. It is related to a fault in a major international submarine cable owned by TPG Telecom, which caused a significant outage affecting the ISP's communications lifelines with Hong Kong, Japan, and the US. The fault was detected by the Network Operations Centre, and the entire cable was knocked offline, leading to a need for repairs that would take an extended period, with an estimated restoration date of more than a month after the initial alarm was raised [40987].
Intent (Poor/Accidental Decisions) The incident reported in the article [40987] is a fault in a major international submarine cable owned by TPG Telecom. It was not caused by poor or accidental decisions. Instead, the issue was a physical fault in the cable system, specifically a "fibre fault" located 4,652 kilometres from Guam and more than 2 kilometres beneath the water's surface. The fault knocked the entire cable offline, affecting TPG's communications lifelines with several countries. Resolving the incident required repair crews to fix the physical fault in the submarine cable.
Capability (Incompetence/Accidental) unknown (a) The article does not attribute the incident to development incompetence. The outage resulted from a physical fault in the submarine cable system, which caused significant disruption for TPG Telecom customers [40987]. (b) The article does not attribute the incident to accidental factors; it describes a physical fault in the cable that led to a major outage for TPG Telecom customers [40987].
Duration temporary The incident is a physical fault in a major international submarine cable owned by TPG Telecom, rather than a software failure, and it caused a significant outage affecting the ISP's services. The outage was expected to take more than a month to fix, with repair crews facing challenges due to the location and depth of the fault [40987]. Because the outage stems from a specific circumstance (a physical fault in the submarine cable) and has an estimated restoration date, the incident is classified as temporary rather than permanent.
Behaviour crash, other (a) crash: The incident can be categorized as a crash. The article mentions that the cable system lost its payload, resulting in the entire cable being knocked offline and a significant outage affecting TPG customers [40987]. (b) omission: The article does not specifically mention a failure due to the system omitting to perform its intended functions at an instance(s). (c) timing: The article does not specifically mention a failure due to the system performing its intended functions correctly, but too late or too early. (d) value: The article does not specifically mention a failure due to the system performing its intended functions incorrectly. (e) byzantine: The article does not specifically mention a failure due to the system behaving erroneously with inconsistent responses and interactions. (f) other: The behaviour of the incident can be described as a major fault in the submarine cable system, leading to a significant slowdown for TPG customers and the need to reroute traffic via alternative cables. The incident also highlights the vulnerability of Australia's internet infrastructure due to its dependence on a limited number of communication lines [40987].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, delay, non-human, theoretical_consequence The consequence of the software failure incident described in the article is primarily related to a delay in services rather than any direct harm or loss of life. The fault in the submarine cable caused a significant slowdown for TPG Telecom customers, impacting their access to international communication lines with Hong Kong, Japan, and the US [40987]. The repair process was expected to take more than a month, leading to potential delays in connectivity and increased latency for specific Asian destinations [40987]. Additionally, TPG customers, especially gamers, may face difficulties connecting to overseas servers due to the outage [40987]. The incident underscores the dependence of Australia's internet infrastructure on a limited number of communication lines, emphasizing the potential disruptions that can occur when such critical systems fail [40987].
Domain information, utilities, other (a) The failed system was related to the information industry as it impacted TPG Telecom's major international submarine cable, causing a significant slowdown for TPG customers in terms of internet connectivity and communication [40987]. (g) The incident also affected the utilities industry as the submarine cable fault disrupted TPG's major communications lifelines with Hong Kong, Japan, and the US, impacting the provision of internet services and connectivity [40987]. (m) The incident could also be related to the "other" category as it involved the repair and maintenance of submarine cables that are crucial for international communication and data transmission, which may not fit directly into the predefined industry categories [40987].

