Recurring |
one_organization, multiple_organization |
(a) The software failure incident having happened again at one_organization:
- T-Mobile experienced a software failure incident that impacted the ability to make calls and send text messages, lasting for more than 12 hours [101094].
- T-Mobile's CEO mentioned that the outage was caused by an "IP traffic related issue that has created significant capacity issues in the network core" [101094].
- In this outage, T-Mobile's redundancy failed, resulting in an "overload" situation that affected VoLTE calls and text services [101094].
(b) The software failure incident having happened again at multiple_organization:
- Downdetector.com noted issues with all major wireless carriers, including AT&T, Verizon, T-Mobile, and Sprint [101094].
- AT&T and Verizon confirmed that their networks were operating normally, but there were issues when trying to text or call a T-Mobile phone [101094].
- Verizon criticized Downdetector for spreading false reports about its network performance, emphasizing that its network was not experiencing outages [101094]. |
Phase (Design/Operation) |
design, operation |
(a) The software failure incident related to the design phase was due to an "IP traffic related issue that has created significant capacity issues in the network core" as mentioned by T-Mobile CEO Mike Sievert in a blog post [101094]. The issue arose from an "overload" situation triggered by the failure of a fiber circuit that T-Mobile leases from a third-party provider in the Southeast; the redundancies set up to handle such failures did not work in this case [101094].
(b) The software failure incident related to the operation phase was evident in the outage experienced by T-Mobile customers, impacting their ability to make calls and send text messages for more than 12 hours. This outage was attributed to an "IP traffic storm" that spread across the IMS core network supporting VoLTE calls, affecting the operation of the network [101094]. |
Boundary (Internal/External) |
within_system, outside_system |
(a) The software failure incident related to the T-Mobile outage was primarily within the system. The outage was caused by an "IP traffic related issue that has created significant capacity issues in the network core" [101094]. It was triggered by the failure of a fiber circuit that T-Mobile leases from a third-party provider in the Southeast, a trigger that originated outside the system; the redundancy within T-Mobile's system then failed, leading to an "overload" situation [101094]. The failure was not attributed to a DDoS attack but rather to network issues and failures within T-Mobile's infrastructure [101094]. |
Nature (Human/Non-human) |
non-human_actions, human_actions |
(a) The software failure incident was primarily caused by a non-human action, specifically the failure of a fiber circuit that T-Mobile leases from a third-party provider in the Southeast. This failure triggered an overload situation in the network core, leading to significant capacity issues affecting calls and texts [101094].
(b) Human actions were also involved in the response to the incident. T-Mobile's CEO Mike Sievert mentioned that the outage was caused by an "IP traffic related issue" and assured customers that hundreds of engineers and vendor partner staff were working to resolve the issue. Additionally, T-Mobile's president of technology, Neville Ray, provided updates on the situation and recommended alternative communication methods to users [101094]. |
Dimension (Hardware/Software) |
hardware, software |
(a) The software failure incident related to hardware:
- The outage was triggered by the failure of a fiber circuit that T-Mobile leases from a third-party provider in the Southeast, leading to an "overload" situation in the network core [101094].
- T-Mobile had redundancies set up to handle such issues, but in this case, the redundancy failed, exacerbating the problem [101094].
(b) The software failure incident related to software:
- T-Mobile CEO Mike Sievert mentioned that the outage was caused by an "IP traffic related issue that has created significant capacity issues in the network core" [101094].
- The overload resulted in an IP traffic storm across the IMS core network supporting VoLTE calls, indicating a software-related issue [101094]. |
Objective (Malicious/Non-malicious) |
non-malicious |
(a) The software failure incident in this case was non-malicious. The outage experienced by T-Mobile was caused by an "IP traffic related issue that has created significant capacity issues in the network core" [101094]. The issue stemmed from the failure of a fiber circuit that T-Mobile leases from a third-party provider in the Southeast, leading to an "overload" situation in the network core [101094].
Additionally, T-Mobile CEO Mike Sievert mentioned that the outage was not a result of a distributed denial-of-service (DDoS) attack [101094]. This indicates that the failure was not caused by malicious intent to harm the system but rather by technical issues and failures in the network infrastructure. |
Intent (Poor/Accidental Decisions) |
accidental_decisions |
(a) The software failure incident related to the T-Mobile outage was not primarily due to poor decisions but rather to the failure of a fiber circuit that T-Mobile leases from a third-party provider in the Southeast. The redundancy set up to handle such failures did not work, leading to an "overload" situation that caused significant capacity issues across the network core [101094]. The incident was attributed to an "IP traffic related issue" that created capacity problems in the network core, rather than to poor decisions [101094]. |
Capability (Incompetence/Accidental) |
accidental |
(a) The provided article does not explicitly mention development incompetence. It is therefore unknown whether the T-Mobile outage was due to factors introduced by a lack of professional competence on the part of individuals or the development organization.
(b) The software failure incident related to accidental factors is evident in the article. The outage at T-Mobile was caused by an "IP traffic related issue that has created significant capacity issues in the network core" [101094]. This issue stemmed from the failure of a fiber circuit that T-Mobile leases from a third-party provider in the Southeast, leading to an overload situation that affected the network's core infrastructure. The redundancy set up to handle such failures did not work in this case, resulting in the outage [101094]. |
Duration |
temporary |
(a) The software failure incident in this case was temporary. The T-Mobile service outage impacting calls and texts started shortly after 9 a.m. PT on Monday and ended more than 12 hours later at 10:03 p.m. PT the same day [101094]. The outage was caused by an "IP traffic related issue that has created significant capacity issues in the network core" and was triggered by the failure of a fiber circuit that T-Mobile leases from a third-party provider in the Southeast. The redundancy did not handle the failure, leading to an "overload" situation affecting the network core supporting VoLTE calls [101094]. |
Behaviour |
crash, omission, value, other |
(a) crash: The software failure incident in the T-Mobile outage can be categorized as a crash. The incident led to a widespread issue impacting the ability to make calls and send text messages for more than 12 hours, indicating a failure of the system to perform its intended functions [101094].
(b) omission: The software failure incident can also be categorized as an omission. Users reported that calls and texts were not working, while data services appeared to be working normally. This indicates an omission in performing the intended functions of calls and texts [101094].
(c) timing: The software failure incident does not align with a timing failure as there is no indication that the system performed its intended functions too late or too early [101094].
(d) value: The software failure incident can be categorized as a value failure. The outage caused the system to perform its intended functions incorrectly, leading to significant capacity issues in the network core and affecting the ability to make calls and send text messages [101094].
(e) byzantine: The software failure incident does not align with a byzantine failure as there is no mention of inconsistent responses or interactions from the system [101094].
(f) other: The other behavior observed in the software failure incident is a redundancy failure. Despite having redundancies set up to handle issues like the fiber circuit failure, the redundancy failed in this case, leading to an overload situation and contributing to the network core capacity issues [101094]. |