Recurring |
one_organization, multiple_organization |
(a) The software failure incident having happened again at one_organization:
1. Facebook experienced multiple outages over the years due to various issues such as server errors, configuration system changes, and read-only errors caused by its engineers [32754].
2. In 2010, Facebook suffered a disruption due to a networking problem caused by its engineers, which was resolved by turning the site off and then on again [32754].
(b) The software failure incident having happened again at multiple_organization:
1. The outage that affected Facebook also impacted other services like Instagram, Tinder, AOL messenger, and Hipchat, which rely on Facebook for logins [32754].
2. Skype was one of the high-profile services affected by Facebook's outage in August 2014, indicating that the incident had repercussions on other organizations as well [32754]. |
Phase (Design/Operation) |
design, operation |
(a) The software failure incident related to the design phase can be seen in the article [32754] where on 27 January 2015, Facebook experienced a 50-minute outage caused by a change introduced within its systems that affected the configuration systems. This change led to trouble accessing Facebook and Instagram, impacting other services like Tinder, AOL messenger, and Hipchat that rely on Facebook for logins. This outage was not a result of a cyber attack but a consequence of a change made during system development.
(b) The software failure incident related to the operation phase is evident in the article [32754] where on 21 October 2013, Facebook faced a "read-only error" that prevented users from posting status updates for more than four hours. This issue, not a complete outage but a disruption in operation, was caused by a network maintenance problem introduced during the operation of the system. |
Boundary (Internal/External) |
within_system |
(a) within_system:
- The software failure incident on 27 January 2015 was caused by Facebook attempting to change something within its systems which went wrong, not a cyber attack as widely reported [32754].
- On 21 October 2013, Facebook experienced a "read-only error" caused by its engineers during network maintenance, preventing users from posting status updates for more than four hours [32754].
- In 2010, Facebook suffered a disruption due to a fiendishly complex networking problem caused by its engineers, which was resolved by turning the site off and then on again [32754].
(b) outside_system:
- The article does not provide specific incidents or details indicating software failure incidents caused by contributing factors originating from outside the system. |
Nature (Human/Non-human) |
non-human_actions, human_actions |
(a) The software failure incident occurring due to non-human actions:
- The outage on 27 January 2015 occurred when a change Facebook attempted within its systems went wrong; it was not a cyber attack as widely reported. It was stated that the outage was not the result of a third-party attack but followed a change that affected the configuration systems [32754].
- The outage on 1 August 2014 was caused by another server error, affecting Facebook's site, apps, and sites and services that use its login system [32754].
- The outage on 19 June 2014 was the longest outage in four years for Facebook, lasting 31 minutes. The issue prevented people from posting to Facebook for a brief period of time, and the cause was not elaborated upon in the statement provided by Facebook [32754].
- The outage on 21 October 2013 was a "read-only error" caused by Facebook's engineers during network maintenance, preventing users from posting status updates for more than four hours [32754].
- In 2010, Facebook suffered a two-hour disruption due to a fiendishly complex networking problem caused by its engineers. The solution was to turn the site off and then on again [32754].
(b) The software failure incident occurring due to human actions:
- In 2010, Facebook engineers were to blame for a disruption caused by a complex networking problem [32754].
- On 31 July 2007, Facebook was purposefully taken offline by its engineers to fix a bug identified earlier that day [32754]. |
Dimension (Hardware/Software) |
hardware, software |
(a) The software failure incident occurring due to hardware:
- In the incident on 24 September 2010, Facebook suffered a disruption due to a fiendishly complex networking problem caused by its engineers, specifically a runaway condition at a "database cluster" of computer servers [32754].
- The issue required Facebook to stop all traffic to the affected database cluster, essentially turning off the site to address the hardware-related problem [32754].
(b) The software failure incident occurring due to software:
- The outage on 27 January 2015 was caused by Facebook attempting to change something within its systems, which went wrong. This was not a cyber attack but a change that affected the configuration systems, indicating a software-related issue [32754].
- Similarly, on 1 August 2014, another outage was caused by a server error, affecting Facebook's site, apps, and services that use its login system, pointing to a software-related problem [32754]. |
Objective (Malicious/Non-malicious) |
non-malicious |
(a) The software failure incidents described in the article were non-malicious. For example, the outage on 27 January 2015 was caused by Facebook attempting to change something within its systems which went wrong, not a cyber attack as widely reported [32754]. Additionally, the outage on 1 August 2014 was caused by another server error, not a malicious attack [32754].
(b) The software failure incidents were not due to malicious intent but rather technical issues or errors introduced during system changes or maintenance. |
Intent (Poor/Accidental Decisions) |
poor_decisions, accidental_decisions |
(a) poor_decisions:
- The software failure incident on 27 January 2015 was caused by Facebook attempting to change something within its systems which went wrong, not a cyber attack as widely reported. This indicates a failure due to a decision to make changes that led to the outage [32754].
- On 21 October 2013, Facebook engineers were to blame for a "read-only error" that prevented users from posting status updates for more than four hours. This issue occurred while performing network maintenance, indicating a decision that led to the problem [32754].
(b) accidental_decisions:
- The software failure incident on 1 August 2014 was caused by another server error, affecting Facebook's site, apps, and services that use its login system. This suggests a failure due to an accidental decision or mistake that led to the server error [32754].
- In the incident on 24 September 2010, Facebook suffered a disruption due to a complex networking problem caused by its engineers. The solution was simple: turning the site off and then on again. This suggests an accidental decision or mistake led to the disruption [32754]. |
Capability (Incompetence/Accidental) |
development_incompetence, accidental |
(a) The software failure incident occurring due to development incompetence:
- In the incident on 21 October 2013, Facebook's engineers were to blame for a "read-only error" that prevented users from posting status updates for more than four hours [32754].
- In the incident on 24 September 2010, Facebook suffered a disruption due to a fiendishly complex networking problem caused by its engineers, which was resolved by turning the site off and then on again [32754].
(b) The software failure incident occurring accidentally:
- The outage on 27 January 2015 was caused by Facebook attempting to change something within its systems which went wrong, not a cyber attack as widely reported. This change affected the configuration systems, leading to the outage [32754].
- The incident on 1 August 2014 was caused by another server error, affecting Facebook's site, apps, and sites and services that use its login system [32754]. |
Duration |
temporary |
(a) The article describes temporary software failure incidents, each resolved within a bounded period rather than persisting permanently. For example, on 27 January 2015, Facebook experienced a 50-minute outage due to a change that affected its configuration systems [32754]. Similarly, on 1 August 2014, another outage caused by a server error lasted one hour and 40 minutes [32754]. |
Behaviour |
crash, omission, other |
(a) crash:
- The article mentions a Facebook outage in 2010, caused by a networking problem, where the solution was to turn the site off and then on again [32754].
- There was an outage in 2013 where users were unable to post status updates for more than four hours due to a "read-only error" [32754].
(b) omission:
- In 2014, Facebook experienced an outage caused by a server error that affected the site, apps, and services using its login system [32754].
- In 2015, an outage occurred when Facebook attempted to change something within its systems, affecting Facebook, Instagram, Tinder, AOL messenger, and Hipchat [32754].
(c) timing:
- There is no specific mention of a timing-related failure in the provided article.
(d) value:
- In 2010, Facebook suffered a two-hour disruption due to a complex networking problem caused by its engineers [32754].
(e) byzantine:
- There is no specific mention of a byzantine-related failure in the provided article.
(f) other:
- In 2007, Facebook was taken offline purposefully by its engineers to fix a bug identified earlier that day [32754]. |