Incident: Bloomberg Terminal Outage Caused by Hardware and Software Failures

Published Date: 2015-04-17

Postmortem Analysis
Timeline
1. The software failure incident happened on April 17, 2015 [35072, 35113, 35192].
System
1. Bloomberg data and communication network [35072, 35113, 35192]
2. Bloomberg terminals [35072, 35113, 35192]
3. Network hardware and software, which failed in combination [35072, 35113, 35192]
Responsible Organization
1. Bloomberg: reports blamed a spilled can of Coke in one of the server rooms at its office in the City of London [35113]
2. Bloomberg: a combination of hardware and software failures in its network [35072, 35113, 35192]
Impacted Organization
1. Financial professionals around the world, including traders and bankers, were impacted by the software failure incident at Bloomberg [35072, 35113, 35192].
2. The UK Debt Management Office had to postpone a £3 billion debt sale due to the blackout caused by the software failure [35113, 35192].
3. Traders of government bonds, shares, and other financial instruments who rely on Bloomberg terminals were affected by the outage [35192].
4. The Financial Conduct Authority monitored the impact of the software failure on the firms it regulates [35192].
5. The Bank of England reassured markets that it had not been affected by the software failure but was monitoring the situation [35192].
6. Traders at asset management firms like London & Capital were left unable to work effectively due to the Bloomberg outage [35113, 35192].
7. The London Stock Exchange experienced disruptions due to technical glitches in the past [35192].
8. Nasdaq faced technical glitches during Facebook's stock market debut in 2012 [35192].
Software Causes
1. A combination of hardware and software failures in the network, accompanied by a failure in the company's multiple redundant systems [35072].
2. Bloomberg blamed a "combination of hardware and software failures in the network" for the outage, which was first reported at around 8.20 am and lasted into the afternoon [35113].
3. Bloomberg stated that the combined hardware and software failures caused an excessive volume of network traffic, which overwhelmed machines and disconnected customers. The company isolated the faulty hardware, restarted the software, and began reviewing the multiple redundant systems that had failed to prevent the disruption [35192].
Non-software Causes
1. A spilled can of Coke in one of the server rooms at Bloomberg's office in the City of London [35113]
2. A combination of hardware failures in the network, including faulty hardware [35113]
3. Excessive volume of network traffic due to hardware and software failures [35113]
4. Lack of access to information due to the blackout affecting potential investors [35113]
5. Delay in the sale of Government bonds due to technical issues with the third-party platform [35113]
6. Disruption caused by hardware and software failures in the network [35192]
Impacts
1. The software failure incident involving Bloomberg terminals caused trading disruptions in stock exchanges around the world, leading to trillions of pounds of trading being stopped or diverted, and the postponement of a £3 billion bond sale by the British Treasury [35072, 35113, 35192].
2. The outage affected more than 300,000 traders on financial markets, prompting the Financial Conduct Authority to monitor the impact on regulated firms [35192].
3. Traders were left unable to work effectively without access to Bloomberg terminals, causing delays in trading activities and difficulties in arranging deals due to the absence of Bloomberg's messaging system [35072, 35113, 35192].
4. The breakdown was described as the worst seen in the City of London in over a decade, with traders being left "twiddling their thumbs" and facing challenges in conducting trades [35192].
5. The disruption led to traders resorting to alternative means of communication such as emails and phone calls, impacting the efficiency and speed of trading activities [35072, 35113, 35192].
6. The outage highlighted the heavy reliance of financial professionals on Bloomberg terminals for real-time trading information and transactions, emphasizing the critical role these systems play in the financial ecosystem [35072, 35113, 35192].
Preventions
1. Implementing more robust redundancy systems to prevent network overload and customer disconnections during hardware and software failures [35072, 35113, 35192].
2. Regular maintenance and monitoring of hardware, so that accidents such as the reported spilled can of Coke do not cause disruptions [35113].
3. Enhancing cybersecurity measures to protect against potential cyber attacks that could lead to network disruptions [35113].
4. Reducing reliance on a single provider like Bloomberg by having alternative data suppliers in place, such as products from competitors like Thomson Reuters [35072, 35192].
5. Developing backup systems and alternative communication channels for traders to use in case of system failures [35072, 35192].
Fixes
1. Implementing better redundancy systems to prevent hardware and software failures from causing widespread outages [35072, 35113, 35192].
2. Conducting a thorough review of the network infrastructure to identify and address vulnerabilities that led to the excessive volume of network traffic [35113].
3. Enhancing cybersecurity measures to protect against potential cyber attacks that could disrupt the system [35113].
4. Developing alternative communication channels and backup systems to ensure trading activities can continue in case of system failures [35072, 35192].
5. Improving customer support and response mechanisms to handle disruptions more effectively and provide timely assistance to users [35113, 35192].
References
1. Bloomberg spokesperson [35113]
2. Financial professionals and traders affected by the incident [35072, 35113, 35192]
3. UK's Debt Management Office [35113, 35192]
4. Cyber security expert Greg Sims [35113]
5. BBC journalist Joe Lynam [35113]
6. Financial Conduct Authority [35192]
7. Bank of England [35192]
8. Fortune [35192]
9. Steve Collins, global head of dealing at asset management firm London & Capital [35113, 35192]
10. Jasper Lawler of City firm CMC Markets [35192]
11. Louis Gargour, chief investment officer at LNG Capital [35192]
12. Wall Street Journal [35192]
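
The prevention of keeping alternative data suppliers in place could be sketched as an ordered failover across feeds. This is only a minimal illustration under stated assumptions: the provider functions, the ticker, and the price below are hypothetical stand-ins, not Bloomberg's or any vendor's actual API.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: a provider is a name plus a function that returns a
# quote for a ticker, or raises an exception when the feed is unavailable.
Provider = Tuple[str, Callable[[str], float]]


class AllProvidersFailed(Exception):
    """Raised when no configured provider could answer."""


def fetch_quote(ticker: str, providers: Sequence[Provider]) -> Tuple[str, float]:
    """Try each provider in order and return (provider_name, quote)."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch(ticker)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary_feed(ticker: str) -> float:
    # Stand-in for a primary feed that is currently disconnected.
    raise ConnectionError("primary feed disconnected")


def secondary_feed(ticker: str) -> float:
    # Stand-in for an alternative supplier; the price is made up.
    return 101.5


if __name__ == "__main__":
    feeds = [("primary", primary_feed), ("secondary", secondary_feed)]
    source, price = fetch_quote("EXAMPLE", feeds)
    print(source, price)
```

The ordering of the provider list encodes the preference, so the secondary feed is consulted only when the primary one fails. A failover of this kind helps only if the alternative supplier does not share the failed network path.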

Software Taxonomy of Faults

Category Option Rationale
Recurring
Option: one_organization, multiple_organization
Rationale:
(a) The software failure incident having happened again at one_organization:
- The incident at Bloomberg in 2015 was attributed to a combination of hardware and software failures, affecting their network and leading to customer disconnections [35072].
- Bloomberg experienced a disruption due to hardware and software failures, causing an excessive volume of network traffic and customer disconnections [35113].
- Bloomberg blamed a combination of hardware and software failures for the outage, which lasted several hours and impacted trading activities globally [35192].
(b) The software failure incident having happened again at multiple_organization:
- The incident at Bloomberg prompted discussions about finding alternatives to the Bloomberg system, indicating a potential need for diversification and reduced reliance on a single provider [35072].
- The outage at Bloomberg affected more than 300,000 traders globally, highlighting the widespread impact of such incidents on financial markets [35113].
- The Financial Conduct Authority monitored the impact of the Bloomberg terminals' breakdown on the firms it regulates, suggesting a broader concern about reliance on specific systems in the financial industry [35192].
Phase (Design/Operation)
Option: design
Rationale:
(a) The software failure incident in the Bloomberg network crash was attributed to a combination of hardware and software failures, as well as issues in the company's multiple redundant systems [35072, 35113]. The outage was caused by a spilled can of Coke in one of the server rooms, which reportedly knocked out systems across two continents [35113]. Bloomberg mentioned that the disruption was due to a combination of hardware and software failures in the network, leading to excessive network traffic and customer disconnections [35113]. Traders were left unable to work without the Bloomberg terminals, which provide real-time trading information and are crucial for buying and selling financial instruments [35113].
(b) The software failure incident also impacted the operation of financial markets, with traders being unable to work effectively without access to the Bloomberg terminals [35113]. The outage prompted the postponement of a £3 billion debt sale by the UK Debt Management Office due to potential investors not having access to necessary information [35113]. Traders were left "twiddling their thumbs" and calls to the Bloomberg helpline went unanswered, affecting market activities [35192]. The absence of Bloomberg's messaging system made it difficult to arrange deals, and traders had to resort to more traditional methods like using the telephone for urgent transactions [35192].
Boundary (Internal/External)
Option: within_system
Rationale:
(a) The software failure incident involving the Bloomberg terminals was primarily within the system. The incident was caused by a combination of hardware and software failures within Bloomberg's network [35072, 35113, 35192]. Bloomberg itself stated that the disruption was due to a mix of hardware and software failures, leading to an excessive volume of network traffic and customer disconnections [35113]. The outage affected more than 300,000 traders on financial markets and prompted the Financial Conduct Authority to monitor the impact on regulated firms [35192]. The root cause was quickly identified as internal network issues, and Bloomberg mentioned that they were reviewing their multiple redundant systems that failed to prevent the disruption [35113]. The incident did not result from a cyber attack, as initially speculated [35113].
Nature (Human/Non-human)
Option: non-human_actions, human_actions
Rationale:
(a) The software failure incident occurring due to non-human actions:
- The software failure at Bloomberg was attributed to a "combination of hardware and software failures in the network" [35072].
- Bloomberg blamed a "combination of hardware and software failures in the network" for the outage [35113].
- Bloomberg blamed a combination of hardware and software failures after City dealers took to Twitter to lament the blackout [35192].
(b) The software failure incident occurring due to human actions:
- Reports suggested that a spilled can of Coke in one of the server rooms at Bloomberg's office in the City of London had been responsible for knocking out systems across two continents [35113].
- Some traders initially speculated that the outage could have been caused by hackers [35113].
- Cyber security expert Greg Sims mentioned that the glitch looked suspicious and highlighted the potential threat of cyber attacks [35113].
Dimension (Hardware/Software)
Option: hardware, software
Rationale:
(a) The software failure incident occurring due to hardware:
- Article 35072 mentions that the Bloomberg data and communication network crash was caused by "a combination of hardware and software failures" [35072].
- Article 35113 reports that Bloomberg blamed the outage on a "combination of hardware and software failures in the network" [35113].
- Article 35192 also states that Bloomberg blamed the incident on a "combination of hardware and software failures" [35192].
(b) The software failure incident occurring due to software:
- Article 35072 mentions that the problem at Bloomberg was caused by "a combination of hardware and software failures" [35072].
- Article 35113 reports that Bloomberg blamed the outage on a "combination of hardware and software failures in the network" [35113].
- Article 35192 also states that Bloomberg blamed the incident on a "combination of hardware and software failures" [35192].
Objective (Malicious/Non-malicious)
Option: non-malicious
Rationale:
(a) The articles suggest that the software failure incident was non-malicious in nature. The incident was attributed to a combination of hardware and software failures within the network, leading to excessive network traffic and customer disconnections [35072, 35113, 35192]. Bloomberg stated that the root cause was quickly identified as a hardware and software issue, and that it isolated the faulty hardware and restarted the software to resolve the problem. Bloomberg confirmed that there were no indications of a cyber attack being the cause of the disruption [35113, 35192]. Additionally, reports from inside the company indicated that the outage was potentially caused by a spilled can of Coke in one of the server rooms, which knocked out systems across two continents [35113]. If accurate, that accident further supports the non-malicious nature of the software failure. Overall, the incident was attributed to technical failures rather than any malicious intent.
(b) There is no information in the articles to suggest that the software failure incident was malicious in nature.
Intent (Poor/Accidental Decisions)
Option: poor_decisions
Rationale:
(a) The software failure incident was attributed to a combination of hardware and software failures, as well as issues with the network at Bloomberg's offices in the City of London [35072, 35113, 35192]. This indicates that the failure was not solely due to accidental decisions but rather a result of poor decisions or contributing factors introduced by various failures in the system.
Capability (Incompetence/Accidental)
Option: accidental
Rationale:
(a) The software failure incident in the Bloomberg network crash was attributed to a combination of hardware and software failures, as well as failure in the company's multiple redundant systems [35072]. The outage was caused by a spilled can of Coke in one of the server rooms, which was reported to have knocked out systems across two continents [35113]. The incident led to an excessive volume of network traffic, causing customer disconnections due to machines being overwhelmed [35192].
(b) The accidental aspect of the software failure incident is highlighted by the reports suggesting that a spilled can of Coke in the server room was responsible for knocking out systems across two continents [35113]. The incident was not attributed to a cyber attack, with Bloomberg stating that it was a combination of hardware and software failures that caused the disruption [35192].
Duration
Option: temporary
Rationale:
The software failure incident was temporary. It lasted for a few hours, with most Bloomberg terminals back up and running by the time the United States markets opened on Friday morning [35072]. The outage was first reported around 8.20 am and lasted into the afternoon, affecting more than 300,000 traders on financial markets [35113]. It took the whole of the London trading day before Bloomberg announced that the situation had been fully resolved, although many terminals were running again around lunchtime [35192]. The outage prompted the postponement of a £3 billion debt sale by the UK Debt Management Office, which was later conducted in the afternoon once the issue was fixed [35113].
Behaviour
Option: crash, omission
Rationale:
(a) crash: The software failure incident in the articles can be categorized as a crash. The Bloomberg terminals experienced a catastrophic crash that halted trading in stock exchanges around the world, leaving more than 300,000 screens offline for several hours [35113]. Traders were unable to work, and trillions of pounds of trading had to be stopped or diverted due to the crash [35113].
(b) omission: The software failure incident can also be categorized as an omission. The breakdown of the Bloomberg terminals led to traders being unable to work without access to the terminals, which provide real-time trading information and are crucial for buying and selling shares, bonds, and other financial instruments [35113]. The outage caused delays in the sale of government bonds and disrupted communication between banks and investors [35072].
(c) timing: The software failure incident does not align with a timing failure. The issue was not related to the system performing its intended functions too late or too early.
(d) value: The software failure incident does not align with a value failure. The issue was not related to the system performing its intended functions incorrectly.
(e) byzantine: The software failure incident does not align with a byzantine failure. The system did not exhibit inconsistent responses or interactions.
(f) other: The software failure incident can be categorized as a crash and omission, as described above.

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence
Option: property, delay, non-human, theoretical_consequence
Rationale:
(a) death: People lost their lives due to the software failure - No information about any deaths caused by the software failure incident was mentioned in the articles [35072, 35113, 35192].
(b) harm: People were physically harmed due to the software failure - No information about physical harm to individuals due to the software failure incident was provided in the articles [35072, 35113, 35192].
(c) basic: People's access to food or shelter was impacted because of the software failure - No information about people's access to food or shelter being impacted due to the software failure incident was discussed in the articles [35072, 35113, 35192].
(d) property: People's material goods, money, or data were impacted due to the software failure - The software failure incident impacted trading activities, causing disruptions in the financial markets and delaying bond sales, such as the £3 billion debt sale by the British Treasury [35072, 35113, 35192].
(e) delay: People had to postpone an activity due to the software failure - The software failure incident led to the postponement of bond sales, including the £3 billion debt sale by the British Treasury, due to technical issues with the Bloomberg platform [35072, 35113, 35192].
(f) non-human: Non-human entities were impacted due to the software failure - The software failure incident affected the Bloomberg data and communication network, causing disruptions in trading activities and financial markets [35072, 35113, 35192].
(g) no_consequence: There were no real observed consequences of the software failure - The software failure incident had significant consequences, including disruptions in trading activities, delays in bond sales, and challenges for traders who heavily relied on Bloomberg terminals [35072, 35113, 35192].
(h) theoretical_consequence: There were potential consequences discussed of the software failure that did not occur - The potential theoretical consequence discussed was the fear that regulators might consider Bloomberg's dominance a systemic risk, requiring regulatory intervention [35072].
(i) other: Were there consequences of the software failure not described in options (a) to (h)? If so, what were they? - No other specific consequences of the software failure incident were mentioned in the articles [35072, 35113, 35192].
Domain
Option: information, finance
Rationale:
(a) The failed system was intended to support the finance industry. The Bloomberg data and communication network, which experienced a crash, provides data and trading services for 325,000 financial professionals around the world [35072]. The outage affected trading in stock exchanges globally and halted trading in various financial markets [35113]. The Bloomberg terminals are crucial for traders of government bonds, shares, and other financial instruments [35192].
(h) The software failure incident was related to the finance industry, specifically impacting trading activities, bond sales, and financial market operations [35072, 35113, 35192].

Sources
