Incident: Stock Exchange Trading Halted Due to Hardware Failure.

Published Date: 2018-05-01

Postmortem Analysis
Timeline 1. The software failure incident at TMX Group, which operates the Toronto Stock Exchange, occurred on April 27, 2018 [70907].
System 1. Hardware components operated by TMX Group Ltd, the operator of the Toronto Stock Exchange; their failure led to a market outage [70907].
Responsible Organization 1. TMX Group Ltd, which operates the Toronto Stock Exchange; the outage was traced to a hardware failure in its own systems [70907].
Impacted Organization
1. Traders on Canada's biggest stock exchanges, who were left in the dark for about an hour and 11 minutes, some of them without access to critical data [70907].
2. Retail investors, who rely on TMX for market data and whose ability to execute trades was hindered as a result [70907].
Software Causes 1. None identified; the article attributes the outage to a hardware failure rather than a software fault [70907]
Non-software Causes 1. Hardware failure [70907]
Impacts
1. Traders were left in the dark for about an hour and 11 minutes, some with no access to critical data, which impaired their ability to execute trades [70907].
2. Retail investors rely heavily on TMX for market data, so the outage further hindered their trading activities [70907].
3. The trading glitch caused Canada's worst stock exchange outage in a decade and highlighted the contrast between Canadian markets and other developed markets [70907].
4. Total trading volume on Canadian stock exchanges fell by about 60 percent during the outage. Trading shifted to rival platforms once TMX announced the shutdown, producing a sharp increase in volumes in the final minutes of the session [70907].
Preventions
1. Implementing robust hardware redundancy and failover systems to mitigate the impact of hardware failures [70907].
2. Conducting regular stress testing and simulations to identify and address potential weaknesses in the trading platform's infrastructure [70907].
3. Enhancing communication protocols to ensure swift and transparent communication with market participants in the event of a trading disruption [70907].
4. Establishing clear procedures for declaring a timeout and cancelling all orders in case of a significant system issue to provide certainty to all market participants [70907].
Fixes
1. Implementing more robust hardware systems to prevent hardware failures like the one experienced by TMX Group [70907].
2. Enhancing communication protocols to ensure swift and transparent communication with market participants in case of any trading issues [70907].
3. Considering a policy where any outage immediately triggers the closure of exchanges for the day to provide certainty to investors and level the playing field [70907].
4. Learning from other markets like the U.S. where investors quickly switch to alternative platforms in case of trading disruptions, ensuring a seamless transition in such scenarios [70907].
5. Declaring a timeout and cancelling all orders in case of an issue, providing certainty to all market participants and allowing for a fair and orderly resumption of trading [70907].
References
1. Market experts, including Chris Sparrow [70907]
2. Ontario Securities Commission (OSC) spokeswoman Carolyn Shaw-Rimmington [70907]
3. TMX Chief Executive Lou Eccleston [70907]
4. TMX spokesman Shane Quinn [70907]
5. Lorne Steinberg, president of Lorne Steinberg Wealth Management Inc [70907]
6. Data from NEO Exchange [70907]
7. Jos Schmitt, CEO of the Aequitas NEO Exchange [70907]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) The software failure incident at TMX Group, which operates the Toronto Stock Exchange, was described as an "unprecedented event" by TMX Chief Executive Lou Eccleston. While the incident was attributed to a hardware failure, the CEO mentioned that it was a challenging situation that required immediate action to diagnose the problem and ensure data security. Despite the outage being a rare occurrence, there were discussions about the need for better communication and handling of such incidents in the future to maintain market credibility [70907]. (b) The article mentioned a previous incident at the New York Stock Exchange in July 2015, where a flawed software rollout led to a trading outage of nearly four hours. This incident highlighted the importance of having multiple trading venues to ensure uninterrupted trading in case of failures. Additionally, the article discussed how in the event of trading disruptions, investors in other developed markets tend to quickly switch to alternative platforms, unlike the situation observed during the TMX Group outage in Canada [70907].
Phase (Design/Operation) design, operation (a) The software failure incident at TMX Group, which left traders in the dark for about an hour and 11 minutes and ended with the shutdown of all of its exchange platforms, was attributed to a hardware failure [70907]. The incident highlighted the importance of swift communication and transparency in such situations, with market experts questioning the timeliness of TMX's communication regarding the shutdown. TMX's CEO mentioned the need to diagnose the problem and ensure data security before resuming trading, indicating the importance of system development and updates in handling such failures. (b) The operation aspect of the software failure incident can be seen in the impact on traders and investors. Traders were left without access to critical data during the outage, affecting their ability to execute trades. The reliance of retail investors on TMX for market data was evident, and the lack of a seamless transition to other platforms in Canada highlighted operational challenges during trading disruptions. The incident also raised concerns about TMX's credibility and the potential shift of market participants to rival trading venues, emphasizing the operational implications of such failures.
Boundary (Internal/External) within_system, outside_system (a) within_system: The software failure incident at TMX Group, which operates the Toronto Stock Exchange, was attributed to a hardware failure within the system. The outage was caused by a trading glitch, described as Canada's worst stock exchange outage in a decade, that led to the shutdown of all exchange platforms [70907]. The CEO of TMX Group mentioned that they needed to diagnose the problem internally and ensure data security before resuming trading, indicating an internal system issue [70907]. (b) outside_system: The incident also highlighted the heavy reliance of retail investors on TMX for market data, impacting their ability to execute trades. This suggests that factors external to the system, such as the dependence of market participants on TMX's services, played a role in the impact of the software failure incident [70907].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident at TMX Group, which left traders in the dark for about an hour and 11 minutes and ended with the shutdown of all of its exchange platforms, was attributed to a hardware failure. TMX later confirmed this cause, indicating a non-human action as the contributing factor to the failure [70907]. (b) Following the software failure incident, market experts raised concerns about the communication and response time of TMX Group in handling the situation. There were questions about whether TMX communicated swiftly enough about the shutdown, indicating potential human actions or decisions affecting the incident response [70907].
Dimension (Hardware/Software) hardware (a) The software failure incident at TMX Group, which operates the Toronto Stock Exchange, was attributed to a hardware failure, as TMX Group later confirmed [70907]. (b) The software failure incident did not have contributing factors originating in software.
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident at TMX Group, which left traders in the dark for about an hour and 11 minutes and ended with the shutdown of all of its exchange platforms, was attributed to a hardware failure [70907]. There is no indication in the article that the failure was malicious or caused by human intent to harm the system. (b) The incident at TMX Group was described as a trading glitch caused by a hardware failure, resulting in the worst stock exchange outage in Canada in a decade [70907]. The CEO of TMX Group mentioned that the decision to resume trading was made after diagnosing the problem and ensuring data security, which is consistent with a non-malicious software failure incident.
Intent (Poor/Accidental Decisions) unknown The software failure incident at TMX Group, which left traders in the dark for about an hour and 11 minutes and ended with the shutdown of all of its exchange platforms, was primarily attributed to a hardware failure [70907]. The article does not explicitly link the incident to poor decisions or accidental decisions.
Capability (Incompetence/Accidental) accidental (a) The software failure incident at TMX Group, which left traders in the dark for about an hour and 11 minutes and ended with the shutdown of all of its exchange platforms, was attributed to a hardware failure. The incident raised questions about the communication and response time of TMX in handling the outage, with market experts suggesting that a quicker response would have been better [70907]. (b) The incident at TMX Group was described as a trading glitch, causing Canada's worst stock exchange outage in a decade. The outage highlighted the contrast between Canadian markets and other developed markets in terms of handling such disruptions. It was noted that in the United States and other developed markets, investors tend to quickly switch to other platforms in the event of trading disruptions, but a seamless transition did not occur in Canada during this incident [70907].
Duration temporary (a) The software failure incident at TMX Group, which left traders in the dark for about an hour and 11 minutes and ended with the shutdown of all of its exchange platforms, was temporary in nature. The incident was attributed to a hardware failure, and TMX Group rectified the issue, leading to the resumption of trading on the following Monday [70907]. (b) The temporary nature of the software failure incident is evident from the fact that trading resumed after the issue was resolved. Additionally, the incident was described as an "unprecedented event" by TMX Chief Executive Lou Eccleston, indicating that it was not a permanent failure [70907].
Behaviour crash, omission (a) crash: The software failure incident at TMX Group, which operates the Toronto Stock Exchange, resulted in a crash where the trading platforms experienced issues and eventually had to shut down due to a hardware failure [70907]. (b) omission: During the incident, the system omitted to perform its intended functions, leading to traders being left in the dark without access to critical data for about an hour and 11 minutes [70907]. (c) timing: The timing of the software failure incident was crucial as it occurred more than two hours before the usual close of trading, causing disruptions and raising questions about the communication timeline between TMX and market participants [70907]. (d) value: The software failure incident did not involve the system performing its intended functions incorrectly but rather failing to perform them due to a hardware failure, leading to the shutdown of trading on all exchange platforms [70907]. (e) byzantine: There is no indication in the articles that the software failure incident exhibited a byzantine behavior with inconsistent responses and interactions. (f) other: The software failure incident also highlighted the contrast between the Canadian markets and other developed markets in terms of handling trading disruptions, showing that a seamless transition to other platforms did not occur in Canada as it does in the United States and other developed markets [70907].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, delay The consequence of the software failure incident reported in the articles was primarily related to the impact on property and delay: - Property: The software failure incident at TMX Group led to a trading glitch, causing Canada's worst stock exchange outage in a decade. This resulted in traders being left in the dark without access to critical data, affecting their ability to execute trades and potentially impacting their financial assets [70907]. - Delay: The outage forced TMX to shut down trading earlier than usual, disrupting the normal trading activities for the day. Traders had to wait for the issue to be rectified, and trading only resumed on the following Monday after the software failure incident on Friday [70907].
Domain finance (a) The failed system was related to the finance industry as it affected Canada's biggest stock exchanges, including the Toronto Stock Exchange operated by TMX Group Ltd [70907]. The incident caused trading disruptions and market outage, impacting traders and retail investors heavily reliant on TMX for market data and execution of trades. The outage highlighted the contrast between Canadian markets and other developed markets [70907]. (h) The incident specifically affected the finance industry, with TMX Group Ltd, the operator of the Toronto Stock Exchange, experiencing trading issues on all its exchange platforms due to a hardware failure [70907]. The outage was described as Canada's worst stock exchange outage in a decade, emphasizing the significance of the failure within the finance sector. (m) The failed system was not related to any other industry mentioned in the options provided.

