Published Date: 2012-06-25
Postmortem Analysis | |
---|---|
Timeline | 1. The NatWest failure described in Article 12554 occurred in June 2012 [12554]. 2. The NatWest failure described in Article 17499 occurred on Wednesday 6 March 2013 [17499]. |
System | 1. CA-7 software update [12554] 2. NatWest and RBS banking apps [124755] |
Responsible Organization | 1. NatWest: a failed update to the CA-7 software caused problems with the batch processing system [12554]. 2. Royal Bank of Scotland (RBS): an IT problem prevented payments from being processed, so customers did not receive their payments [36888]. 3. NatWest: an IT error crashed the bank's systems, leaving customers unable to access their money or use banking services [17499]. 4. NatWest: an issue on the bank's side caused an outage of its banking apps, leaving users unable to log in or check their accounts [124755]. |
Impacted Organization | 1. NatWest [12554, 17499] 2. RBS [12554, 17499] 3. Ulster Bank [12554, 17499] 4. TSB [90486] |
Software Causes | 1. At NatWest, a failed update to the CA-7 software that controls the batch processing system spiraled out of control for days [12554]. 2. At Royal Bank of Scotland (RBS) in 2015, the bank was unable to process a file of payments, so 600,000 payments went missing from customer accounts [36888]. 3. At NatWest in 2013, an apparent IT error crashed the bank's systems, leaving millions of customers unable to access their money [17499]. 4. At NatWest and RBS in 2019, an issue on the banks' side caused the banking apps to crash, leaving users unable to log in or check their accounts [124755]. |
Non-software Causes | 1. Lack of experienced staff due to redundancies since 2010 [12554] 2. Offshoring of IT jobs to locations in India [12554] 3. Time and cost pressures leading to corners being cut during IT system upgrades [90486] |
Impacts | 1. Millions of transactions were not processed for three nights, causing delays in updating account balances and potentially leading to overdraft issues for customers [12554]. 2. Customers were unable to withdraw cash, pay for goods and services, or carry out telephone and online banking, leading to frustration and inconvenience [17499]. 3. Customers were left cashless and cut off due to IT failures in the financial services sector, impacting their ability to access funds and make transactions [90486]. |
Preventions | 1. Proper testing of the software update before implementation could have prevented the software failure incident at NatWest and RBS [12554]. 2. Having adequate recovery plans in place in case of an IT failure could have helped mitigate the impact of the incident [90486]. 3. Linking pay for senior staff to the attention given to preventing IT failures could incentivize a proactive approach to system maintenance and updates [90486]. 4. Regulators intervening to improve the operational resilience of the financial sector and ensuring that firms have the appropriate skills and experience in their IT departments could have prevented such incidents [90486]. |
Fixes | 1. Proper testing of new systems before implementation to ensure they are robust and error-free [90486]. 2. Having adequate recovery plans in place in case of IT failures to minimize the impact on customers [90486]. 3. Regulators intervening to improve the operational resilience of the financial sector and ensuring firms hire staff with appropriate skills and experience [90486]. 4. Linking pay for senior staff to the prevention of IT failures to incentivize attention to system reliability [90486]. 5. Taking a proactive approach to ensure customer protection and prevent IT failures caused by time and cost pressures [90486]. |
References | 1. Interviews with programmers and experts who have worked on or with NatWest systems [12554] 2. Statements from NatWest spokespersons [12554, 17499] 3. Down Detector reports [124755] 4. Tweets from frustrated customers [17499] 5. Reports from the Financial Conduct Authority (FCA) [90486] 6. Testimonies from MPs [90486] |
Category | Option | Rationale |
---|---|---|
Recurring | one_organization, multiple_organization | (a) Recurrence at the same organization (NatWest): - Article 12554 reports a 2012 incident in which a failed software update disrupted the batch processing system, so transactions were not processed correctly for several days. - Article 17499 reports a 2013 incident in which an IT error left customers unable to access their accounts and disrupted online banking. - Article 124755 reports a more recent outage, in 2022, in which NatWest's banking app left users unable to log in or check their accounts. (b) Recurrence across multiple organizations: - Article 90486 describes IT failures across the financial services sector, including incidents at TSB, Visa, Barclays, and Royal Bank of Scotland, that left customers cashless and cut off, disrupted online banking, and harmed consumers. |
Phase (Design/Operation) | design, operation | (a) Failure introduced during design/development: - Article 12554 describes a failed update to a key piece of software, CA-7, applied ahead of the regular nightly run, that spiraled out of control for days. The update deleted or corrupted the files holding the schedule for overnight jobs, so transactions were processed incorrectly or missed. The fault was introduced while developing and applying the software update [12554]. (b) Failure introduced during operation: - Article 90486 describes IT failures at TSB, Visa, Barclays, and RBS that left customers cashless and cut off. These were attributed to problems in operating the systems, such as Visa's backup data center failing to switch on when problems arose, technical glitches at Barclays, and a firewall software upgrade at RBS that backfired [90486]. |
Boundary (Internal/External) | within_system | (a) The failures reported in the articles originated primarily within the system: 1. In Article 12554, a failed software update at NatWest disrupted the batch processing system, specifically the CA-7 software used to automate large sequences of batch mainframe work [12554]. 2. Article 17499 describes an apparent IT error at NatWest that crashed the bank's systems, disrupting online banking and transaction processing [17499]. 3. Article 36888 describes failures in RBS's computer systems that prevented payments from being processed and left customers without access to their money [36888]. 4. The Treasury committee report in Article 90486 condemns the number of IT failures in the financial services sector, including incidents at TSB, Visa, Barclays, and RBS, that disrupted online banking and left customers cashless and cut off [90486]. In each case the cause (software updates, IT errors, system crashes) lay inside the system rather than in external factors. |
Nature (Human/Non-human) | non-human_actions, human_actions | (a) Non-human actions: - In Article 12554, the NatWest failure is attributed to a failed update to the CA-7 software that deleted or corrupted the files holding the schedule for overnight jobs, so millions of transactions were not processed correctly until the system was fixed [12554]. - Article 90486 describes IT failures at TSB, Visa, Barclays, and RBS that disrupted online banking. At Visa, for example, 5.2 million transactions failed because a backup data center did not switch on when problems arose, and at Barclays a technical glitch prevented customers from accessing their accounts online [90486]. (b) Human actions: - Article 12554 cites the offshoring of IT jobs to India and the reduction of experienced staff since 2010 as possible contributing factors, suggesting that the loss of in-depth knowledge of the system through staff cuts contributed to the operational failure [12554]. - The same article traces the deletion or corruption of critical files to a human decision about how the CA-7 update was carried out [12554]. - Article 90486 notes that time and cost pressures during IT system upgrades can lead financial firms to cut corners, and stresses proper testing of new systems before full implementation and adequate recovery plans in case of IT failure [90486]. |
Dimension (Hardware/Software) | software | (a) The articles do not describe any failure caused by hardware. (b) The failures reported are software failures: - Article 12554 describes a failed software update at NatWest that disrupted the batch processing system controlling retail banking transactions [12554]. - Article 17499 describes an IT error that crashed NatWest's systems, leaving customers unable to access their accounts [17499]. - Article 90486 describes software failures at TSB, Visa, Barclays, and Royal Bank of Scotland that disrupted online banking [90486]. |
Objective (Malicious/Non-malicious) | non-malicious | (a) The failures appear to be non-malicious. The articles attribute them to technical issues, system errors, and problems with software updates: in Article 12554, a failed software update disrupted NatWest's batch processing systems, and in Article 90486, several financial institutions suffered technical glitches, system errors, data center problems, and a failed firewall software upgrade. The reported impacts (customers unable to access their accounts, make transactions, or use online banking) are consistent with non-malicious failures, and none of the articles suggests intentional harm to the systems [12554, 90486]. |
Intent (Poor/Accidental Decisions) | poor_decisions | (a) poor_decisions: The NatWest failure is attributed to poor decisions made during a software update. An update to CA-7, the key piece of software that controls batch processing for retail banking transactions, deleted or corrupted the files holding the schedule for overnight jobs, so millions of transactions were not processed correctly. The way the update was carried out is regarded as the poor decision that led to the failure [12554]. (b) accidental_decisions: The articles do not attribute the failures at NatWest and RBS mainly to accidental decisions or unintended mistakes; they link them to specific choices made during software updates, system upgrades, and changes [12554, 17499, 90486]. |
Capability (Incompetence/Accidental) | development_incompetence | (a) Development incompetence: - Article 12554 describes a crisis in NatWest's systems caused by an incorrect update to a key piece of software, CA-7, after which the batch processing systems did not run correctly for three nights. The article links this to a lack of experienced staff and possible incompetence in the decisions surrounding the update [12554]. (b) Accidental: - Article 90486 describes IT failures at TSB, Visa, Barclays, and RBS that harmed consumers, calling them unacceptable because they left customers cashless and cut off. The report stresses that firms need adequate recovery plans for IT failures and must inform customers clearly and promptly about incidents. It attributes the failures to time and cost pressures that led to corners being cut during IT system upgrades, suggesting that contributing factors were introduced accidentally [90486]. |
Duration | temporary | (a) The failures were temporary: each was a disruption caused by specific circumstances, such as IT problems, system crashes, and failed software updates, rather than a permanent loss of service [12554, 17499, 90486]. |
Behaviour | crash, omission, timing, value | (a) crash: In Article 12554, the batch processing system, which reconciles the movement of money in and out of more than 10m NatWest and Ulster Bank accounts, did not run correctly for three nights, so millions of transactions went unprocessed until it began running correctly on Friday [12554]. (b) omission: In Article 17499, an IT error crashed NatWest's systems and left customers unable to withdraw cash, pay for goods and services, or use telephone and online banking, an omission of the system's intended functions [17499]. (c) timing: In Article 12554, the key piece of software was updated ahead of the regular nightly run, and the batch processing systems did not run correctly for three nights until they began running correctly on Friday [12554]. (d) value: In Article 12554, the batch processing system failed to update the master copy of each account with the definitive balance, so millions of transactions were not reflected until the issue was fixed [12554]. (e) byzantine: The articles do not describe any byzantine behavior. (f) other: The articles do not describe any behavior beyond crash, omission, timing, and value failures. |
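
Several rows above trace the 2012 NatWest failure to a CA-7 update that deleted or corrupted the files holding the schedule for overnight jobs, and list proper testing before implementation as a prevention [12554, 90486]. As a minimal sketch of that idea, assuming a schedule can be modelled as a mapping from each job to the jobs that must finish before it, the Python below checks a schedule before a nightly run and refuses to start if required jobs are missing, depend on unknown jobs, or form a cycle. The job names, `REQUIRED_JOBS`, `validate_schedule`, and `plan_nightly_run` are hypothetical illustrations; they do not reflect CA-7's actual schedule format or interface.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job names, for illustration only; not CA-7's real schedule format.
REQUIRED_JOBS = {"collect_payments", "post_transactions", "update_master_balances"}


def validate_schedule(schedule: dict[str, set[str]]) -> list[str]:
    """Return the problems found in a job schedule (an empty list means it looks sound).

    `schedule` maps each job name to the set of jobs that must finish before it.
    """
    problems = []
    known = set(schedule)
    # A deleted or corrupted schedule shows up as required jobs that are absent.
    missing = REQUIRED_JOBS - known
    if missing:
        problems.append(f"missing jobs: {sorted(missing)}")
    # Every prerequisite must itself be a scheduled job.
    for job, deps in sorted(schedule.items()):
        for dep in sorted(deps - known):
            problems.append(f"{job} depends on unknown job {dep}")
    # A dependency cycle would mean the nightly run can never complete.
    try:
        TopologicalSorter(schedule).prepare()
    except CycleError as exc:
        problems.append(f"dependency cycle: {exc.args[1]}")
    return problems


def plan_nightly_run(schedule: dict[str, set[str]]) -> list[str]:
    """Return jobs in dependency order, refusing to start if validation fails."""
    problems = validate_schedule(schedule)
    if problems:
        raise RuntimeError("schedule rejected: " + "; ".join(problems))
    return list(TopologicalSorter(schedule).static_order())
```

For a schedule in which `post_transactions` has been deleted but `update_master_balances` still depends on it, `plan_nightly_run` raises `RuntimeError` before any job starts. The design choice is to fail loudly up front rather than run a partial schedule that silently skips the posting step and leaves balances stale.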
Layer | Option | Rationale |
---|---|---|
Perception | None | None |
Communication | None | None |
Application | None | None |
Category | Option | Rationale |
---|---|---|
Consequence | property | (d) property: People's material goods, money, or data were affected by the software failure. In the NatWest and RBS incidents, IT problems left customers unable to access their money, make payments, or carry out banking transactions. In one incident, millions of customers were left without access to their money for more than a week, causing frustration and financial difficulty [17499]. In another, customers could not log in or check their accounts through the banking apps, causing inconvenience and financial uncertainty [124755]. |
Domain | finance | (a) The failed system was related to the finance industry, specifically banking systems used by NatWest and RBS to process payments and update customers' accounts [12554, 17499, 90486]. |
Article ID: 12554
Article ID: 36888
Article ID: 124755
Article ID: 17499
Article ID: 90486