Incident: Azure Cosmos DB Database Flaw Exposes Data of Fortune 500 Companies

Published Date: 2021-08-26

Postmortem Analysis
Timeline
1. The software failure incident happened in August 2021.
2. The incident occurred in August 2021 as per the articles [118075, 117741].
System
1. Microsoft Azure Cosmos DB database service [118075, 117741]
2. Jupyter Notebook visualization tool [118075, 117741]
Responsible Organization
1. Microsoft [118075, 117741]
2. Security researchers at Wiz [118075, 117741]
Impacted Organization
1. Thousands of Microsoft's online cloud customers, including Fortune 500 firms like Coca-Cola, Exxon-Mobil, and Citrix were impacted by the software failure incident [118075].
2. Thousands of Microsoft's cloud computing customers, including some of the world's largest companies, were impacted by the software failure incident [117741].
Software Causes
1. The software cause of the failure incident was a major flaw in Microsoft's flagship Azure Cosmos DB database service, which allowed hackers to read, change, or delete data saved in the cloud [118075, 117741].
2. The flaw was specifically found in a visualization tool called Jupyter Notebook, which was enabled by default in Cosmos beginning in February [118075, 117741].
Non-software Causes
1. Lack of proper access control measures in Microsoft's Azure Cosmos DB database service [118075, 117741]
2. Failure to promptly notify all affected customers about the vulnerability [118075, 117741]
3. Potential lack of robust security testing procedures in place to detect such critical vulnerabilities [118075, 117741]
Impacts
1. Thousands of Microsoft's online cloud customers, including Fortune 500 firms like Coca-Cola, Exxon-Mobil, and Citrix, were warned that their data may have been exposed to intruders due to a major flaw in Microsoft's Azure Cosmos DB database service [118075, 117741].
2. The vulnerability in Azure's flagship Cosmos DB database allowed hackers to potentially read, change, or delete data saved in the cloud, posing a significant security risk to the affected companies [118075, 117741].
3. The flaw, named ChaosDB, was discovered by a research team at the security company Wiz, who were able to access keys controlling access to databases held by thousands of companies, highlighting the severity of the security breach [118075, 117741].
4. Microsoft had to email its customers to create new access keys as the company could not change the keys by itself, indicating the urgency and critical nature of the situation [118075, 117741].
5. The incident raised concerns about the security of cloud services, as Microsoft and outside security experts have been advocating for companies to rely more on the cloud for enhanced security, despite the potential risks associated with cloud vulnerabilities [118075, 117741].
Preventions
1. Regular security audits and penetration testing could have potentially identified the vulnerability in Microsoft's Azure Cosmos DB database service before it was exploited [118075, 117741].
2. Implementing stricter access controls and monitoring mechanisms for sensitive data stored in the cloud could have helped prevent unauthorized access to customer databases [118075, 117741].
3. Ensuring timely patching and updates for all software components, including third-party tools like Jupyter Notebook, could have closed off potential entry points for attackers [118075, 117741].
4. Enhanced communication and collaboration between security researchers, like the team at Wiz, and software vendors could lead to quicker identification and resolution of critical vulnerabilities [118075, 117741].
Fixes
1. Microsoft fixed the vulnerability in its Azure Cosmos DB database service by immediately addressing the issue to keep its customers safe and protected [117741].
2. Microsoft advised its customers to create new access keys to replace the vulnerable ones, as the company could not change the keys by itself [118075].
3. Customers who were potentially impacted by the vulnerability were notified by Microsoft to take necessary actions [117741].
4. Security researchers, such as Wiz, play a crucial role in identifying and reporting software vulnerabilities, leading to their resolution [117741, 118075].
References
1. Microsoft
2. Wiz
3. Reuters

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization
(a) The software failure incident having happened again at one_organization:
- Microsoft faced a major setback when a major flaw in its flagship Azure Cosmos DB database service was revealed, leaving its customers' information vulnerable to hackers [118075].
- The disclosure of this vulnerability comes after months of bad security news for Microsoft, including being breached by suspected Russian government hackers and other security issues [117741].
(b) The software failure incident having happened again at multiple_organization:
- The article does not provide specific information about similar incidents happening at other organizations or with their products and services.
Phase (Design/Operation) design, operation (a) The software failure incident in the articles can be attributed to the design phase. The incident was caused by a major flaw in Microsoft's flagship Azure Cosmos DB database service, which allowed hackers to access keys controlling access to databases held by thousands of companies [118075, 117741]. This flaw was discovered by a research team at the security company Wiz, indicating a design vulnerability in the system. (b) The software failure incident can also be linked to the operation phase. Microsoft had to email its customers to create new access keys because the company could not change the keys by itself, indicating an operational response to mitigate the vulnerability [118075, 117741]. Additionally, the incident highlighted the importance of operational procedures in responding to and resolving security vulnerabilities in cloud services.
Boundary (Internal/External) within_system, outside_system (a) within_system: The software failure incident related to the Microsoft Azure Cosmos DB database service vulnerability was primarily due to a flaw in the visualization tool called Jupyter Notebook, which was enabled by default in Cosmos beginning in February [118075, 117741]. This flaw allowed hackers to access keys that control access to databases held by thousands of companies, leading to potential data exposure and manipulation within the system. (b) outside_system: The software failure incident was also influenced by external factors such as the actions of the security company Wiz, which discovered the vulnerability and reported it to Microsoft [118075, 117741]. Additionally, the incident was part of a broader trend of bad security news for Microsoft, including previous breaches by suspected Russian government hackers and other security flaws in Microsoft products [118075, 117741].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in the articles was primarily due to non-human actions. The incident was caused by a major flaw in Microsoft's flagship Azure Cosmos DB database service, which allowed hackers to access keys controlling access to databases held by thousands of companies [118075, 117741]. This vulnerability was discovered by a research team at the security company Wiz, and Microsoft had to notify customers to create new access keys to mitigate the risk [118075, 117741]. (b) Human actions were also involved in addressing the software failure incident. Microsoft agreed to pay Wiz $40,000 for finding and reporting the flaw [118075, 117741]. Additionally, the Wiz team found the problem and notified Microsoft, leading to the immediate fix of the issue to keep customers safe [117741].
Dimension (Hardware/Software) software (a) The software failure incident reported in the news articles is primarily due to contributing factors that originate in software. The incident involved a major flaw in Microsoft's flagship Azure Cosmos DB database service, which allowed hackers to potentially read, change, or delete data saved in the cloud [118075, 117741]. The vulnerability was discovered by a research team at the security company Wiz, who found that they were able to access keys that control access to databases held by thousands of companies [118075, 117741]. The flaw was in a visualization tool called Jupyter Notebook, which was enabled by default in Cosmos beginning in February [118075, 117741]. (b) There is no specific mention in the articles of the software failure incident being caused by contributing factors originating in hardware.
Objective (Malicious/Non-malicious) malicious (a) The software failure incident in the articles is classified as malicious. The incident involved a major flaw in Microsoft's flagship Azure Cosmos DB database service that could allow hackers to read, change, or delete data saved in the cloud. The flaw was discovered by a research team at the security company Wiz, who were able to access keys that control access to databases held by thousands of companies [118075, 117741]. The vulnerability was described as the worst cloud vulnerability imaginable, and the team was able to access any customer database they wanted [118075]. The flaw, named ChaosDB, was found in a visualization tool called Jupyter Notebook, which was enabled by default in Cosmos beginning in February [118075]. Microsoft had to email customers to create new access keys as they could not change them by themselves [118075]. The incident highlights the serious impact of malicious software vulnerabilities on cloud services and the potential risks posed to customer data and security.
Intent (Poor/Accidental Decisions) poor_decisions (a) The software failure incident related to the Microsoft Azure Cosmos DB database vulnerability can be attributed to poor decisions. The incident was caused by a major flaw in the flagship Azure Cosmos DB database service, which allowed hackers to potentially access, modify, or delete data stored in the cloud. This vulnerability was discovered by a research team at the security company Wiz, who were able to access keys controlling access to databases of thousands of companies [118075, 117741]. Microsoft's decision to enable a visualization tool called Jupyter Notebook by default in Cosmos DB, which ultimately led to the exposure of access keys, can be considered a poor decision contributing to the software failure incident.
Capability (Incompetence/Accidental) development_incompetence (a) The software failure incident related to development incompetence is evident in the articles. The incident involving a major flaw in Microsoft's Azure Cosmos DB database service was discovered by a research team at the security company Wiz. The flaw allowed hackers to access keys controlling access to databases held by thousands of companies, including Fortune 500 firms like Coca-Cola and Exxon-Mobil [118075, 117741]. This flaw was found in a visualization tool called Jupyter Notebook, which had been enabled by default in Cosmos beginning in February, indicating a potential oversight in the development process that led to the vulnerability. (b) The software failure incident related to accidental factors is also apparent in the articles. Microsoft acknowledged the vulnerability in its Azure Cosmos DB database and took immediate action to fix the issue to protect its customers. The company thanked the security researchers for their work under coordinated vulnerability disclosure, indicating that the discovery of the flaw was not intentional but rather accidental [118075, 117741].
Duration temporary (a) The software failure incident in this case was temporary. The vulnerability in Microsoft's Azure Cosmos DB database service was discovered by a research team at the security company Wiz, who then reported it to Microsoft. Microsoft immediately took action to fix the issue and informed customers to create new access keys to protect their data [118075, 117741].
Behaviour value, other
(a) crash: The software failure incident in the articles does not involve a crash where the system loses state and does not perform any of its intended functions [118075, 117741].
(b) omission: The incident does not involve a failure due to the system omitting to perform its intended functions at an instance(s) [118075, 117741].
(c) timing: The incident does not involve a failure due to the system performing its intended functions correctly, but too late or too early [118075, 117741].
(d) value: The software failure incident is related to a major flaw in Microsoft's Azure Cosmos DB database service that could allow hackers to read, change, or delete data saved in the cloud, indicating a failure due to the system performing its intended functions incorrectly [118075, 117741].
(e) byzantine: The incident does not involve a failure due to the system behaving erroneously with inconsistent responses and interactions [118075, 117741].
(f) other: The software failure incident involves a vulnerability in Microsoft's Azure Cosmos DB database service that allowed unauthorized access to keys controlling access to databases, potentially compromising the security and integrity of customer data [118075, 117741].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property, theoretical_consequence (d) property: People's material goods, money, or data was impacted due to the software failure. The software failure incident involving Microsoft's Azure Cosmos DB database service exposed thousands of its online cloud customers, including Fortune 500 firms, to potential data breaches by allowing hackers to read, change, or delete data saved in the cloud [118075, 117741]. The vulnerability in the Azure Cosmos DB database service could have allowed hackers to access keys controlling access to databases held by thousands of companies, potentially compromising sensitive information [118075, 117741]. Microsoft had to notify its customers to create new access keys to mitigate the risk of unauthorized access to their databases [118075, 117741].
Domain information, finance (a) The failed system was intended to support the information industry as it involved a major flaw in Microsoft's flagship Azure Cosmos DB database service, which is often used to manage prescription transactions or managing flows of customer orders [118075, 117741]. (h) The failed system also impacted the finance industry as it left Fortune 500 firms like Coca-Cola, Exxon-Mobil, and Citrix vulnerable to hackers, potentially exposing their data saved in the cloud [118075, 117741]. (m) The incident is related to the technology industry, specifically cloud computing, as Microsoft's Azure cloud service was affected by the major flaw in its flagship Cosmos DB database service [118075, 117741].
