Incident: IBM Watson for Oncology Makes Inaccurate and Unsafe Treatment Recommendations

Published Date: 2018-07-27

Postmortem Analysis
Timeline
1. The software failure incident involving IBM's Watson for Oncology system happened in June and July 2017, according to internal documents reviewed by Stat News [73338].

System
1. IBM's Watson for Oncology artificial intelligence software [73338]

Responsible Organization
1. IBM, whose Watson artificial intelligence software produced the faulty recommendations, was responsible for the software failure incident [73338].

Impacted Organization
1. Medical experts working with IBM on its Watson for Oncology system [73338]
2. Patients who could potentially receive inaccurate or unsafe treatment recommendations from Watson [73338]
3. Hospitals and healthcare institutions using Watson for Oncology [73338]

Software Causes
1. Inaccurate and unsafe treatment recommendations generated by the Watson for Oncology artificial intelligence software [73338].

Non-software Causes
1. Lack of real training data: Watson was fed hypothetical data instead of real patient data, leading to inaccurate treatment recommendations [73338].
2. Insufficient process for building content and underlying technology: internal documents raised serious questions about the process for building content and about Watson's underlying technology, indicating potential flaws in development and testing procedures [73338].
3. Misuse of the system: some doctors used Watson for marketing purposes rather than relying on it for medical recommendations [73338].

Impacts
1. Watson for Oncology made 'often inaccurate' and 'unsafe' treatment recommendations, raising serious concerns about patient safety and the reliability of the AI system [73338].
2. Despite the inaccuracies and safety issues, no patients were reportedly harmed as a direct result of Watson's missteps [73338].
3. The incident bred distrust among medical professionals; one doctor at Jupiter Hospital in Florida described the product as a 'piece of s***' and expressed disappointment in its performance [73338].
4. Because Watson was fed hypothetical data instead of real patient data, the effectiveness of its recommendations was called into question [73338].
5. The incident highlighted the need for continuous improvement of AI systems like Watson, with IBM emphasizing ongoing updates and enhancements in response to feedback [73338].

Preventions
1. Proper testing and validation procedures: thorough testing and validation, including automated safety checks of the kind sketched after this list, could have flagged inaccurate and unsafe recommendations before Watson for Oncology was deployed [73338].
2. Real-data training: training the AI system on real patient data instead of hypothetical data could have improved the accuracy of Watson's recommendations and prevented misleading suggestions [73338].
3. Enhanced training methods: training Watson to synthesize vast amounts of data and reach its own conclusions, rather than relying solely on recommendations from other doctors, could have improved the system's performance [73338].
4. Continuous improvement and feedback mechanisms: robust mechanisms for improvement based on user feedback, scientific evidence, and new data could have addressed issues and enhanced Watson's functionality over time [73338].
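
As a concrete illustration of prevention 1, the sketch below shows one possible form of such an automated safety check: a rule-based contraindication screen that would flag the scenario described in the internal documents, a hemorrhage-risk drug suggested for a lung cancer patient with severe bleeding. This is a minimal hypothetical example; the drug rule, field names, and logic are illustrative assumptions, not a description of IBM's system.

```python
# Hypothetical contraindication check. The single rule below (a drug carrying a
# severe-hemorrhage warning vs. a patient with severe bleeding) mirrors the
# unsafe recommendation described in the internal documents; everything else
# is an illustrative assumption, not IBM's implementation.
from dataclasses import dataclass, field

# Map each drug to the patient conditions that contraindicate it.
CONTRAINDICATIONS: dict[str, set[str]] = {
    "bevacizumab": {"severe bleeding"},  # carries a severe/fatal hemorrhage warning
}

@dataclass
class Patient:
    diagnosis: str
    conditions: set[str] = field(default_factory=set)

def validate_recommendation(patient: Patient, drug: str) -> list[str]:
    """Return safety warnings for a proposed drug; an empty list means no rule fired."""
    return [
        f"UNSAFE: {drug} is contraindicated for patients with {condition}"
        for condition in CONTRAINDICATIONS.get(drug, set())
        if condition in patient.conditions
    ]

if __name__ == "__main__":
    patient = Patient(diagnosis="lung cancer", conditions={"severe bleeding"})
    for warning in validate_recommendation(patient, "bevacizumab"):
        print(warning)
```

A check like this sits between the recommender and the clinician, so an unsafe suggestion is flagged regardless of how the model produced it.
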
Fixes
1. Improve the process for building content and the underlying technology of Watson to address inaccuracies in treatment recommendations [73338].
2. Train Watson on real patient data rather than hypothetical data so that it provides more accurate and reliable suggestions (see the provenance sketch after this list) [73338].
3. Implement additional software releases and updates to enhance functionality and accuracy, incorporating feedback from clients and new scientific evidence [73338].
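
For fix 2, the following is a minimal sketch of a training-data provenance gate, assuming records are tagged as real or hypothetical at ingestion. The record schema, threshold, and `synthetic` flag are hypothetical, introduced only for illustration:

```python
# Hypothetical provenance gate: drop synthetic cases and fail loudly if real
# patient data is too scarce to train on. Schema and threshold are assumptions.
from dataclasses import dataclass

@dataclass
class CaseRecord:
    case_id: str
    diagnosis: str
    synthetic: bool  # True for hypothetical cases authored for demos or testing

def real_cases_only(records: list[CaseRecord],
                    min_real_fraction: float = 0.9) -> list[CaseRecord]:
    """Return only real patient cases; reject batches dominated by synthetic data."""
    real = [r for r in records if not r.synthetic]
    if len(real) < min_real_fraction * len(records):
        raise ValueError(
            f"only {len(real)}/{len(records)} records are real patient data; "
            "training set is dominated by hypothetical cases"
        )
    return real

# Illustrative run: a mixed batch trips the guard.
batch = [
    CaseRecord("c1", "lung cancer", synthetic=False),
    CaseRecord("c2", "breast cancer", synthetic=True),
]
try:
    real_cases_only(batch)
except ValueError as err:
    print(err)
```
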
References
1. Internal documents reviewed by Stat News [73338]
2. Presentations given by IBM Watson's former deputy health chief Andrew Norden [73338]
3. Spokesperson for Memorial Sloan Kettering [73338]
4. Doctors at Jupiter Hospital in Florida [73338]
5. Spokesperson for IBM [73338]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization
(a) The failure happened within one organization: medical experts working with IBM on its Watson for Oncology system discovered that it made often inaccurate and unsafe treatment recommendations [73338].
(b) The articles do not report similar incidents at other organizations, so there is no evidence of the failure recurring across multiple organizations.

Phase (Design/Operation) design, operation
(a) Design: internal documents reviewed by Stat News revealed that Watson made 'often inaccurate' and 'unsafe' treatment recommendations, raising serious questions about the process for building content and the underlying technology [73338]. The system was supposed to analyze data from real patients but was instead fed hypothetical data, so it was never properly trained or tested against real-world cases.
(b) Operation: doctors at Jupiter Hospital in Florida doubted Watson's suggestions and its practical usability; one called the product a 'piece of s***' and said it could not be used for most cases [73338]. The system failed to meet the expectations of medical professionals in real-world clinical settings.

Boundary (Internal/External) within_system
(a) within_system: the failure originated within the system. Internal documents showed Watson making 'often inaccurate' and 'unsafe' recommendations, raising concerns about the content-building process and the underlying technology [73338]. Because the system was fed hypothetical rather than real patient data, doctors received recommendations derived from other doctors instead of from the AI synthesizing vast amounts of data and reaching its own conclusions [73338].
(b) outside_system: the articles do not indicate any contributing factors originating from outside the system.

Nature (Human/Non-human) non-human_actions, human_actions
(a) non-human_actions: the AI software itself produced the inaccurate and unsafe recommendations, such as suggesting a drug that could result in severe or fatal hemorrhage for a lung cancer patient with severe bleeding [73338].
(b) human_actions: the people who trained Watson fed it hypothetical data instead of real patient data, so doctors received recommendations derived from other doctors rather than from the AI's own synthesis of data, raising concerns about the process for building content and the underlying technology [73338].

Dimension (Hardware/Software) software
(a) hardware: the article does not mention any hardware-related issues contributing to the failure [73338].
(b) software: the failure was due to inaccuracies and unsafe treatment recommendations produced by the software itself, indicating a failure originating in its algorithms and decision-making processes [73338].

Objective (Malicious/Non-malicious) non-malicious
(a) The failure was non-malicious: the inaccurate and unsafe recommendations were not intentional attempts to harm the system or patients. The incident was attributed to the process for building content, the underlying technology, and training data that was hypothetical rather than drawn from real patients [73338].

Intent (Poor/Accidental Decisions) poor_decisions
(a) The failure stemmed from poor decisions during the development and training of the AI system. Watson was trained on hypothetical data instead of real patient data, so doctors received recommendations derived from other doctors rather than from an AI synthesizing vast amounts of data and reaching its own conclusions. That training decision contributed directly to the inaccuracies and safety concerns raised by medical professionals [73338].

Capability (Incompetence/Accidental) development_incompetence, accidental
(a) development_incompetence: internal documents revealed that Watson made 'often inaccurate' and 'unsafe' treatment recommendations, raising serious questions about the process for building content and the underlying technology; one doctor went so far as to call the product a "piece of s***" [73338]. The system was also trained on hypothetical data rather than real patient data, producing recommendations derived from other doctors instead of AI-driven conclusions [73338].
(b) accidental: Watson recommended a drug that could result in 'severe or fatal hemorrhage' for a lung cancer patient with severe bleeding; the suggestion was hypothetical and part of system testing, indicating an accidental introduction of an unsafe recommendation [73338]. IBM's spokesperson said the internal documents did not give a timely representation of Watson and emphasized that the system continuously learns and improves based on feedback, scientific evidence, and new cancer treatment alternatives [73338].

Duration temporary
The failure can be categorized as temporary. Internal documents showed Watson making inaccurate and unsafe treatment recommendations, raising serious questions about the process for building content and the underlying technology [73338], yet no patients were reportedly harmed [73338]. IBM stated that Watson is still learning and has been continuously improved based on client feedback and new scientific evidence, and the company released a report stating that Watson could accurately identify tumors in up to 93% of cases in tests on real patients (see the concordance sketch after this table) [73338]. These points indicate a temporary setback that prompted improvements and ongoing learning rather than a permanent failure.

Behaviour omission, value, other
(a) crash: the system did not crash; despite its inaccurate and unsafe recommendations, no patients were reportedly harmed [73338].
(b) omission: the system omitted to perform its intended function correctly, providing recommendations that medical experts found often inaccurate and unsafe [73338].
(c) timing: the articles do not describe a timing failure, where the system produced correct output at the wrong time [73338].
(d) value: the system performed its intended function incorrectly, producing recommendations that could harm patients, such as suggesting a drug that can result in severe or fatal hemorrhage to a patient with severe bleeding [73338].
(e) byzantine: the incident does not exhibit byzantine behaviour with inconsistent responses and interactions [73338].
(f) other: the way the system was trained and the process for building its content raised serious questions about the underlying technology and the accuracy of the recommendations Watson provided [73338].
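
The 93% figure cited under Duration is most naturally read as a concordance rate: the fraction of reviewed cases in which the system's output agreed with expert decisions. The article does not describe IBM's methodology, so the sketch below is an assumption about how such a figure is typically computed, with invented case data:

```python
# Hypothetical sketch of a concordance-rate calculation, a common basis for
# accuracy claims such as "identified tumors in up to 93% of cases". The
# methodology and case data here are illustrative assumptions.

def concordance_rate(system_outputs: list[str], expert_decisions: list[str]) -> float:
    """Fraction of cases in which the system agreed with the expert panel."""
    if len(system_outputs) != len(expert_decisions):
        raise ValueError("output and decision lists must be the same length")
    matches = sum(s == e for s, e in zip(system_outputs, expert_decisions))
    return matches / len(system_outputs)

# Illustrative run: agreement on 14 of 15 reviewed cases is roughly 93%.
system = ["tumor"] * 14 + ["no tumor"]
experts = ["tumor"] * 15
print(f"{concordance_rate(system, experts):.0%}")  # prints 93%
```
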

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence theoretical_consequence, other
(a) death: no deaths; no patients were reportedly harmed despite Watson's missteps [73338].
(b) harm: no physical harm; no patients were reportedly harmed despite Watson's missteps [73338].
(c) basic: impacts on people's access to food or shelter are not mentioned in the articles.
(d) property: impacts on people's material goods, money, or data are not mentioned in the articles.
(e) delay: postponed activities are not mentioned in the articles.
(f) non-human: impacts on non-human entities are not mentioned in the articles.
(g) no_consequence: no real-world harm was observed; no patients were reportedly harmed despite Watson's missteps [73338].
(h) theoretical_consequence: potential consequences were discussed but did not occur; the suggestion to administer a drug that can result in severe or fatal hemorrhage to a lung cancer patient with severe bleeding was hypothetical and part of system testing, not actual treatment [73338].
(i) other: the failure raised concerns about the accuracy and safety of the treatment recommendations made by Watson for Oncology and serious questions about the process for building content and the underlying technology [73338].

Domain health (a) The failed system, IBM's Watson for Oncology, was intended to support the healthcare industry specifically in diagnosing and recommending treatments for cancer patients [73338]. The system analyzed medical images and patient records to assist doctors in diagnosing breast, lung, colorectal, cervical, ovarian, gastric, and prostate cancers at various hospitals worldwide.
