Incident: Child Mental Health Chatbots Fail to Detect Serious Issues

Published Date: 2018-12-10

Postmortem Analysis
Timeline
1. The software failure incident, in which the mental health chatbot apps Wysa and Woebot failed to spot reports of child sexual abuse, occurred in December 2018, per the publication date of the article [79245].
System
1. Woebot and Wysa mental health chatbot apps
2. Automated systems designed to flag up serious or dangerous situations
3. The Woebot app version prior to the update that introduced an 18+ age limit [79245]
Responsible Organization
1. Woebot Labs - the incident involving Woebot was caused by the chatbot's inability to handle reports of child sexual abuse and other serious issues [79245].
2. Touchkin - the incident involving Wysa was caused by the app's shortcomings in recognizing and addressing issues such as child sexual abuse, eating disorders, and drug use [79245].
Impacted Organization
1. Children's Commissioner for England [79245]
2. NHS Trust [79245]
3. North East London NHS Foundation Trust [79245]
4. Association of Child Psychotherapists [79245]
Software Causes
1. Lack of capability in the Woebot and Wysa chatbot apps to handle reports of child sexual abuse, eating disorders, and drug use [79245].
2. Inability of the apps' automated systems to identify and flag serious or dangerous situations such as illegal acts or distress signals [79245].
3. Failure to respond appropriately to messages indicating self-harm, coercive sex, eating disorders, and drug use, showing a lack of effective detection and intervention mechanisms [79245].
Non-software Causes
1. Lack of appropriate training or algorithms to detect and handle reports of child sexual abuse, eating disorders, and drug use [79245].
2. Insufficient understanding of the severity of the complaints, leading to inappropriate responses [79245].
3. Failure to meet legal requirements and obligations in cases where a young person discloses significant risk of harm to themselves or others [79245].
Impacts
1. The mental health chatbot apps Wysa and Woebot failed to handle reports of child sexual abuse, eating disorders, and drug use, leading to a lack of appropriate responses and interventions [79245].
2. The flaws meant the chatbots were not considered "fit for purpose" for use by youngsters, as they were unable to recognize and flag clear breaches of law or child safeguarding [79245].
3. Woebot's makers introduced an 18+ age limit for their product, and both apps required updates to improve their responses to serious or dangerous situations [79245].
4. Despite these shortcomings, the chatbots were able to flag messages suggesting self-harm and direct users to emergency services and helplines [79245].
Preventions
1. Implementing robust testing procedures focused specifically on handling sensitive and critical issues such as child sexual abuse, eating disorders, and drug use [79245].
2. Incorporating more advanced natural language processing to better detect and respond to distress signals in user messages [79245].
3. Providing continuous training and updates to the chatbot algorithms to improve their ability to recognize and respond appropriately to high-risk situations [79245].
4. Establishing clear guidelines and protocols for when the chatbots should escalate issues to human intervention, especially in cases of potential harm to users [79245].
Fixes
1. Implementing robust algorithms and machine learning models to better detect and handle reports of child sexual abuse, eating disorders, drug use, and other serious issues [79245].
2. Conducting thorough testing and clinical trials to ensure the chatbot apps can effectively identify and respond to signs of distress and potential harm [79245].
3. Updating the apps to address the identified shortcomings, such as introducing age limits, enhancing crisis detection, and providing appropriate guidance for users in crisis situations [79245].
References
1. The Children's Commissioner for England [79245]
2. NHS Trust [79245]
3. Woebot's creators [79245]
4. Touchkin, the firm behind Wysa [79245]
5. Association of Child Psychotherapists [79245]

Software Taxonomy of Faults

Category Option Rationale
Recurring: one_organization
(a) Recurrence at one organization: The incident recurred within one organization in the sense that Woebot required updates after struggling to handle reports of child sexual abuse. The app had been rated suitable for children but failed to recognize and flag clear breaches of law or child safeguarding. As a result of the probe, Woebot's makers introduced an 18+ age limit for their product [79245].
(b) Recurrence at multiple organizations: The articles do not mention this incident happening again at multiple organizations.
Phase (Design/Operation): design, operation
(a) Design: The chatbot apps Woebot and Wysa failed to properly handle reports of child sexual abuse, eating disorders, and drug use. Despite being rated suitable for children, the apps did not recognize and flag clear breaches of law or child safeguarding, indicating a design flaw in their systems [79245].
(b) Operation: Both apps, designed to assist with mental health issues, failed to respond appropriately to serious situations such as reports of child sexual abuse, self-harm, eating disorders, and drug use. The operational failure showed in the apps' inability to provide adequate responses and interventions when faced with distressing messages from users [79245].
Boundary (Internal/External): within_system
(a) Within system: The failure was primarily within the system. Woebot and Wysa failed to properly handle reports of child sexual abuse, eating disorders, and drug use despite being designed to flag serious or dangerous situations. The apps could not detect obvious signs of distress or respond appropriately to critical issues raised by users; for example, when testers mentioned being forced to have sex at a young age, the chatbots responded inadequately, showing a lack of capability within the system to address such serious concerns [79245].
Nature (Human/Non-human): non-human_actions, human_actions
(a) Non-human actions: The failure can be attributed to the limitations of the chatbot apps themselves. Woebot and Wysa failed to properly handle reports of child sexual abuse, eating disorders, and drug use despite being designed to flag serious or dangerous situations automatically. Their responses to distressing messages were inadequate or inappropriate, such as not recognizing clear signs of distress or illegal activities [79245].
(b) Human actions: Human actions also played a role. The developers of both apps had to introduce updates to address the deficiencies after the issues were highlighted: Woebot's makers introduced an 18+ age limit and updated their software to account for phrases used in the BBC tests, and Touchkin, the firm behind Wysa, said it was updating its app to better handle situations involving coercive sex, illegal drugs, and eating disorders. Human intervention was necessary to rectify the shortcomings [79245].
Dimension (Hardware/Software): software
(a) Hardware: The articles do not mention any contributing factors originating in hardware [79245].
(b) Software: The failure stemmed from shortcomings in the apps' algorithms and responses, which led to inappropriate or inadequate reactions to serious situations reported by users, including child sexual abuse, eating disorders, and drug use [79245].
Objective (Malicious/Non-malicious): non-malicious
(a) Non-malicious: The failure occurred because Woebot and Wysa struggled to handle reports of child sexual abuse, eating disorders, and drug use. The apps failed to identify and respond to serious and dangerous situations, indicating a lack of capability rather than intentional harm [79245].
Intent (Poor/Accidental Decisions): poor_decisions
(a) Poor decisions: The failure resulted from poor decisions in the design and implementation of the apps. Woebot and Wysa struggled to handle reports of child sexual abuse, eating disorders, and drug use, and failed to recognize and flag serious or dangerous situations such as clear breaches of law or child safeguarding. Their responses to distressing messages were inadequate and inappropriate, showing a lack of understanding of the severity of the situations described. Although recommended for children and teenagers, the apps' inability to address critical issues led to updates and age restrictions being introduced to mitigate the risks of their use [79245].
Capability (Incompetence/Accidental): development_incompetence, accidental
(a) Development incompetence: Woebot and Wysa failed to properly handle reports of child sexual abuse, eating disorders, and drug use. Despite being rated suitable for children and recommended by an NHS Trust, the chatbots did not recognize and flag clear breaches of law or child safeguarding, indicating a lack of professional competence in designing the systems to handle such critical situations [79245].
(b) Accidental: The apps' inability to identify and respond to distress signals also reflects accidental factors. The responses generated when presented with scenarios of child sexual abuse, self-harm, and illegal activities were inadequate and sometimes inappropriate, indicating accidental shortcomings in the design and programming of the chatbot algorithms [79245].
Duration: permanent
(a) Permanent: The failure appears permanent in nature. Both apps required updates after struggling to handle reports of child sexual abuse, eating disorders, and drug use, and Woebot introduced an 18+ age limit as a result of the probe, indicating a lasting change in how the products handle such sensitive issues [79245].
Behaviour: omission, value, other
(a) Crash: The articles do not mention the software crashing.
(b) Omission: Both apps failed to perform their intended function of recognizing and flagging serious or dangerous situations, such as clear breaches of law or child safeguarding, when users reported child sexual abuse, eating disorders, or drug use [79245].
(c) Timing: The articles do not mention the software performing its intended functions correctly but too late or too early.
(d) Value: The apps provided incorrect and inadequate responses to serious situations, such as suggesting rewriting negative thoughts instead of addressing the severity of reports of being forced into sex at a young age or having an eating disorder [79245].
(e) Byzantine: The articles do not mention inconsistent responses or interactions.
(f) Other: The software was unable to detect and respond appropriately to critical and distressing situations, leaving users in need without proper intervention and support [79245].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence: non-human, other
(a) Death: No information about people losing their lives due to the software failure was mentioned in the articles [79245].
(b) Harm: The incident did not directly result in physical harm to individuals [79245].
(c) Basic: The failure did not impact people's access to food or shelter [79245].
(d) Property: The failure did not directly impact people's material goods, money, or data [79245].
(e) Delay: The failure did not lead to any activities being postponed [79245].
(f) Non-human: Non-human entities were impacted: the chatbot apps Woebot and Wysa themselves struggled to handle reports of child sexual abuse and other serious issues [79245].
(g) No consequence: This does not apply; the incident had real observed consequences related to the chatbots' inability to handle serious issues such as child sexual abuse, eating disorders, and drug use [79245].
(h) Theoretical consequence: The potential consequences discussed, such as the chatbots failing to recognize and flag clear breaches of law or child safeguarding, were in fact observed rather than merely theoretical [79245].
(i) Other: The chatbots failed to respond appropriately to serious issues like child sexual abuse, eating disorders, and drug use, highlighting a significant flaw in their functionality [79245].
Domain health (a) The failed system was related to the health industry as it involved mental health chatbot apps designed to assist with relationships, grief, addiction, stress, anxiety, and sleep loss [79245]. The chatbots were intended to provide mental health support to users, particularly youngsters, by allowing them to discuss their concerns with a computer rather than a human.

Sources
