Incident: Meta's Media-Matching Technology Bug Led to Content Removal Failure

Published Date: 2022-05-17

Postmortem Analysis
Timeline 1. The bug in Facebook parent company Meta's media-matching technology occurred during the first three months of the year in which the article was published [128067]. 2. The article was published on 2022-05-17. 3. Therefore, the software failure incident occurred in the first quarter of 2022.
System 1. Meta's media-matching technology [128067]
Responsible Organization 1. Meta, whose media-matching technology contained the bug [128067]
Impacted Organization 1. Facebook and Instagram, along with their users, were impacted by the software failure incident [128067].
Software Causes 1. The failure was caused by a bug in Meta's media-matching technology that fed false positives into the system, leading to content being mistakenly flagged and pulled down [128067].
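To make this failure mode concrete, the following is a minimal, hypothetical sketch of a hash-bank media matcher. It is not Meta's implementation; the names (HASH_BANK, hash_media, enforce) and the digests are invented for illustration. The key point is that a single false positive fed into the bank of known-violating media causes every subsequent upload matching it to be wrongly flagged.

# Hypothetical sketch of a hash-bank media matcher (not Meta's system).
# HASH_BANK, hash_media, and enforce are invented names.
import hashlib
from typing import Optional

# Digests of media already judged to violate policy. A bug that feeds a
# false positive into this bank means every later upload matching that
# digest is wrongly flagged and removed.
HASH_BANK = {
    "d2a84f4b8b650937ec8f73cd8be2c74a": "terrorism",       # placeholder digest
    "6b86b273ff34fce19d6b804eff5a3f57": "organized_hate",  # placeholder digest
}

def hash_media(data: bytes) -> str:
    # Exact digest used here for simplicity; production matchers typically
    # use perceptual hashes that tolerate re-encoding and cropping.
    return hashlib.sha256(data).hexdigest()

def enforce(upload: bytes) -> Optional[str]:
    # Return the violation label if the upload matches the bank, else None.
    return HASH_BANK.get(hash_media(upload))

Under this model, correcting the mistake requires both restoring the removed posts and purging the bad entries from the bank, which is consistent with Meta's description of fixing the bug and restoring the incorrectly flagged content [128067].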
Non-software Causes 1. Lack of timely removal of terrorism content before it went viral on social networks like Facebook and Twitter [128067] 2. Challenges in detecting and removing new versions or links of violating content created to evade enforcement by social media platforms [128067]
Impacts 1. Content that didn't violate Facebook's rules was mistakenly pulled down, leading to the removal of posts that were incorrectly flagged for violating rules against terrorism and organized hate [128067]. 2. Meta's media-matching technology bug resulted in the removal of a significant amount of content, including pieces flagged for organized hate, terrorism, suicide and self-injury, and sexual exploitation [128067]. 3. The bug raised concerns about the effectiveness of Meta's automated technology and highlighted the need for more robust content moderation processes to prevent similar errors in the future [128067].
Preventions 1. Implementing more robust testing procedures to catch bugs in the media-matching technology before deployment [128067]. 2. Enhancing the AI technology to better distinguish between false positives and actual violations, possibly through more advanced machine learning algorithms [128067]. 3. Providing more advanced warning to users before penalizing them for rule violations to reduce the impact of false positives [128067].
Fixes 1. Implement more rigorous testing procedures for new technology introduced into the media-matching system to prevent false positives from occurring [128067]. 2. Develop and deploy new AI technology that learns from appeals and restored content to improve the accuracy of content moderation [128067]. 3. Experiment with providing users with advanced warning before penalizing them for rule violations to reduce the impact of false positives on content removal [128067].
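Fixes 2 and 3 above can be illustrated with a short, hypothetical sketch of an appeal feedback loop and an advance-warning step. This is not Meta's code; handle_appeal and review_match are invented names, and the confidence threshold is an assumption.

# Hypothetical sketch of an appeal feedback loop and an advance-warning
# step (not Meta's implementation).

def handle_appeal(digest: str, upheld: bool, hash_bank: dict) -> None:
    # When an appeal is upheld, restoring the post alone is not enough:
    # the bank entry that produced the false positive is also removed so
    # the same mistake is not repeated.
    if upheld:
        hash_bank.pop(digest, None)

def review_match(confidence: float, warn_threshold: float = 0.9) -> str:
    # Lower-confidence matches warn the user first instead of immediately
    # removing the content and penalizing the account.
    return "remove_and_penalize" if confidence >= warn_threshold else "warn_user"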
References 1. Meta's quarterly report [128067] 2. Meta Vice President of Integrity Guy Rosen in a press call [128067]

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) The software failure incident related to mistakenly pulling down content due to a bug in Meta's media-matching technology recurred within the same organization. Meta, the parent company of Facebook, disclosed that the bug led to content being incorrectly flagged for violating its rules, including those against terrorism and organized hate. The incident resulted in the removal of content that did not actually violate the platform's rules, and Meta had to restore the affected posts [128067]. (b) The same bug also affected Instagram, another platform owned by Meta: Instagram took action against more terrorism and organized hate content because of content mistakenly flagged for violations. This indicates that the incident impacted multiple platforms within the Meta organization [128067].
Phase (Design/Operation) design, operation (a) The software failure incident related to the design phase can be seen in the article: Meta's media-matching technology had a bug that led to content mistakenly being pulled down, and the bug was blamed for a spike in actions taken against organized hate and terrorism content, indicating a failure introduced during system development or updates [128067]. (b) The software failure incident related to the operation phase is evident in the article where Facebook faced scrutiny for not removing terrorism content before it went viral. The delay in removing the livestreamed video of a terrorist attack and hate-filled content highlights a failure introduced during the operation or misuse of the system [128067].
Boundary (Internal/External) within_system, outside_system (a) within_system: The software failure incident related to the bug in Meta's media-matching technology that led to content mistakenly being pulled down originated from within the system itself. Meta acknowledged that the bug introduced false positives into the system, causing a significant amount of content to be incorrectly flagged and removed [128067]. This indicates that the failure was a result of an internal issue within Meta's technology. (b) outside_system: The incident also highlighted challenges faced by Facebook in dealing with terrorism content that originates from outside the system. Despite efforts to remove such content quickly, there were instances where terrorist content spread on social networks like Facebook and Twitter before being addressed by the platform [128067]. This external factor of users creating new versions of the content to evade detection poses a challenge for the system's content moderation efforts.
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident in this case was attributed to a bug in Meta's media-matching technology, which led to content being mistakenly pulled down without human participation. The bug resulted in a significant number of pieces of content being incorrectly flagged for violating the platform's rules, including content related to terrorism, organized hate, suicide, self-injury, and sexual exploitation [128067]. This indicates that the failure was primarily due to non-human actions, specifically the introduction of false positives into the system by the new technology. (b) On the other hand, human actions were also involved in the response to the software failure incident. Meta's Vice President of Integrity, Guy Rosen, mentioned that the company is taking steps to prevent content moderation errors from happening, such as testing new AI technology and providing more advanced warning to users before penalizing them for rule violations [128067]. Additionally, employees at Meta quickly designated a livestreamed video as a terrorist attack and removed any copies of the video after becoming aware of the situation, indicating human intervention in addressing the incident [128067].
Dimension (Hardware/Software) software (a) The software failure incident reported in the article was not attributed to hardware issues but rather to a bug in Meta's media-matching technology. The bug led to content being mistakenly pulled down, affecting various types of content moderation on platforms like Facebook and Instagram [128067]. (b) The software failure incident was specifically attributed to a bug in Meta's media-matching technology, which resulted in content that did not violate the platform's rules being incorrectly flagged and removed. This bug affected the moderation of content related to organized hate, terrorism, suicide and self-injury, and sexual exploitation. Meta acknowledged the bug and took steps to fix it, including restoring the mistakenly flagged content [128067].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident described in the article is non-malicious. The incident was caused by a bug in Meta's media-matching technology that mistakenly flagged content for violating the platform's rules, leading to the removal of legitimate posts [128067]. The bug resulted in a significant number of pieces of content being incorrectly flagged for various violations, such as organized hate, terrorism, suicide, self-injury, and sexual exploitation. The company acknowledged the error and took steps to restore the mistakenly removed content, indicating that the failure was not intentional but rather a result of a technical issue within the system.
Intent (Poor/Accidental Decisions) poor_decisions, accidental_decisions (a) The software failure incident related to the bug in Meta's media-matching technology that led to content being mistakenly pulled down can be attributed to poor decisions. Meta introduced new technology that resulted in false positives being fed into the system, causing a large amount of content to be incorrectly flagged and removed. This indicates that the failure was a result of contributing factors introduced by poor decisions made during the implementation of the new technology [128067]. (b) Additionally, the incident can also be linked to accidental decisions or mistakes. The bug in the media-matching technology led to unintended consequences where content that did not violate the platform's rules was erroneously flagged and removed. This unintended outcome highlights that the failure was also influenced by contributing factors introduced by accidental decisions or mistakes [128067].
Capability (Incompetence/Accidental) accidental (a) The software failure incident in the article was not explicitly attributed to development incompetence. However, the incident highlighted the presence of a bug in Meta's media-matching technology that led to content being mistakenly flagged and taken down, indicating a failure in the system's automated content moderation process [128067]. (b) The software failure incident in the article was categorized as accidental. Meta acknowledged that the bug in its media-matching technology resulted in content being mistakenly removed, including posts that did not violate the platform's rules. This accidental failure led to the incorrect flagging and removal of a significant amount of content, prompting Meta to take steps to prevent such errors in the future [128067].
Duration temporary (a) The software failure incident described in the article appears to have been temporary. The incident was caused by a bug in Meta's media-matching technology that led to content being mistakenly pulled down. Meta acknowledged the bug, fixed the problem, and restored the incorrectly flagged posts [128067]. This indicates that the failure was not permanent but rather a temporary issue caused by specific circumstances (the bug in the technology).
Behaviour crash, omission, timing, value, other (a) crash: The software failure incident in the article can be categorized as a crash because it led to content being mistakenly pulled down, indicating a failure in the system's intended functions [128067]. (b) omission: The incident also involved omission because the system failed to perform its intended function of correctly identifying and flagging content that violated the platform's rules, resulting in the incorrect removal of posts [128067]. (c) timing: While the article did not specifically mention timing issues, it can be inferred that there were timing failures because the system took action against content too late (after the bug had caused incorrect flagging and removal) [128067]. (d) value: The software failure incident falls under the category of value failure because the system performed its intended functions incorrectly by mistakenly flagging and removing content that did not violate the platform's rules [128067]. (e) byzantine: The incident does not exhibit characteristics of a byzantine failure, where the system behaves erroneously with inconsistent responses and interactions. Instead, the focus is on a bug leading to incorrect actions by the system [128067]. (f) other: The other behavior exhibited by the system in this incident is the propagation of false positives due to the bug in the media-matching technology, resulting in the system pulling down a large amount of content that did not violate the platform's rules [128067].

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence theoretical_consequence The software failure incident did not directly result in harm to individuals, and no other direct consequences were reported; its consequences are theoretical. The bug led to content being mistakenly pulled down, including content related to terrorism and organized hate, which could potentially have indirect consequences in terms of misinformation or lack of access to important information [128067].
Domain information (a) The software failure incident reported in Article 128067 is related to the information industry. Facebook parent company Meta's media-matching technology had a bug that mistakenly pulled down content that didn't violate its rules, including content related to terrorism and organized hate [128067]. This incident impacted the production and distribution of information on social media platforms like Facebook and Instagram.
