Incident: Facebook SDK Issue Causes Widespread App Crashes on iOS.

Published Date: 2020-05-07

Postmortem Analysis
Timeline 1. The first incident occurred on May 6, 2020 [100013]. 2. A second, similar incident occurred on July 10, 2020 [102514].
System 1. Facebook's iOS SDK [100013, 102514] 2. Facebook's servers [102514]
Responsible Organization 1. Facebook, as developer of the iOS SDK and operator of the servers the SDK communicates with [100013, 102514]
Impacted Organization 1. iOS developer Clay Jones and other developers whose apps use the Facebook SDK [100013] 2. Prominent iOS apps such as TikTok, Spotify, Pinterest, and Venmo [100013] 3. Apps using the Facebook SDK for login, such as Spotify, Waze, Tinder, MailOnline, Bumble, and Pinterest [102514]
Software Causes 1. A change to a value served by Facebook's servers caused the Facebook iOS SDK to crash the iOS apps that embed it [100013]. 2. Facebook's servers sent data back to the app in a different format from the one the SDK expected. The app could not interpret this "gibberish", hit a fatal error, and crashed [102514].
Non-software Causes 1. None reported. The incompatible 'language' in which Facebook's servers sent data back to the app [102514] is a software cause and is listed under Software Causes.
Impacts 1. Many prominent iOS apps, including TikTok, Spotify, Pinterest, Venmo, MailOnline, Waze, and Tinder, crashed or failed to open for a large number of users [100013, 102514]. 2. Apps crashed on launch even for users who never used Facebook to log in [100013]. 3. The incident inconvenienced users and developers and exposed how heavily apps depend on SDKs from companies like Facebook [100013, 102514]. 4. Because the apps received data from the server in an incompatible format, the resulting fatal errors closed the entire app [102514].
Preventions 1. More thorough testing of changes before deploying them to the SDK and its servers [100013, 102514]. 2. Clear, detailed documentation of SDK changes, so developers can anticipate and mitigate issues [100013]. 3. Regular code reviews and audits of the SDK to ensure compatibility and prevent unexpected crashes [100013, 102514]. 4. An option for developers to disable certain events or customize the SDK's behavior to suit their apps [100013, 102514]. 5. Better communication channels between Facebook and app developers, so issues arising from SDK updates are addressed quickly [100013, 102514].
Fixes 1. Facebook fixes its servers so that they again send data to client apps in a format the SDK can handle [102514]. 2. App developers ship updates that handle the unexpected data format instead of crashing [102514]. 3. Facebook provides clearer documentation on how developers can disable certain events or customize the SDK, reducing the likelihood of unexpected crashes [100013]. 4. Developers test and review third-party SDKs such as Facebook's before integrating them [100013]. 5. Companies like Facebook become more transparent about how their SDKs behave and what data they transmit, so developers can make informed decisions about using them [100013, 102514].
References 1. iOS developer Clay Jones [Article 100013] 2. Facebook's statement [Article 100013] 3. iOS developer Steven Troughton-Smith [Article 100013] 4. iOS security researcher Will Strafach [Article 100013] 5. Facebook spokesperson Tom Parnell [Article 100013] 6. Facebook [Article 102514]
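The Software Causes and Fixes rows above describe a client that crashed when the server changed the format of a value it sent. The following Python sketch illustrates that failure mode under stated assumptions: the payloads and the `event_filters` key are hypothetical, not taken from the Facebook SDK or its server API. A parser that trusts the expected shape fails on the changed payload, while a parser that checks types falls back to a safe default.

```python
import json
from typing import Dict

# Hypothetical payloads: "event_filters" is an illustration, not a real
# Facebook SDK field. The client was written to expect a JSON object here.
EXPECTED_PAYLOAD = '{"event_filters": {"purchase": true}}'
# After a server-side change, the same key arrives as a boolean instead.
CHANGED_PAYLOAD = '{"event_filters": false}'


def fragile_parse(raw: str) -> Dict[str, bool]:
    """Trusts the expected shape; a type change raises an unhandled exception."""
    config = json.loads(raw)
    # .items() raises AttributeError when "event_filters" is not a dict.
    return {name: bool(on) for name, on in config["event_filters"].items()}


def defensive_parse(raw: str) -> Dict[str, bool]:
    """Checks types at every step and falls back to an empty config."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # malformed JSON: ignore the payload
    if not isinstance(config, dict):
        return {}  # top level is not an object: ignore the payload
    filters = config.get("event_filters")
    if not isinstance(filters, dict):
        return {}  # unexpected type: keep running with defaults
    return {str(name): bool(on) for name, on in filters.items()}


if __name__ == "__main__":
    print(defensive_parse(EXPECTED_PAYLOAD))  # {'purchase': True}
    print(defensive_parse(CHANGED_PAYLOAD))   # {}
    try:
        fragile_parse(CHANGED_PAYLOAD)
    except AttributeError as exc:
        print("fragile parser failed:", exc)
```

In Python the fragile parser raises `AttributeError`; in a native iOS app, an equivalent unhandled error terminates the whole process, which matches the fatal errors described in [102514]. The defensive version trades a silently disabled feature for an app that still opens.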

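Fixes 2 and 3 and Prevention 4 above concern steps app developers can take themselves. One such step, sketched here in Python as a hypothetical pattern rather than a Facebook SDK feature, is a crash-loop guard: the app counts launches that die before reaching a stable point and stops initializing the optional SDK after two consecutive failures. `CrashLoopGuard`, the counter file, and the threshold are all assumptions; an iOS app would persist the counter in its own container and call the equivalent methods from its launch code.

```python
import tempfile
from pathlib import Path


class CrashLoopGuard:
    """Hypothetical app-side guard; not part of the Facebook SDK.

    A counter file is bumped before an optional SDK starts and deleted once the
    app has proven stable. If the app keeps dying before that point, the count
    grows across launches and the SDK is skipped until reset() is called.
    """

    def __init__(self, marker: Path, max_failures: int = 2) -> None:
        self.marker = marker
        self.max_failures = max_failures
        self._armed = False  # True only if this launch started the SDK

    def _failures(self) -> int:
        try:
            return int(self.marker.read_text())
        except FileNotFoundError:
            return 0
        except ValueError:
            return self.max_failures  # corrupt counter: err toward skipping

    def should_start_sdk(self) -> bool:
        """Call early in launch; False means run without the SDK this time."""
        failures = self._failures()
        if failures >= self.max_failures:
            return False
        self.marker.write_text(str(failures + 1))  # survives a crash
        self._armed = True
        return True

    def mark_stable(self) -> None:
        """Call after a grace period without crashing; clears the counter."""
        if self._armed:
            self.marker.unlink(missing_ok=True)
            self._armed = False

    def reset(self) -> None:
        """Re-enable the SDK, e.g. after shipping an app update."""
        self.marker.unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "sdk_launch.count"
        # Simulate two launches that crash before mark_stable() is reached.
        print(CrashLoopGuard(marker).should_start_sdk())  # True
        print(CrashLoopGuard(marker).should_start_sdk())  # True
        # Third launch: the guard skips the SDK so the app can open.
        print(CrashLoopGuard(marker).should_start_sdk())  # False
```

The counter is cleared only after a grace period, not immediately after initialization, because an SDK can crash the app later, for example when a network response arrives after startup.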
Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) one_organization: Facebook's SDK had previously caused apps to crash [100013], and a similar crash recurred on July 10, 2020 [102514]. (b) multiple_organization: In late April, the Google Maps SDK had an issue that caused apps to crash on launch [100013], so SDK-induced crashes are not unique to Facebook. Each Facebook SDK incident also crashed apps from many other organizations, including Spotify, Pinterest, Venmo, and TikTok [100013] and MailOnline, Bumble, and Tinder [102514].
Phase (Design/Operation) design (a) The incident can be attributed to the design phase. A change to a server-side value used by the Facebook SDK crashed apps that use the SDK for features such as Facebook login [100013]. The change altered the format of the data the server sent to client apps, which then crashed or raised errors [102514]. A small change to a component that many apps depend on therefore had a large effect on all of them.
Boundary (Internal/External) within_system, outside_system (a) within_system: From each affected app's perspective, the faulty code was the Facebook SDK bundled inside the app, which many apps use for features such as Facebook login. The SDK crashed its host apps, even apps not directly related to Facebook login [100013, 102514]. (b) outside_system: The trigger originated outside the app, on Facebook's servers, which sent data back to the SDK in an incompatible format; the resulting fatal errors closed the apps [102514].
Nature (Human/Non-human) non-human_actions, human_actions (a) non-human_actions: A small change to a server value used by the Facebook SDK crashed various iOS apps [100013], and Facebook's servers sent data back to the apps in an incompatible 'language' [102514]. (b) human_actions: The change that triggered the crashes was introduced in a Facebook release [100013], and [102514] attributes the crashes to how Facebook's servers sent data back to the apps, which suggests, but does not confirm, a human error in that change.
Dimension (Hardware/Software) software (a) hardware: The articles do not attribute the incident to hardware, so no hardware cause is known. (b) software: The incident had software causes: Facebook's servers sent data back to the app in an incompatible 'language', which crashed apps with fatal errors [100013, 102514].
Objective (Malicious/Non-malicious) non-malicious (a) The articles give no indication of malicious intent. The crashes stemmed from technical problems in the Facebook SDK, which many apps use for features such as login, not from deliberate attempts to harm the system [100013, 102514].
Intent (Poor/Accidental Decisions) poor_decisions (a) The incident can be attributed to poor decisions. A server value used by the Facebook SDK was changed without warning, crashing apps that use the SDK [100013], and Facebook's servers sent data back to the apps in an incompatible format [102514]. Changes made without considering their effect on deployed clients led to the widespread failure.
Capability (Incompetence/Accidental) development_incompetence (a) development_incompetence: [100013] describes a small change to a Facebook server value that crashed apps using the Facebook iOS SDK. That such a small change had so large an effect suggests it was not adequately tested or its consequences were not considered [100013]. (b) accidental: [102514] describes a communication problem in which Facebook's servers sent data the app could not interpret, causing it to crash. This points to an accidentally introduced communication error [102514].
Duration temporary (a) The failure was temporary: apps crashed and failed to open, but the problem was resolved. According to [100013], Facebook quickly identified and fixed the issue that crashed apps using the Facebook iOS SDK. According to [102514], Facebook reported that the crashes spiked and then receded, with a few "aftershocks", while it monitored the issue. The crashes also occurred only under specific conditions, when the servers sent data in the incompatible format, rather than in all circumstances.
Behaviour crash (a) crash: Both articles report that apps including TikTok, Spotify, and Pinterest crashed when users tried to open them, because of a problem with the Facebook SDK [100013, 102514]. (b) omission: The articles do not describe an omission failure. (c) timing: The apps did not perform their intended functions at the wrong time. (d) value: The failure was not incorrect output from otherwise functioning apps. (e) byzantine: The apps did not respond inconsistently; they crashed. (f) other: No behaviour beyond the crash is reported [100013, 102514].
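The Capability and Intent rationales above suggest the server change was not adequately tested against the clients already deployed. A minimal Python sketch of a pre-deployment check, assuming the server team keeps a record of the type each supported SDK version expects for each key (the version names, keys, and `CLIENT_SCHEMAS` table are hypothetical), could block a change like the one described in [102514] before it ships.

```python
from typing import Any, Dict, List

# Hypothetical contract: the Python type each still-supported SDK version
# expects for each server-provided key. Version names and keys are illustrative.
CLIENT_SCHEMAS: Dict[str, Dict[str, type]] = {
    "sdk-6.5": {"event_filters": dict},
    "sdk-7.0": {"event_filters": dict, "sampling_rate": float},
}


def compatibility_errors(payload: Dict[str, Any]) -> List[str]:
    """List every (SDK version, key) pair whose expected type the payload violates.

    Keys absent from the payload are treated as safe, on the assumption that
    clients fall back to built-in defaults for missing keys.
    """
    errors: List[str] = []
    for version, schema in sorted(CLIENT_SCHEMAS.items()):
        for key, expected in schema.items():
            if key in payload and not isinstance(payload[key], expected):
                errors.append(
                    f"{version}: '{key}' must be {expected.__name__}, "
                    f"got {type(payload[key]).__name__}"
                )
    return errors


def can_deploy(payload: Dict[str, Any]) -> bool:
    """Gate a server-side change: deploy only if no supported client would break."""
    return not compatibility_errors(payload)


if __name__ == "__main__":
    proposed = {"event_filters": False}  # a type change that would break clients
    for problem in compatibility_errors(proposed):
        print("blocked:", problem)
    print(can_deploy({"event_filters": {"purchase": True}, "sampling_rate": 0.5}))  # True
```

Checking against every supported SDK version, not just the latest, matters because old app builds with old SDK copies stay installed on users' devices long after a new SDK ships.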

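The Phase and Duration rows above describe a small server-side change whose crashes spiked and then receded. A common safeguard against this pattern, which the articles do not say Facebook used, is a staged rollout: serve the changed configuration to a growing percentage of clients and roll it back if their crash rate rises. In this Python sketch the stage percentages, the 1.5x tolerance, and both function names are assumptions.

```python
import hashlib
from typing import Sequence

# Hypothetical rollout plan: percent of clients that receive the new config.
STAGES: Sequence[float] = (1.0, 5.0, 25.0, 100.0)


def in_rollout(client_id: str, percent: float) -> bool:
    """Deterministically assign a client to one of 10,000 buckets.

    A client in a smaller stage stays included in every larger stage, because
    its bucket never changes.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return bucket < percent * 100


def next_stage(current: int, crash_rate: float, baseline: float,
               tolerance: float = 1.5) -> int:
    """Return the next stage index, or -1 to roll the change back everywhere."""
    if crash_rate > baseline * tolerance:
        return -1  # exposed clients crash more than usual: revert the config
    return min(current + 1, len(STAGES) - 1)


if __name__ == "__main__":
    exposed = sum(in_rollout(f"device-{i}", STAGES[0]) for i in range(10_000))
    print(f"{exposed} of 10000 devices in the first stage")  # close to 1%
    print(next_stage(0, crash_rate=0.001, baseline=0.001))   # 1: advance
    print(next_stage(0, crash_rate=0.020, baseline=0.001))   # -1: roll back
```

Hashing the client ID keeps each device in the same bucket across requests, so a device exposed at 5% remains exposed at 25% and crash data from each stage stays comparable.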
IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence unknown The articles report no deaths, physical harm, loss of access to food or shelter, impact on material goods, money, or data, or delays caused by the incident, and they describe no non-human impacts or theoretical consequences. The observed consequences were app crashes and inconvenience to users and developers.
Domain entertainment (k) entertainment: Entertainment apps such as TikTok and Spotify crashed because of a flaw in Facebook's software development kit [100013, 102514]. The affected apps also spanned other domains, including payments (Venmo), navigation (Waze), dating (Tinder, Bumble), news (MailOnline), and social media (Pinterest) [100013, 102514].
