Incident: macOS trustd Failure Causing Systemwide Slowdowns and App Launch Issues

Published Date: 2020-11-14

Postmortem Analysis
Timeline 1. The incident, in which apps on Mac computers took minutes to launch, began at almost the same time Apple started rolling out the new version of macOS, Big Sur [107191]. 2. The source article was published on 2020-11-14 08:00:00+00:00. 3. The incident therefore most likely occurred in November 2020.
System 1. trustd macOS process 2. Apple's OCSP service 3. App notarization system 4. macOS Big Sur, Catalina, and Mojave versions [107191]
Responsible Organization 1. Apple [107191]
Impacted Organization 1. Mac users 2. Apple services such as Apple Pay, Messages, and Apple TV devices [107191]
Software Causes 1. The macOS trustd process, which checks with Apple's servers to validate app notarization, repeatedly failed to contact the host ocsp.apple.com. Because these calls appeared to hang rather than soft-fail, app launches stalled and the whole system slowed down and became unresponsive [107191].
Non-software Causes 1. The failure was exacerbated by Apple's launch of the new macOS version, Big Sur [107191].
Impacts 1. Mac users running Big Sur, Catalina, and Mojave saw apps take minutes to launch, along with stuttering and unresponsiveness throughout macOS [107191]. 2. Apple services such as Apple Pay, Messages, and Apple TV devices suffered slowdowns, outages, and odd behavior [107191]. 3. The trustd process, which is responsible for app notarization checks, repeatedly tried and failed to contact the host ocsp.apple.com, slowing the whole system whenever an app tried to launch [107191]. 4. Users who filtered for errors in the Console app saw long runs of successive trustd errors [107191]. 5. Temporary workarounds circulated on forums, chat rooms, and Twitter until the problem cleared, presumably after Apple resolved the underlying issue [107191].
Preventions 1. Testing more thoroughly before rolling out a new macOS version such as Big Sur, so that potential issues are found and addressed before release [107191]. 2. Putting monitoring and alerting in place so that anomalies or failures in critical processes are detected and handled quickly. One such process is trustd, which is responsible for app notarization checks [107191]. 3. Making the system more resilient to network problems. For example, trustd could soft-fail after a bounded wait instead of hanging when it cannot reach a required service such as ocsp.apple.com, so that an unreachable server does not slow down the entire system [107191].
Fixes 1. Apple presumably resolved the underlying issue that prevented trustd from reaching ocsp.apple.com, after which the problem behavior cleared [107191].
References 1. Mac users experiencing unexpected issues with apps and systemwide slowdowns [Article 107191] 2. Trustd macOS process attempting to contact ocsp.apple.com and failing repeatedly [Article 107191] 3. Apple's notarization process and certificate validation issues [Article 107191] 4. Temporary workarounds shared on forums, chat rooms, and Twitter [Article 107191] 5. Apple's announcement of Big Sur launch and timing of the software issues [Article 107191]
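Prevention 2 calls for monitoring and alerting on critical processes such as trustd. Here is a minimal sketch. It assumes each notarization check is reported to a monitor as succeeded or failed, for example derived from the kind of trustd errors users found in Console. A sliding-window failure-rate alarm could then flag persistent failures like those in this incident. The class name, window size, and threshold are hypothetical and do not describe any Apple tool.

```python
from collections import deque


class FailureRateAlarm:
    """Hypothetical monitor: fire when too many recent checks have failed."""

    def __init__(self, window: int, threshold: float) -> None:
        if window <= 0 or not 0.0 < threshold <= 1.0:
            raise ValueError("window must be positive and threshold in (0, 1]")
        self.window = window
        self.threshold = threshold
        # Keep only the most recent outcomes; True marks a failed check.
        self.outcomes = deque(maxlen=window)

    def record(self, failed: bool) -> bool:
        """Record one check outcome and return True if the alarm should fire."""
        self.outcomes.append(failed)
        if len(self.outcomes) < self.window:
            return False  # not enough data yet to judge the failure rate
        return sum(self.outcomes) / self.window >= self.threshold


if __name__ == "__main__":
    alarm = FailureRateAlarm(window=4, threshold=0.75)
    for failed in [False, True, True, True]:
        fired = alarm.record(failed)
    print(fired)  # prints True (3 of the last 4 checks failed)
```

The window keeps a single transient failure from triggering an alert. It still fires when failures become persistent, as they did here.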

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization, multiple_organization (a) Apps took minutes to launch, stuttered, or stopped responding because trustd repeatedly failed to contact the host ocsp.apple.com [107191]. The incident is specific to Apple and its products and services. (b) The incident also affected other Apple services, such as Apple Pay, Messages, and Apple TV devices, so its impact extended beyond macOS [107191]. This suggests that similar issues may have occurred across multiple Apple services.
Phase (Design/Operation) design, operation (a) The failure relates to the design phase. Mac users' problems, including apps taking minutes to launch, systemwide slowdowns, and unresponsiveness, were traced to trustd repeatedly failing to contact ocsp.apple.com. When the notarization-checking process could not reach the server, its calls appeared to hang instead of soft-failing, which turned a connectivity problem into systemwide issues [107191]. (b) The failure also surfaced during operation. It affected Apple services such as Apple Pay, Messages, and Apple TV devices in addition to macOS itself. This points to problems in how the system was operated, not only in how it was designed [107191].
Boundary (Internal/External) within_system (a) The failure originated primarily within the system. Apps taking minutes to launch, systemwide slowdowns, and unresponsiveness were caused by trustd repeatedly failing to contact the host ocsp.apple.com. trustd is an internal macOS process that checks app notarization with Apple's servers [107191].
Nature (Human/Non-human) non-human_actions (a) The failure was primarily due to non-human actions. Apps taking minutes to launch, stuttering, and unresponsiveness were linked to trustd repeatedly failing to contact ocsp.apple.com, which slowed the whole system as apps tried to launch. These factors were introduced without human participation [107191]. (b) The article does not mention any contributing factors introduced by human actions.
Dimension (Hardware/Software) software (a) The failure was caused by software, not hardware. It was linked to trustd repeatedly failing to contact ocsp.apple.com, which produced systemwide slowdowns and slow app launches [107191]. The root cause therefore lay in software processes and their interactions rather than in hardware malfunctions.
Objective (Malicious/Non-malicious) non-malicious (a) The failure was non-malicious. Apps taking minutes to launch, systemwide slowdowns, and unresponsiveness resulted from trustd failing to reach Apple's servers for app notarization validation. The same failure caused slowdowns and odd behavior across various Apple services [107191].
Intent (Poor/Accidental Decisions) accidental_decisions (a) The failure did not result from poor decisions. It was an unintended consequence of Apple's OCSP (Online Certificate Status Protocol) service not responding as expected, which led to systemwide slowdowns and app launch problems on macOS [107191].
Capability (Incompetence/Accidental) accidental (a) The failure was not attributed to development incompetence. Apps taking minutes to launch, stuttering, and unresponsiveness stemmed from a systemwide slowdown. That slowdown was caused by trustd repeatedly failing to contact Apple's servers for app notarization validation [107191]. (b) The failure was accidental. The slowdowns and unresponsiveness were unintended consequences of trustd failing to reach the servers it needed, and they affected users of Big Sur, Catalina, and Mojave [107191].
Duration temporary The failure was temporary. Around the time Apple rolled out macOS Big Sur, users began seeing apps take minutes to launch, systemwide slowdowns, and unresponsiveness throughout macOS. This happened on Big Sur as well as on Catalina and Mojave. The problems lasted for several minutes, and temporary workarounds circulated in the meantime. The behavior eventually cleared, presumably once Apple resolved the underlying issue [107191].
Behaviour crash, omission, value, other (a) crash: Users experienced systemwide slowdowns, apps taking minutes to launch, stuttering, and unresponsiveness throughout macOS [107191]. (b) omission: trustd checks with Apple's servers to confirm app notarization. It repeatedly tried and failed to contact the host ocsp.apple.com, so the expected validation result was never delivered, which led to systemwide slowdowns and errors [107191]. (c) timing: The problems began almost precisely when Big Sur launched, which the record treats as a timing-related aspect of the failure [107191]. (d) value: The system performed its intended function of notarization validation incorrectly. Instead of soft-failing when it could not reach the network, calls to the server appeared to simply hang [107191]. (e) byzantine: The incident does not fit a byzantine failure. It involved no inconsistent responses or interactions, only a specific problem with the notarization process and server connectivity [107191]. (f) other: While the problem persisted, the community responded by circulating temporary workarounds on forums, chat rooms, and Twitter [107191].
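Row (d) contrasts the hang that users observed with the expected soft-fail behavior. The sketch below illustrates soft-fail revocation checking with a bounded wait. It assumes a hypothetical `query` callable that stands in for an OCSP request to a responder such as ocsp.apple.com, and it is not Apple's trustd code. If the responder does not answer within `timeout_s`, or the request raises an error, the result is treated as unknown and the launch proceeds. Only an explicit revoked answer blocks the launch.

```python
import threading
from enum import Enum
from typing import Callable


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"  # no verdict: responder unreachable, too slow, or errored


def check_with_soft_fail(query: Callable[[], Status], timeout_s: float) -> Status:
    """Run a revocation query, but never wait longer than timeout_s.

    Hypothetical sketch: `query` stands in for an OCSP request to a
    responder such as ocsp.apple.com; it is not Apple's trustd code.
    """
    result = []

    def worker() -> None:
        try:
            result.append(query())
        except Exception:
            pass  # network or protocol error: leave result empty -> UNKNOWN

    # Daemon thread, so a hung request cannot keep the process alive.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)  # bounded wait instead of blocking indefinitely
    return result[0] if result else Status.UNKNOWN


def may_launch(status: Status) -> bool:
    """Soft-fail policy: block only on an explicit REVOKED answer."""
    return status is not Status.REVOKED


if __name__ == "__main__":
    def hanging_query() -> Status:
        threading.Event().wait(5.0)  # simulate a responder that never answers in time
        return Status.GOOD

    status = check_with_soft_fail(hanging_query, timeout_s=0.1)
    print(status, may_launch(status))
```

Soft-fail favors availability over strictness. While the responder is unreachable, a revoked certificate goes undetected. A hang, as in this incident, instead stalls app launches across the system.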

IoT System Layer

Layer Option Rationale
Perception None None
Communication None None
Application None None

Other Details

Category Option Rationale
Consequence property (d) property: People's material goods, money, or data were impacted due to the software failure. The failure caused problems for Mac users, including apps taking minutes to launch and stuttering and unresponsiveness throughout macOS. Apple services such as Apple Pay, Messages, and Apple TV devices also faced slowdowns, outages, and odd behavior [107191]. Users' access to their data and services, and potentially their financial transactions, was therefore impacted.
Domain information (a) The incident affected Mac users who experienced apps taking minutes to launch, stuttering, unresponsiveness, and other operating-system problems [107191]. It primarily affected the information industry, because it disrupted the production and distribution of information through Apple devices running macOS.
