Incident: New York City Wireless Network (NYCWiN) Failure Incident - GPS Rollover Impact

Published Date: 2019-04-10

Postmortem Analysis
Timeline 1. The software failure incident happened on April 6, 2019, when the network went dark at the GPS rollover. [83649, 85646]
System 1. NYCWiN (New York City Wireless Network) [83649, 85646] 2. Police Department license plate readers [83649, 85646] 3. Traffic lights programming system [83649, 85646] 4. Department of Transportation's Traffic Management Center [83649] 5. Real Time Traffic Information webpage [83649] 6. CityTime (city payroll system) [83649] 7. Systems used by the departments of health, sanitation, and parks [83649]
Responsible Organization 1. The Department of Information Technology and Telecommunications (DoiTT) was responsible for the software failure incident because it was in charge of operating the wireless network that crashed [Article 85646]. 2. Northrop Grumman, the contractor responsible for maintaining and operating the network, also played a role in the failure by not alerting city officials to the need for an upgrade ahead of the GPS rollover event [Article 85646].
Impacted Organization 1. New York City Wireless Network (NYCWiN) [83649, 85646] 2. New York City agencies, including the Police Department, Department of Transportation, Department of Sanitation, and Department of Parks [83649, 85646]
Software Causes 1. Lack of preparation for the GPS rollover event, leading to a failure in the wireless network used by city agencies [Article 83649, Article 85646] 2. Failure to upgrade the firmware in the nodes of the network to handle the rollover event [Article 85646]
Non-software Causes 1. Lack of awareness and preparation for the GPS rollover event by New York City's technology managers [Article 85646]. 2. Confusion, poor communication, and lack of coordination in response to the network failure [Article 85646]. 3. Failure to upgrade the firmware in the nodes of the wireless network despite the known need for upgrades [Article 85646].
Impacts 1. The software failure incident caused the New York City Wireless Network (NYCWiN) to go dark, disrupting numerous city tasks and functions, including the collection and transmission of information from Police Department license plate readers, programming of traffic lights by the Department of Transportation, and connectivity for agencies like sanitation and parks departments [83649]. 2. The shutdown of the wireless network led to service interruptions, such as half of the city-operated signs showing arrival times at bus stops being disabled, about 200 cameras providing online images of traffic conditions going offline, and many other tasks handled by the network requiring manual reassignment of city workers [85646]. 3. The incident resulted in confusion, poor communication, and a lack of coordination in the attempts to restore the network, which took 10 days to get running again [85646]. 4. The disruption affected systems beyond the wireless network, including the city payroll system (CityTime) and systems used by the departments of health, sanitation, and parks [83649]. 5. The failure incident raised questions about the city's preparation for the GPS rollover and highlighted concerns about the maintenance and operation of the NYCWiN network, which cost the city $37 million annually [83649, 85646].
Preventions 1. Proper preparation and awareness of the GPS rollover event by the city's technology managers could have prevented the software failure incident [Article 85646]. 2. Upgrading the firmware in the nodes or antennas that make up the wireless network could have prevented the crash [Article 85646]. 3. Timely communication and coordination among city agencies and officials could have helped in avoiding the chaotic response to the incident [Article 85646]. 4. Proactive measures by Northrop Grumman, the contractor maintaining the network, to alert city officials about the need for upgrades ahead of the rollover could have prevented the software failure incident [Article 85646].
Fixes 1. Upgrading the firmware in the nodes or antennas that make up the wireless network used by city agencies [85646]. 2. Implementing necessary upgrades to systems to avoid interruptions due to the GPS rollover event [85646]. 3. Improving communication, coordination, and preparation for future events that could impact computer networks [85646]. 4. Ensuring that technology managers are aware of potential risks and take proactive measures to prevent software failures [85646]. 5. Conducting a thorough review of the city's technology infrastructure to identify and address potential risks and vulnerabilities [85646].
References 1. City officials, including the Department of Information Technology and Telecommunications and the Department of Homeland Security [Article 83649]. 2. Councilman Brad Lander of Brooklyn [Article 83649]. 3. Northrop Grumman, the contractor responsible for maintaining and operating the network [Article 83649, Article 85646]. 4. Gartner, the consulting firm that compiled a report on the incident [Article 85646]. 5. Laura Anglin, the deputy mayor for operations [Article 83649, Article 85646]. 6. Samir Saini, the former DoiTT commissioner [Article 85646]. 7. Mayor Bill de Blasio [Article 85646]. 8. Northrop Grumman spokesman, Tim Paynter [Article 85646].
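
The rollover named under Software Causes comes from the way the legacy GPS navigation message stores the week number: a 10-bit counter that wraps to zero every 1024 weeks (about 19.6 years). The following is a minimal sketch of that arithmetic, not a description of the NYCWiN firmware, which the articles do not detail. The names GPS_EPOCH, WEEKS_PER_CYCLE, broadcast_week and rollover_dates are illustrative assumptions, and leap seconds are ignored.

```python
from datetime import datetime, timedelta
from typing import List

# GPS week 0 began on 6 January 1980 (GPS time; leap seconds are ignored here).
GPS_EPOCH = datetime(1980, 1, 6)
# The legacy navigation message stores the week number in 10 bits: 0..1023.
WEEKS_PER_CYCLE = 1 << 10


def broadcast_week(true_week: int) -> int:
    """Week number as transmitted: the true week count modulo 1024."""
    return true_week % WEEKS_PER_CYCLE


def rollover_dates(count: int) -> List[datetime]:
    """Dates (GPS time) on which the broadcast week wraps back to 0."""
    return [
        GPS_EPOCH + timedelta(weeks=WEEKS_PER_CYCLE * n)
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    for date in rollover_dates(2):
        print(date.date())
    # The last week before the second wrap, then the wrap itself.
    print(broadcast_week(2047), broadcast_week(2048))
```

Under these assumptions the second wrap falls at 00:00 on 7 April 2019 in GPS time. GPS time ran 18 seconds ahead of UTC in 2019, so the boundary lands late on 6 April in UTC, which is consistent with the outage date given in the articles.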

Software Taxonomy of Faults

Category Option Rationale
Recurring one_organization (a) According to the articles, NYCWiN, the wireless network used by New York City agencies, experienced a software failure caused by the GPS rollover in April 2019. The failure disrupted various city services, including the Police Department license plate readers and the system used to remotely control traffic lights [83649, 85646]. (b) The articles do not provide specific information about similar incidents at other organizations or with their products and services.
Phase (Design/Operation) design, operation (a) The software failure incident in New York City related to the GPS rollover was primarily due to contributing factors introduced by system development and system updates. The incident occurred because the city's technology managers were caught off guard and did not prepare for the calendar reset of the centralized Global Positioning System, leading to the crash of the wireless network used by city agencies [83649, 85646]. (b) Additionally, the failure was exacerbated by poor communication, confusion, and a lack of coordination during the operation phase when attempts were made to restore the network. The report highlighted that after the network went down, the response was chaotic, and the city faced challenges in getting the network back up and running smoothly [85646].
Boundary (Internal/External) within_system, outside_system (a) within_system: The software failure incident related to the GPS rollover in New York City was primarily within the system. The failure was caused by the city's lack of preparation for the calendar reset of the centralized Global Positioning System, which led to the crash of the wireless network used by city agencies [83649, 85646]. The incident highlighted issues such as poor communication, confusion, and a lack of coordination within the Department of Information Technology and Telecommunications (DoiTT), which was responsible for operating the network [85646]. Additionally, the report on the incident revealed that officials at several city agencies were aware of the GPS rollover, but the DoiTT claimed they were not informed before the event occurred [85646]. (b) outside_system: The failure also had contributing factors originating from outside the system, such as the need for upgrades to software or hardware in anticipation of the GPS rollover. Government and industry notices had encouraged technology managers to upgrade systems to avoid possible interruptions, indicating that external factors like the rollover event were known and could have been mitigated with proper preparation [85646]. The contractor responsible for maintaining and operating the network, Northrop Grumman, was also mentioned in the context of not alerting city officials to the need for an upgrade ahead of the rollover, suggesting an external factor contributing to the failure [85646].
Nature (Human/Non-human) non-human_actions, human_actions (a) The software failure incident occurring due to non-human actions: - The software failure incident in New York City was primarily caused by the GPS rollover, a calendar reset of the centralized Global Positioning System, which was a non-human action [83649, 85646]. - The GPS rollover event, which occurs approximately every 20 years, led to the disruption of the NYCWiN wireless network used by city agencies, including the Police Department license plate readers and traffic light control systems [83649, 85646]. - The failure was exacerbated by the lack of preparation and upgrades to the system to handle the GPS rollover, which was a non-human factor that impacted the network's functionality [83649, 85646]. (b) The software failure incident occurring due to human actions: - The failure was also attributed to human actions, specifically the lack of awareness and preparation by New York City's technology managers for the GPS rollover event despite warnings from federal officials and technology companies [85646]. - The report highlighted confusion, poor communication, and a lack of coordination among city officials after the network went down, indicating human factors contributing to the incident [85646]. - The report mentioned that officials at the Department of Information Technology and Telecommunications claimed they were not aware of the rollover before it occurred, suggesting a lack of proactive human action in preparing for the event [85646].
Dimension (Hardware/Software) hardware, software (a) The software failure incident in New York City related to the GPS rollover was primarily due to hardware issues. The incident was caused by a long-anticipated calendar reset of the centralized Global Positioning System (GPS), which connects to devices and computer networks around the world. This reset affected the NYCWiN wireless network, leading to its shutdown and disrupting various city tasks and functions, such as the collection and transmission of information from Police Department license plate readers and the programming of traffic lights by the Department of Transportation [83649]. (b) The software failure incident in New York City was also influenced by software factors. The failure was exacerbated by the lack of preparation and upgrades in the software or firmware of the wireless network used by city agencies. The report highlighted that the system could have been easily upgraded by replacing the firmware in the nodes that make up the network. Additionally, the lack of awareness and coordination within the Department of Information Technology and Telecommunications (DoiTT) regarding the GPS rollover contributed to the software-related issues that led to the network crash [85646].
Objective (Malicious/Non-malicious) non-malicious (a) The software failure incident related to the GPS rollover in New York City does not appear to be malicious. It was a result of poor preparation and lack of awareness about the calendar reset of the centralized Global Positioning System, leading to the crash of the wireless network used by city agencies [85646]. The incident was attributed to confusion, poor communication, and a lack of coordination in response to the network failure, rather than any intentional harm to the system [85646]. (b) The software failure incident was non-malicious, stemming from a lack of preparation and awareness about the GPS rollover, rather than any deliberate attempt to harm the system [85646]. The failure was a result of inadequate upgrades and poor response to the known event, indicating a non-malicious nature of the incident [85646].
Intent (Poor/Accidental Decisions) poor_decisions, accidental_decisions (a) The software failure incident in New York City related to the GPS rollover can be attributed to poor decisions made by city officials and technology managers. Despite warnings and knowledge about the GPS rollover event, the Department of Information Technology and Telecommunications (DoiTT) responsible for operating the wireless network failed to prepare for the calendar reset of the centralized Global Positioning System [85646]. The report released by the city highlighted poor preparation, confusion, poor communication, and a lack of coordination that hampered the attempts to restore the network, leading to a 10-day outage [85646]. Additionally, the report revealed that officials at several city agencies were aware of the rollover, but the DoiTT claimed they were not informed before the incident occurred [85646]. The lack of awareness and preparation ultimately led to the network crash and subsequent service disruptions. (b) On the other hand, the software failure incident can also be attributed to accidental decisions or mistakes. The report mentioned that the wireless network could have been easily upgraded by replacing the firmware in the nodes, but this necessary upgrade was not carried out [85646]. Furthermore, Northrop Grumman, the contractor responsible for maintaining and operating the network, did not alert city officials to the need for an upgrade before the rollover event [85646]. This lack of communication and oversight can be seen as accidental decisions or oversights that contributed to the failure of the network.
Capability (Incompetence/Accidental) development_incompetence, accidental (a) The software failure incident in New York City related to the GPS rollover was primarily due to development incompetence. The incident occurred because the city's technology managers were caught off guard and did not prepare for the calendar reset of the centralized Global Positioning System, leading to the crash of the wireless network used by city agencies [85646]. The report released by the city highlighted poor preparation, confusion, poor communication, and a lack of coordination that hampered the attempts to get the network running again after it went down [85646]. (b) Additionally, the incident can also be attributed to accidental factors. The report mentioned that officials at several city agencies, including the Police Department and the Office of Emergency Management, were aware of the GPS rollover, but the Department of Information Technology and Telecommunications claimed they were not aware of it before it occurred [85646]. This lack of awareness could be seen as an accidental oversight that contributed to the failure of the wireless network.
Duration temporary From the provided articles, the software failure incident related to the GPS rollover in New York City's wireless network (NYCWiN) was temporary. The incident occurred on April 6 when the network went dark due to the calendar reset of the centralized Global Positioning System [83649]. It took 10 days to get the network running again, indicating a temporary disruption [85646]. The failure was attributed to a lack of preparation for the rollover, confusion, poor communication, and a chaotic response from city officials [85646].
Behaviour crash, omission, timing, other (a) crash: The software failure incident in New York City related to the GPS rollover resulted in a crash of the wireless network used by city agencies, including the Police Department and the Department of Transportation. This crash led to the disruption of services such as the collection and transmission of information from license plate readers, programming of traffic lights, and communication between various city departments [83649, 85646]. (b) omission: The software failure incident also involved omission, as the wireless network omitted to perform its intended functions, causing service interruptions such as disabling city-operated signs showing arrival times at bus stops, online traffic cameras, and other tasks that were handled by the network [85646]. (c) timing: The failure was related to timing as well, as the system was not prepared for the calendar reset of the centralized Global Positioning System, leading to the network crash on the specific date of the GPS rollover. This timing issue resulted in the system losing its functionality at a critical moment [83649, 85646]. (d) value: The software failure incident did not specifically involve a failure related to the system performing its intended functions incorrectly [unknown]. (e) byzantine: The software failure incident did not exhibit behavior related to the system behaving erroneously with inconsistent responses and interactions [unknown]. (f) other: The software failure incident also involved poor communication, confusion, and a lack of coordination in the response to the network crash, which contributed to the delay in getting the system back up and running. Additionally, the incident highlighted issues with preparation, coordination, and communication among city agencies and officials [85646].
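
The timing behaviour above, and the firmware remedy cited in the report, can be illustrated with one common mitigation for the 10-bit week counter: resolving the ambiguous week against a reference ("pivot") date such as a firmware build date. This is a hedged sketch only. The articles do not say how the NYCWiN node firmware decoded dates, so decode_fixed_cycle, decode_with_pivot and the pivot value are hypothetical, and leap seconds are ignored.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # start of GPS week 0 (GPS time)
WEEKS_PER_CYCLE = 1024            # 10-bit broadcast week counter


def decode_fixed_cycle(week10: int, cycle: int = 1) -> datetime:
    """Hypothetical legacy decoder: assumes the 1024-week cycle never changes."""
    return GPS_EPOCH + timedelta(weeks=cycle * WEEKS_PER_CYCLE + week10)


def decode_with_pivot(week10: int, pivot: datetime) -> datetime:
    """Return the first matching week that is not earlier than the pivot's week."""
    pivot_week = (pivot - GPS_EPOCH).days // 7
    # Start from the 1024-week cycle that contains the pivot.
    week = pivot_week - (pivot_week % WEEKS_PER_CYCLE) + week10
    if week < pivot_week:
        # The candidate predates the pivot, so it must belong to the next cycle.
        week += WEEKS_PER_CYCLE
    return GPS_EPOCH + timedelta(weeks=week)


if __name__ == "__main__":
    pivot = datetime(2018, 1, 1)  # assumed firmware build date
    print(decode_fixed_cycle(0).date())        # jumps back to the 1999 cycle start
    print(decode_with_pivot(0, pivot).date())  # resolves into the 2019 cycle
```

A pivot only defers the ambiguity: decoding stays correct for 1024 weeks after the pivot date, so the reference still has to be refreshed through later firmware updates.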

IoT System Layer

Layer Option Rationale
Perception sensor, network_communication (a) sensor: The software failure incident in New York City was related to the sensor layer of the cyber physical system. The failure was due to contributing factors introduced by sensor error, specifically the GPS rollover event that affected the centralized Global Positioning System. This event caused disruptions in various city services that relied on the GPS data, such as Police Department license plate readers and traffic lights control systems [83649, 85646]. (b) actuator: The software failure incident did not directly involve actuator errors. The focus of the incident was on the failure caused by sensor errors related to the GPS rollover event and its impact on the wireless network used by city agencies [83649, 85646]. (c) processing_unit: The software failure incident did not directly involve errors related to the processing unit. The primary cause of the failure was attributed to sensor errors resulting from the GPS rollover event and the lack of preparation for it by the city's technology managers [83649, 85646]. (d) network_communication: The software failure incident was related to network communication errors. The failure of the wireless network used by city agencies, including Police Department license plate readers and traffic lights control systems, was a result of the GPS rollover event and the lack of upgrades or preparations for it, leading to disruptions in communication and data transmission [83649, 85646]. (e) embedded_software: The software failure incident was not directly linked to errors in embedded software. The main issue stemmed from sensor errors caused by the GPS rollover event, which impacted the functioning of the wireless network and various city services reliant on GPS data [83649, 85646].
Communication connectivity_level The software failure incident in New York City related to the GPS rollover was most closely associated with the communication layer of the cyber-physical system, specifically at the connectivity level. The failure of the NYCWiN wireless network, used by city agencies, was due to the calendar reset of the centralized Global Positioning System (GPS), which disrupted various services relying on the network, such as Police Department license plate readers and traffic light control systems [83649, 85646]. The incident highlighted poor preparation, a lack of awareness, and a chaotic response from the Department of Information Technology and Telecommunications (DoiTT), which was responsible for operating the network [85646]. The failure was attributed not to the physical layer but to the network or transport layer, where it affected the connectivity and functionality of the system.
Application FALSE The software failure incident related to the GPS rollover in New York City, specifically the crash of the NYCWiN wireless network, was not explicitly attributed to the application layer of the cyber physical system. The articles primarily focus on the failure being a result of inadequate preparation for the GPS rollover, lack of upgrades, poor communication, and coordination issues rather than specific application layer failures [83649, 85646]. Therefore, it is unknown whether the failure was related to the application layer of the cyber physical system based on the information provided in the articles.

Other Details

Category Option Rationale
Consequence delay, non-human (a) death: People lost their lives due to the software failure - No information in the articles suggests that people lost their lives due to the software failure incident. [83649, 85646] (b) harm: People were physically harmed due to the software failure - There is no mention of people being physically harmed due to the software failure incident. [83649, 85646] (c) basic: People's access to food or shelter was impacted because of the software failure - The articles do not mention any impact on people's access to food or shelter due to the software failure incident. [83649, 85646] (d) property: People's material goods, money, or data was impacted due to the software failure - The software failure incident impacted various city services and functions, including the collection and transmission of information from Police Department license plate readers, programming of traffic lights by the Department of Transportation, and connectivity for agencies like sanitation and parks departments. This disruption could have affected data and operational efficiency, but the articles do not specifically mention any direct impact on people's material goods, money, or data. [83649, 85646] (e) delay: People had to postpone an activity due to the software failure - The software failure incident caused disruptions in various city services and functions, such as the collection and transmission of information from Police Department license plate readers, programming of traffic lights, and connectivity for agencies like sanitation and parks departments. These disruptions likely led to delays in operations and tasks that relied on the affected systems. [83649, 85646] (f) non-human: Non-human entities were impacted due to the software failure - The software failure incident affected the NYCWiN wireless network, Police Department license plate readers, traffic-signal controllers, Real Time Traffic Information webpage, city payroll system, and systems used by the departments of health, sanitation, and parks. These are all non-human entities or systems that were impacted by the failure. [83649, 85646] (g) no_consequence: There were no real observed consequences of the software failure - The software failure incident had observable consequences on various city services and functions, as detailed in the articles. Therefore, it does not fall under the category of no consequences. [83649, 85646] (h) theoretical_consequence: There were potential consequences discussed of the software failure that did not occur - The consequences discussed in the articles, such as interruptions to city services, disabled city-operated signs, disabled traffic cameras, and the need for manual task performance, all actually occurred, so the articles report no purely theoretical consequences. [83649, 85646] (i) other: Was there consequence(s) of the software failure not described in the (a to h) options? What is the other consequence(s)? - The articles do not mention any other specific consequences of the software failure incident beyond the disruptions and impacts on various city services and systems. [83649, 85646]
Domain information, government (a) The failed system was intended to support the information industry, specifically in the context of production and distribution of information. The system, known as NYCWiN, was a wireless network built for the city by Northrop Grumman, aimed at facilitating communication and access to information among various city agencies [83649, 85646]. (l) The failed system was also related to the government industry, as it was utilized by city agencies in New York City for various functions such as collecting and transmitting information from Police Department license plate readers, programming traffic lights, and staying connected with different offices and work sites [83649, 85646]. (m) Additionally, the system failure incident was not directly related to any other industry mentioned in the options provided.

