Patch Management and Governance: Decompressing After CrowdStrike
Reflecting on the CrowdStrike incident, it’s clear that remediation taught us some tough lessons. It wasn’t just about fixing a problem; it was about understanding the deeper vulnerabilities in our systems and the steps we need to take to protect our operations in the future. As we move forward, the question is how to prevent something like this from happening again. In this post, we’ll explore how strategically delayed patching can prevent downtime caused by bad patches, and the role effective cybersecurity governance plays in that process.
Understanding the Complexity of OT Environments
If you've ever been responsible for an OT environment, you know that the business has one main expectation: keep everything running smoothly. When operations stop, you’re not just dealing with a technical issue—you’re facing lost revenue and possibly safety concerns. OT environments are intricate and demand continuous operation, unlike IT systems where downtime for updates is more manageable. In OT, the stakes are higher, and every decision can have significant consequences.
Challenges in Patching OT Systems
- Legacy Systems: Many of us work with legacy systems that weren’t built with modern patching processes in mind. It’s like trying to upgrade a vintage car with today’s technology—tricky and full of unknowns.
- Continuous Operation: These systems often need to run 24/7, leaving little room for maintenance and patching.
- Interdependencies: OT environments are like a tightly woven web, where a single patch can affect multiple systems in unexpected ways.
The CrowdStrike Incident: Lessons Learned
When the CrowdStrike incident first hit, it felt like we were dealing with a cyberattack. But as we dug deeper, we realized it wasn’t an external threat; it was a bad patch. Operations came to a halt not because of a hacker, but because of a routine update that had gone wrong. The experience was a wake-up call, showing the risks tied to patch management and the importance of a strategic, well-thought-out approach, especially in OT environments. So what can a bad patch cause?
- Operational Disruptions: A bad patch can cause systems to malfunction, leading to unplanned downtime. One example is how badly airlines were crippled during the CrowdStrike event.
- Safety Risks: In industries like energy or chemical manufacturing, system failures due to bad patches can pose serious safety hazards. While a bad patch is not a hack or breach, there is still a risk of human injury. For example, an unexpected shutdown in part of a nuclear plant could create the conditions for an explosion, or too much of a chemical could be added to food or water. Other dangers include the loss of devices running life support systems, fire suppression systems, or call routing for 911.
- Financial Losses: Downtime in OT environments can result in significant financial losses due to halted production and emergency repairs. The losses run in both directions: your organization loses money, and the patch provider may face liability and potential lawsuits.
Strategic Delayed Patching: A Balanced Approach to Endpoint Management
Delayed patching, when done strategically, can help avoid the risks associated with bad patches. This approach involves thorough testing and careful planning to ensure that patches do not disrupt critical operations. However, we also know that waiting too long can leave us vulnerable to new threats. Delayed patching must be part of a broader cyber strategy that includes strong communication across all teams, so that if something does go wrong, we’re ready to respond quickly. Some best practices for this approach are:
- Comprehensive Testing
  - Test patches in a controlled environment that mirrors the live OT system as closely as possible.
  - Evaluate the impact of patches on system performance and stability before deployment.
- Risk Assessment and Prioritization
  - Assess the risks associated with both applying and delaying patches.
  - Prioritize patches based on the criticality of the vulnerabilities they address and the potential impact on operations.
- Scheduled Maintenance Windows
  - Plan patch deployments during scheduled maintenance windows to minimize operational disruptions.
  - Coordinate with all relevant departments to ensure minimal impact on production schedules.
- Backup and Recovery Plans
  - Ensure that comprehensive backup and recovery plans are in place before applying patches.
  - Be prepared to roll back changes quickly if a patch causes unexpected issues.
- Stakeholder Communication and Involvement
  - Keep all stakeholders, including OT operators, IT security teams, and management, informed about patching plans and potential impacts.
  - Foster a collaborative approach to patch management, ensuring that all parties understand the importance of both security and operational stability.
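To make the trade-off between applying and delaying a patch more concrete, here is a minimal Python sketch of how a delay policy might be expressed. Everything in it is an illustrative assumption rather than a recommended standard: the `Patch` fields, the 1 to 5 scoring scales, the `soak_days` formula, and the three-day minimum are all hypothetical and would need to be tuned to your own risk assessment. The idea is simply that a patch waits longer when it is riskier to operations, waits less when the vulnerability is more severe, is never scheduled untested, and only ever lands in a planned maintenance window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Patch:
    name: str
    severity: int          # 1 (low) to 5 (critical): risk of leaving the flaw unpatched
    operational_risk: int  # 1 (low) to 5 (high): risk the patch disrupts operations
    released: date
    tested_in_staging: bool


def soak_days(patch: Patch) -> int:
    """Hypothetical policy: how many days to hold a patch before deploying it."""
    base = 7 * patch.operational_risk     # riskier patches wait longer
    reduction = 5 * (patch.severity - 1)  # urgent fixes wait less
    return max(3, base - reduction)       # assumed floor: never sooner than 3 days


def earliest_deploy_date(patch: Patch, windows: List[date]) -> Optional[date]:
    """Return the first maintenance window on or after the end of the soak period.

    Returns None if the patch has not passed staging tests or if no
    listed window falls late enough.
    """
    if not patch.tested_in_staging:
        return None
    ready = patch.released + timedelta(days=soak_days(patch))
    for window in sorted(windows):
        if window >= ready:
            return window
    return None
```

For example, a critical fix (`severity=5`) with low operational risk (`operational_risk=2`) released on July 19 would be held for the three-day minimum under these assumed numbers and then scheduled into the next available window, while a low-severity, high-risk patch would be held for five weeks.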
The Role of Cybersecurity Governance
Effective cybersecurity governance is crucial in managing the balance between timely patching and operational stability. It’s about setting clear policies, procedures, and roles to ensure that patch management is both strategic and efficient. Some of the key elements of cybersecurity governance in OT environments are:
- Policy Development
  - Develop clear policies for patch management that consider the unique needs of OT environments.
  - Regularly review and update policies to reflect evolving threats and technological advancements.
- Continuous Monitoring
  - Implement continuous monitoring to detect potential vulnerabilities and assess the effectiveness of existing security measures.
  - Use monitoring tools to identify anomalies that could indicate issues with applied patches.
- Incident Response Planning
  - Develop and maintain robust incident response plans tailored to OT environments.
  - Conduct regular drills to ensure readiness for potential incidents related to patch management.
- Leadership and Accountability
  - Ensure senior leadership is involved in cybersecurity governance and understands the importance of strategic patch management.
  - Assign clear roles and responsibilities for patch management and incident response.
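As one illustration of the continuous monitoring element, the sketch below compares a metric collected after a patch against a pre-patch baseline and flags a large deviation for review. It is a simplified assumption of how such a check could work, not a description of any particular monitoring tool: the function name `patch_looks_anomalous`, the choice of metric, and the default threshold of three standard deviations are all hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def patch_looks_anomalous(
    baseline: Sequence[float],
    post_patch: Sequence[float],
    threshold: float = 3.0,  # assumed cutoff, in baseline standard deviations
) -> bool:
    """Flag a patch for review if a metric drifts far from its pre-patch baseline.

    `baseline` needs at least two samples and `post_patch` at least one.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is worth a look.
        return any(x != mu for x in post_patch)
    z_score = abs(mean(post_patch) - mu) / sigma
    return z_score > threshold


# Hypothetical CPU utilization samples (percent) from one controller
baseline_cpu = [40, 42, 41, 39, 43]
print(patch_looks_anomalous(baseline_cpu, [41, 42, 40]))  # False
print(patch_looks_anomalous(baseline_cpu, [70, 75, 72]))  # True
```

A flag like this would not trigger a rollback by itself; under the governance model described above, it would feed the incident response plan and go to whoever has been assigned responsibility for that decision.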
Balancing security with operational requirements is essential for protecting critical OT systems. By integrating strategic delayed patching into a well-governed cybersecurity framework, organizations can minimize risks and ensure the continuity of essential processes.
If your business needs a partner to help plan or assist in your patch management process, or you would like a cyber assessment to understand your current cyber health, Interstates can help. Our seasoned OT experts have been battle-tested in the demanding landscapes of ICS (Industrial Control Systems) and OT environments.
For expert insights on OT security and governance, watch Interstates’ Alan Raveling's presentation from the recent S4 conference, “Govern The Ungovernable - NIST CSF Govern Function.”