The day a faulty update crippled global IT systems – a lesson in incident response

Where were you on Friday 19th July 2024? About to catch a flight to start the summer holidays? Waiting for a GP appointment? Doing a spot of shopping? Or carrying out one of the countless other processes and everyday tasks that were supposed to be part of a normal day? Unless you’ve been on a complete IT detox, you’ll have heard about CrowdStrike’s software update, which was intended to protect against crashes and disruptions but did the opposite.

Our Head of Security Solutions, Stephanie Fox, tells us what happened and the lessons that businesses around the world need to learn.

So what happened?

CrowdStrike, a US-based cyber security firm that provides cloud-based endpoint protection and threat intelligence services to a wide range of industries, released a sensor configuration update to its Falcon software. The update caused a malfunction in other software that Falcon interacts with, including Microsoft Windows, and essentially disabled those systems worldwide. Such updates are an ongoing part of the Falcon platform’s protection mechanisms, but this one triggered a logic error that resulted in a system crash and blue screen (BSOD) on impacted systems, followed by a cascade of failures across the infrastructure that depends on them.

What was the impact?

The impact of the outage was significant because CrowdStrike’s customers include some of the largest and most sensitive organisations in the world, such as governments, banks, healthcare providers, and travel providers. According to CrowdStrike’s own estimates, the outage affected about 10% of its customer base, or roughly 1,500 organisations. Some of the reported consequences of the outage were:

  • Loss of visibility and protection against cyber threats, especially ransomware and phishing attacks.
  • Disruption of business operations and productivity, as some customers had to shut down their systems or switch to backup solutions.
  • Damage to reputation and trust.
  • Potential legal and regulatory implications, as some customers may have violated their compliance obligations or contractual agreements due to the outage.

How did CrowdStrike respond?

The company acknowledged the outage on its website and social media channels and provided regular updates on its status and resolution. It also activated its incident response team and worked with its partners and vendors to restore its services as quickly as possible. According to CrowdStrike, the outage was fully resolved by Saturday 20th July, and the company conducted a root cause analysis and implemented corrective actions to prevent similar incidents in the future. CrowdStrike also apologised to its customers and offered them compensation and support.

Although the problem itself is now resolved, several days later the world is still working through some of the issues it caused, such as disrupted flights, GP backlogs and interrupted banking systems.

What are the lessons learned?

Friday’s incident was deemed to be a once-in-a-decade event, but it is a reminder of the challenges of cyber security in the digital age and of how reliant we are on an increasingly complex and connected global IT infrastructure. It highlighted some of the key lessons that must be considered when designing, implementing and maintaining a robust digital capability:

  • Resilient infrastructure and processes, as well as regular testing and monitoring of the system’s performance and security.
  • Clear, timely communication and coordination among all the stakeholders involved, including the service provider, the customers, and the third-party vendors.
  • Contingency and recovery plans, as well as backup and alternative solutions, in case of a service disruption or failure.
  • Transparency and accountability, as well as trust and collaboration, between the service provider and the customers, especially in the aftermath of an incident.

Thankfully, Prosperity 24/7 was not directly impacted by this outage and was on hand to support our clients who were directly or indirectly affected, fortunately with minimal impact.

Are you prepared?

Even though it was a rare incident, it does raise the question: what would you do if it happened to your organisation? Are you prepared, and how would you respond? Do you have an effective Incident Response Plan in place?

When a crisis strikes, you need to know what to do next without hesitation. Prosperity 24/7, via our partners at Arctic Wolf, can give you the confidence to act swiftly in these circumstances by providing you with an effective Incident Management Response capability.

From proactive planning to containment and restoration, we can act fast to make sure you get back on track. 77% of businesses don’t have an effective incident response plan in place. Don’t be a statistic.

Discover more by speaking to the Prosperity 24/7 Security Solutions team at Security.Solutions@prosperity247.com