CrowdStrike: When the Falcon strikes down the crowds

Probably the biggest IT outage of the decade

Please note that the views and opinions expressed in this article are solely my own.

What happened?

On Friday, July 19, 2024, CrowdStrike, a major cybersecurity company, released a sensor update for its XDR agent, Falcon. This update triggered a global IT outage, affecting approximately 8.5 million Windows devices. The devices crashed and went into a Blue Screen of Death (BSOD) loop following the update. Critical services such as telecommunications, banking, airlines, hospitals, and major news networks were severely impacted. CrowdStrike’s CEO confirmed that the issue was not a security incident or cyberattack.

Who was affected?

Unfortunately, many critical organizations went down, including airports, airlines[1], hospitals, and some 911 hotline systems[2]. Even though “only” 8.5 million devices were affected, which is less than 1% of all devices running Windows[3], the impact on people’s lives was significant. Hospitals had to delay many surgeries, people couldn’t travel for a while, and many supermarket kiosks were down as well.

First response from CrowdStrike

A few hours into the outage, the affected companies realized what had happened and demanded answers from their supplier. CrowdStrike’s CEO then announced that the company was investigating the issue.

The first post didn’t achieve what it was meant to, at least based on the community’s reaction. But given the situation’s impact, their response was totally understandable at that time.

Some people bought domains to make fun of the company, but…

Threat actors did that too; we’ll discuss it later.

CrowdStrike and even Microsoft published several official posts regarding the incident and the state of the recovery process.

But what caused the BSOD?

That’s a really good question. Based on the official post[4], the issue was caused by faulty channel files, which are essentially configuration files. Initially, they were mistaken for Windows kernel drivers because of their .sys extension.

But you might wonder, what are these used for?

Channel File 291 controls how Falcon evaluates named pipe[5] execution on Windows systems. Named pipes are used for normal interprocess or intersystem communication in Windows.

Basically, a channel file contains specific monitoring and response rules for the sensor. It tells the agent what counts as suspicious activity and how to react to an identified threat, for example by removing a malicious file, killing a process, or isolating the endpoint. These files can also contain settings for the communication between the XDR agent and the cloud-based management platform.

Some early reports suggested that the issue was caused by NULL bytes present in the channel files. Later, CrowdStrike clarified that it was caused by a logic error.

The solution…

After the engineers realized what went wrong, they figured out the first solution:

  • Boot Windows into Safe Mode or the Windows Recovery Environment
  • Navigate to the C:\Windows\System32\drivers\CrowdStrike directory
  • Locate the file matching ‘C-00000291*.sys’ and delete it.
  • Boot the host normally.
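
The file-matching step above can be sketched in a few lines of Python. This is only an illustration of the logic, not an official CrowdStrike tool: the function names and the dry-run default are my own choices, the pattern is the one from CrowdStrike's published guidance for Channel File 291, and in practice the recovery environment usually has no Python, so the deletion is done by hand or with a script your organization has vetted.

```python
from pathlib import Path

# Directory and pattern taken from the manual workaround steps.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_channel_files(directory: Path = DEFAULT_DIR, pattern: str = PATTERN) -> list[Path]:
    """Return the files in `directory` whose names match `pattern`."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def remove_channel_files(
    directory: Path = DEFAULT_DIR, pattern: str = PATTERN, dry_run: bool = True
) -> list[Path]:
    """List the matching files, and delete them only when dry_run is False."""
    matches = find_channel_files(directory, pattern)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: print what would be deleted without touching anything.
    for path in remove_channel_files(dry_run=True):
        print(f"would delete: {path}")
```

Keeping the dry run as the default is deliberate: deleting the wrong file from a drivers directory can cause new problems, so the sketch shows the matches first and deletes only when asked explicitly.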

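Of course, this workaround only covers the simplest scenario. Hundreds of thousands of machines had BitLocker turned on, which means the recovery key is needed before the disk can be accessed, or were running at a cloud hosting provider, where there is no physical console to boot into Safe Mode from.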

Find the official remediation guides here: https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/

Threat Actors jumped right on the train

Unfortunately, attackers wanted a slice too. They immediately registered several misleading domains for malicious purposes, such as delivering fake hotfixes and stealing credentials. Any.Run has already found a malicious “hotfix”[6] that delivers Remcos malware to the victim’s machine. I believe this will continue for a while.

Please be cautious out there and only use the official site for any kind of communication.
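
To show why lookalike domains work on people but shouldn't work on tooling, here is a minimal sketch of a strict hostname check. The allowlist, the function name, and the `.example` hostnames below are assumptions made up for illustration; adapt the list to whatever your organization treats as official.

```python
from urllib.parse import urlsplit

# Assumption: the only domain treated as official in this sketch.
OFFICIAL_DOMAINS = frozenset({"crowdstrike.com"})


def is_official_url(url: str, official_domains=OFFICIAL_DOMAINS) -> bool:
    """Return True only for HTTPS URLs whose host is an official domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Compare whole DNS labels, so a name that merely contains
    # "crowdstrike" somewhere in the host is rejected.
    return any(host == d or host.endswith("." + d) for d in official_domains)


if __name__ == "__main__":
    for candidate in (
        "https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/",
        "https://crowdstrike-hotfix.example/download",    # hypothetical lookalike
        "https://crowdstrike.com.hotfix.example/update",  # hypothetical lookalike
    ):
        print(candidate, is_official_url(candidate))
```

The check matches on the end of the hostname rather than searching for the brand name anywhere in the URL, which is exactly the difference that misleading domains try to blur.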

What happens after this?

Obviously, people are asking how CrowdStrike failed to catch this issue before releasing the update, and they are demanding answers. In my opinion, this will change how companies handle supplier quality control in the future. Maybe governments will create new regulations to ensure this won’t happen again, but at the moment, we cannot be sure. Companies are still recovering, and most of them aren’t fully back yet.

Timeline

timeline
    title CrowdStrike BSOD Outages

    section Affected Systems
    2024-07-19 : BSODs start affecting Windows systems
    2024-07-19 : CrowdStrike identifies faulty update
    2024-07-21 : Fix deployed, but manual repairs ongoing

    section Impact
    2024-07-19 : Worldwide disruption - airlines, banks, hospitals
    2024-07-20 : Estimated financial damage in billions of dollars

    section Resolution
    2024-07-19 : CrowdStrike reverts update
    2024-07-21 : Ongoing recovery efforts

    section Notes
    2024-07-19 : Possibly the largest IT outage in history
This post is licensed under CC BY 4.0 by the author.