Vendors should minimise use of kernel mode, and customers should make full use of integrated Windows security features
In a blog post published on Friday, David Weston, vice president, enterprise and OS security at Microsoft, outlined the causes of the global incident which saw millions of devices running CrowdStrike Falcon on Windows crash.
According to Microsoft's kernel-crash dump analyses, the root cause of the outage was a memory safety issue, specifically a read out-of-bounds access violation in CrowdStrike's CSagent.sys driver, a module designed to detect suspicious activity.
CSagent.sys is a file system filter driver operating at kernel level that receives notifications about file operations, such as creation or modification.
File system filters are also used as a signal by security solutions attempting to monitor the behaviour of the system.
The fateful CrowdStrike update contained a change to the sensor that allowed the filter driver to be called when file-based activity was detected. This was intended to enhance malware detection capabilities. However, the faulty update caused it to try to access a memory address for which it did not have permissions, and Falcon was unable to handle the error gracefully, causing the Windows kernel to crash and enter a boot loop.
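To illustrate the class of bug described above, the following is a minimal Python sketch of a bounds-checked read. It is not CrowdStrike's code: the function names `read_field` and `evaluate_rule`, and the idea of a rule that inspects a numbered field, are hypothetical and chosen only for illustration. The point is that the index is validated before the read, so a request for a field that does not exist is treated as "no match" instead of an unhandled fault.

```python
from typing import Optional, Sequence


def read_field(fields: Sequence[str], index: int) -> Optional[str]:
    """Return fields[index], or None when the index is out of bounds.

    Hypothetical helper: checks the index against the sequence length
    before reading, instead of trusting the caller.
    """
    if 0 <= index < len(fields):
        return fields[index]
    return None


def evaluate_rule(fields: Sequence[str], index: int, pattern: str) -> bool:
    """Hypothetical detection rule: match only if the requested field
    exists and contains the pattern."""
    value = read_field(fields, index)
    if value is None:
        # A missing field is treated as "no match" rather than an error,
        # so a malformed rule cannot bring down the caller.
        return False
    return pattern in value
```

In a memory-safe language such as Python an unchecked out-of-range index raises an exception that can be caught. In kernel-mode code written in C or C++, the equivalent unchecked read can touch invalid memory and bring down the whole operating system, which is why the explicit check matters far more there.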
Microsoft's analysis of the cause of the crash concurs with CrowdStrike's preliminary review, released last week.
CrowdStrike Falcon loads four modules into the Windows kernel. A question on the lips of many security professionals and admins tasked with cleaning up the mess is why that is necessary at all. Why could they not run in user mode, where any glitches would be far less damaging?
They also wondered whether Microsoft's security checks might be at fault.
The main reasons for running drivers in kernel mode are twofold, according to Weston.
First, it allows security vendors to monitor what's happening in the core of the operating system itself: "Kernel drivers allow for system wide visibility, and the capability to load in early boot to detect threats like boot kits and root kits which can load before user-mode applications," he wrote.
Second, it provides for tamper resistance: "Security products want to ensure that their software cannot be disabled by malware, targeted attacks, or malicious insiders, even when those attackers have admin-level privileges."
CrowdStrike has taken full responsibility for the error, which took down more than 8.5 million Windows devices, saying it was due to a bug in its Content Validator, the system that is supposed to detect faulty code in an update.
However, the incident does put a question mark over whether operating systems that allow third-party software to run in kernel mode should have more stringent checks in place.
Microsoft's blog does not address this directly. It says it engages with third-party security vendors through the Microsoft Virus Initiative (MVI) to share data and best practices, and that it provides runtime protection, such as Patch Guard, to prevent disruptive behaviour from kernel drivers.
In addition, drivers must pass a series of tests by Microsoft Windows Hardware Quality Labs (WHQL) to be certified - although this does not cover updates.
In his blog post, Weston advised security software vendors to minimise their use of sensors in kernel mode for data collection and enforcement, and to isolate the majority of key product functionality in user mode, where additional protections such as Virtualisation-based Security (VBS) Enclaves, Protected Processes and Event Tracing for Windows (ETW) are available.
Customers are advised to make use of security features integrated into Windows.
"Windows is constantly increasing security defaults, including dozens of new security features enabled by default in Windows 11," Weston wrote.
He added that Microsoft plans to work with third-party security software vendors to help them take advantage of these integrated features.
CrowdStrike is used predominantly by large corporations and public sector organisations. It is estimated that the global CrowdStrike outage will cost Fortune 500 companies around $44 million each, on average.
However, a CISO told Computing that the company is "the best at what it does," and predicted its long-term survival, in spite of causing the largest IT outage in history.