  How One Bad CrowdStrike Update Crashed the World’s Computers


    Karlston

    • 363 views
    • 9 minutes

    A defective CrowdStrike kernel driver sent computers around the globe into a reboot death spiral, taking down air travel, hospitals, banks, and more with it. Here’s how that’s possible.

    Only a handful of times in history has a single piece of code managed to instantly wreck computer systems worldwide. The Slammer worm of 2003. Russia’s Ukraine-targeted NotPetya cyberattack. North Korea’s self-spreading ransomware WannaCry. But the ongoing digital catastrophe that rocked the internet and IT infrastructure around the globe over the past 12 hours appears to have been triggered not by malicious code released by hackers, but by the software designed to stop them.

     

    Two internet infrastructure disasters collided on Friday to produce disruptions around the world in airports, train systems, banks, health care organizations, hotels, television stations, and more. On Thursday night, Microsoft’s cloud platform Azure experienced a widespread outage. By Friday morning, the situation turned into a perfect storm when the security firm CrowdStrike released a flawed software update that sent Windows computers into a catastrophic reboot spiral. A Microsoft spokesperson tells WIRED that the two IT failures are unrelated.

     

    The cause of one of those two disasters, at least, has become clear: buggy code pushed out as an update to CrowdStrike’s Falcon monitoring product, essentially an antivirus platform that runs with deep system access on “endpoints” like laptops, servers, and routers to detect malware and suspicious activity that could indicate compromise. Falcon requires permission to update itself automatically and regularly, since CrowdStrike is constantly adding detections to the system to defend against new and evolving threats. The downside of this arrangement, though, is the risk that this system, which is meant to enhance security and stability, could end up undermining it instead.

     

    “It's the biggest case in history. We’ve never had a worldwide workstation outage like this,” says Mikko Hyppönen, the chief research officer at cybersecurity company WithSecure. Around a decade ago, Hyppönen says, widespread outages were more common due to the spread of worms or trojans. More recently, global outages have happened on the “server side” of systems, meaning outages often stem from cloud providers such as Amazon Web Services, internet cable cuts, or authentication and DNS issues.

     

    CrowdStrike CEO George Kurtz said on Friday that the issues were caused by a “defect” in code the company released for Windows. Mac and Linux systems were not affected. “The issue has been identified, isolated and a fix has been deployed,” Kurtz said in a statement, adding that the problems were not the result of a cyberattack. In an interview with NBC, Kurtz apologized for the disruption and said it may take some time for things to get back to normal.

     

     

    The widespread Windows outages have been linked to a software update from cybersecurity giant CrowdStrike. Cybersecurity officials say the issues are not believed to be linked to a malicious cyberattack but instead stem from a misconfigured or corrupted update that CrowdStrike pushed out to its customers.

     

    Security and IT analysts searching for the root cause of the gargantuan outage say that it appears to be related to a “kernel driver” update to CrowdStrike’s Falcon software. Kernel drivers are the software components that allow applications to interact with Windows at its deepest level, the core of the operating system known as its kernel. That highly sensitive level of access is necessary for security software, so that it can run before any malicious software installed on the system and access any part of the system where hackers might seek to plant their code. As malware has improved and evolved, it has pushed defense software to require constant connection and more extensive control.

     

    That deeper access also introduces a far higher possibility that security software—and updates to that software—will crash the whole system, says Matthieu Suiche, head of detection engineering at the security firm Magnet Forensics. He compares running malicious code detection software at the kernel level of an operating system to “open-heart surgery.”

     

    Yet it’s nonetheless surprising that a kernel driver update would be able to cause such a massive global computer crash, says Costin Raiu, who worked at Russian security software firm Kaspersky for 23 years and led its threat intelligence team before leaving the company last year. During his years at Kaspersky, he says, driver updates for Windows software were closely scrutinized and tested for weeks before they were pushed out.

     

    More importantly, such driver updates require that Microsoft also vet the code and cryptographically sign it, suggesting that Microsoft, too, may well have missed whatever bug in CrowdStrike’s Falcon driver triggered this outage. “It’s surprising that with the extreme attention paid to driver updates, this still happened,” says Raiu. “One simple driver can bring down everything. Which is what we saw here.”

     

    A Microsoft spokesperson told WIRED that the “CrowdStrike update was responsible for bringing down a number of IT systems globally,” and added that “Microsoft does not have oversight into updates that CrowdStrike makes in its systems,” without further explanation of whether Microsoft does in fact inspect and sign kernel driver updates.

     

    Raiu adds that even so, CrowdStrike is far from the only security firm to trigger Windows crashes with a driver update. Updates to Kaspersky and even Windows’ own built-in antivirus software Windows Defender have caused similar Blue Screen of Death crashes in years past, he notes. “Every security solution on the planet has had their CrowdStrike moments,” Raiu says. “This is nothing new but the scale of the event.”

     

    Cybersecurity authorities around the world have issued alerts about the disruption but have similarly been quick to rule out any nefarious activity by hackers. “The NCSC assesses that these have not been caused by malicious cyber attacks,” said Felicity Oswald, CEO of the UK’s National Cyber Security Centre. Officials in Australia have come to the same conclusion.

     

    Nevertheless, the impact has been sweeping and dramatic. Around the world, outages have spiraled as companies, public bodies, and IT teams race to fix bricked machines, a process that involves manually taking each machine through a series of corrective steps, including rebooting. In the UK, Israel, and Germany, health care services and hospitals saw disruptions to the systems they use to communicate with patients and canceled some appointments. Some 911 emergency services in the US have reportedly had problems with their lines too. In the earliest hours of the outages, some TV stations, including Sky News in the UK, stopped live news broadcasts.

     

    Global air travel has been one of the most impacted sectors so far. Huge lines formed at airports around the world, with one airport in India using handwritten boarding passes. In the US, Delta, United, and American Airlines grounded all flights at least temporarily, and one widely shared graphic showed air traffic over the US plummeting.

     

    The catastrophic situation reflects the fragility and deep interconnectedness of the internet. Numerous security practitioners told WIRED that they had anticipated, or even worked with clients to protect against, a scenario where defense software itself caused cascading failures as a result of malicious exploitation or human error, as appears to be the case with CrowdStrike. “This is an incredibly powerful illustration of our global digital vulnerabilities and the fragility of core internet infrastructure,” says Ciaran Martin, a professor at the University of Oxford and the former head of the UK’s National Cyber Security Centre.

     

    The ability of one update to trigger such massive disruption still puzzles Raiu. According to Gartner, a market research firm, CrowdStrike accounts for 14 percent of the security software market by revenue, meaning its software is on a wide array of systems. Raiu suggests that the Falcon update must have triggered crashes at cloud providers such as Azure and Amazon Web Services, which vastly multiplied the disaster. “CrowdStrike is big, but it can’t be this big,” Raiu says. “Airports, critical infrastructure, hospitals. It cannot be just CrowdStrike everywhere. I suspect we’re seeing a combination of factors, a cascading effect, a chain reaction.”

     

    Hyppönen, from WithSecure, says his “guess” is that the issues may have happened due to “human error” in the update process. “An engineer at CrowdStrike is having a really bad day,” he says. Hyppönen suggests that CrowdStrike could have shipped software different from what it had been testing or mixed up files, or there could have been a combination of different factors. “Software like this has to go through extensive testing,” Hyppönen says. “That's what we do. That's what CrowdStrike, of course, does. You have to be really careful about what you ship, which is tough to do because security software is updated very frequently.”

     

    While many of the impacts of the outage are ongoing and still unraveling, the nature of the problem means that individually impacted machines may need to be rebooted manually rather than through an automated process. “It could be some time for some systems that just automatically won’t recover,” CrowdStrike CEO Kurtz told NBC.

     

    The company’s initial “workaround” guidance for dealing with the incident says Windows machines should be booted into Safe Mode, a specific file deleted, and the machines then rebooted. “The fixes we’ve seen so far mean that you have to physically go to every machine, which will take days, because it’s millions of machines around the world which are having the problem right now,” says Hyppönen from WithSecure.
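    For illustration only, the file-deletion step of that workaround can be sketched in Python. The directory and filename pattern in the usage comment (`C:\Windows\System32\drivers\CrowdStrike` and `C-00000291*.sys`) come from CrowdStrike's publicly posted workaround, not from this article, and `remove_matching_files` is a hypothetical helper. In practice, administrators did this by hand from Safe Mode or the Windows Recovery Environment, where a Python interpreter may not be available at all.

```python
from pathlib import Path
from typing import List


def remove_matching_files(directory: Path, pattern: str) -> List[Path]:
    """Delete files in `directory` whose names match the glob `pattern`.

    Hypothetical helper for illustration; returns the removed paths so an
    administrator can log what was deleted.
    """
    removed = []
    # sorted() gives a deterministic order for logging.
    for path in sorted(directory.glob(pattern)):
        if path.is_file():  # skip any directories that happen to match
            path.unlink()
            removed.append(path)
    return removed


# Example usage (assumed path and pattern, taken from CrowdStrike's public
# workaround; check the vendor's current guidance before running anything):
# remove_matching_files(Path(r"C:\Windows\System32\drivers\CrowdStrike"),
#                       "C-00000291*.sys")
```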

     

    As system administrators race to contain the fallout, the broader question of how to prevent another, similar crisis looms.

     

    “People may now demand changes in this operating model,” says Jake Williams, vice president of research and development at the cybersecurity consultancy Hunter Strategy. “For better or worse, CrowdStrike has just shown why pushing updates without IT intervention is unsustainable.”

     

    Update 7/19/2024, 11 am ET: Added comment from Microsoft saying that the Azure outage and the CrowdStrike kernel driver issue are unrelated.

    Update 7/19/2024, 12:30 pm ET: Added further comment from Microsoft about its lack of oversight of CrowdStrike's updates.

     

    Source

     

