  • This Machine Exposes Privacy Violations

    aum


    A former Google engineer has built a search engine, webXray, that aims to find illicit online data collection and tracking—with the goal of becoming “the Henry Ford of tech lawsuits.”

     

    “It’s not a level playing field,” says Tim Libert, becoming animated as he shifts in his seat in his sparse home office in Sunnyvale, glancing between hulking monitors and clicking around on his desktop. “In fact it’s the furthest fucking thing from a level playing field.”

     

    The thing that is agitating Libert is the same thing that has agitated him for over a decade, ever since 2012, when, as a grad student at the University of Pennsylvania, he began researching the ways the web tracks us. Every day, the companies that operate our most expansive and vital web infrastructure—Google, Microsoft, Facebook—track our browsing habits and gather extensive troves of data on us, based on what we search for and which pages we visit. And we, the ordinary internet users, have little idea which websites are collecting what data, and then sending it upstream to the likes of Google.

     

    When you search for where to get an abortion, is sensitive data being tracked and collected? Unfortunately, very possibly so. Is an addiction treatment page or trans porn site exposing your IP address? Quite likely. Countless websites (truly countless—the scope, as we shall see, is nearly incomprehensible) are shipping private data about your web activity directly to the tech giants’ doorsteps. Thanks in part to the efforts of privacy researchers like Libert, we know this already, have known we’re being tracked for years—yet we lack knowledge of the specifics, and we lack agency, so this sea of privacy violations becomes another Bad Thing that happens on an internet teeming with them.

     

    A lot of this leaking data is not just potentially embarrassing, or perhaps harmful to career prospects if it were to be made public, but outright illegal. Over the past half-decade, the European Union, a number of US states, and other governments around the world have enacted laws that restrict what kind of data websites can collect, or require a company to receive consent from a user before it does so. Every day, tech companies may violate those laws when, say, search engines and medical websites trample HIPAA by allowing search logs of users’ ailments to be tracked, documented, and sometimes monetized by companies like Google, or when they run roughshod over consent rules by turning a blind eye to advertising cookies embedded in publishers’ websites.

     

    This, Libert says, is why he developed webXray, a crude prototype of which he’s demoing for me right now. It’s a search engine for rooting out specific privacy violations anywhere on the web. By searching for a specific term or website, you can use webXray to see which sites are tracking you, and where all that data goes. Its mission, he says, is simple: “I want to give privacy enforcers equal technology as privacy violators.” To level the playing field.

     

    On Wednesday, Libert plans to launch the website to the public, so anyone can get a sense of how sprawling the web of privacy violations being made every day really is, along with a premium tier for regulators and attorneys, who can use the tool to assess those violations and address them. Libert knows a thing or two about both search engines and digital privacy. Until last year, he was a staff engineer on the privacy team at Google, which is of course the operator of the largest search engine in the world—and the largest collector of data of the billions of people who use it.

     

    “You don’t want to be the person who broke the money machine.”


    - Tim Libert, former Google engineer and creator of webXray

     

    Libert had the idea for webXray while he was still a grad student, researching how websites track their users and transmit the bounty to tech giants, data brokers like Experian, and dozens of other third parties. Thinking about the architecture and adtech of the web in the 2010s one day, he says he scribbled out on a napkin a diagram for a tool that would expose these otherwise hidden data chains.

     

    That’s around when I first encountered Libert’s work, too: In 2015, I wrote about research he published using an early framework for webXray to determine that major medical websites like CDC.gov and WebMD.com were sharing data about the pages you visited—including sensitive health conditions and diseases—with dozens of third parties, and in a way that made it easy for them to identify you.

     

    After finishing his PhD and a postdoc at Oxford, Libert landed at Carnegie Mellon, where he continued his research into a web where privacy continued to erode, publishing findings on the “prevalence of third-party tracking on Covid-19-related web pages” and the “widespread sexual data leakage” on porn sites. He became an outspoken advocate for online privacy, penning op-eds in The New York Times, The Guardian, and The Conversation.

     

    In 2021, he took a job at Google, the company he had spent much of his professional life scrutinizing, even criticizing. (A Google spokesperson asserts to WIRED that the company takes user privacy quite seriously.) He had reservations about the job, but ultimately reasoned that if he wanted to move the needle, he could make more of an impact improving users’ privacy from the inside. The six-figure salary didn’t hurt, either. “I said from the beginning that I’d give myself two years,” Libert says, “and if it wasn’t working, if I wasn’t getting anywhere, I’d get out before the golden handcuffs got me.”

     

    Libert was hired as a staff engineer on Google’s privacy team, a specialist in cookies, the small bits of data that are created to ID you when you visit a new website. Examining cookies and how they’re used to track user activity was a cornerstone of his research. Libert says he can’t speculate as to why Google hired him, but it seemed a good sign that the tech giant was interested in addressing the privacy concerns he’d raised over the years. Yet making progress proved to be challenging. For one thing, the scale of Google’s systems was even larger than he’d anticipated.

     

    “It’s not possible for any one person to understand how all these things work. It’s truly mind-boggling,” Libert says. “I had access to run database queries on an unimaginably huge number of cookies, and I initially came in, and I was thinking, ‘Oh it’s going to be a Wizard of Oz thing, I’m just gonna find the person who knows absolutely everything, and they’re going to give me the information.’” That didn’t happen because, it turned out, that person didn’t exist. “I came in because there wasn’t really anybody who could answer directly a lot of questions I had as a researcher,” Libert says. “And what I learned was more about the cultural, sociological aspects of these companies and how they mesh with the actual technology. Part of that is it’s so complicated, and it’s hard to understand everything at a high level—it’s not that people aren’t trying to, it’s just that it’s like staring at the sun.”

     

    After settling in, Libert says his time essentially came to be divided between two tasks. The first was working with rank-and-file engineers who were trying to improve privacy features on Google’s products. “The other half of my time was trying to convince executives to change things,” he says. “And the problem with changing things at Google is like any Innovator’s Dilemma.” At the heart of the matter was that the privacy landscape was changing fast. When he started as a researcher, there were few good digital privacy laws. Today, Libert estimates that the majority of web users are protected by at least some online privacy laws, with more going into effect all the time. Yet he believes Google was slow to address them. “The problems I was encountering are exactly the same types of problems of ‘Why did OpenAI catch Google by surprise?’” Libert says. “When you get that bureaucratic and that big and there’s that much money involved, the number one thing to do is not change anything. You don’t want to be the person who broke the money machine.”

     

    “So I would spend my time between these people who want to do the right thing,” Libert says, “and these other groups of people, some who had been there for 20 years, and were just sitting on hoards of personal wealth, and they don’t want to change anything. But the world has changed, and what I kept trying to tell leadership is, ‘Look, the world is different. Whether or not Google wants to change, it doesn’t matter, we’re going to have to change.’”

     

    Libert says he can’t go into specifics about his disputes with Google’s leadership due to an employment agreement, but he believes his entreaties repeatedly fell on deaf ears. His own research had shown, even before he took the job at Google, that Google used cookies to collect data for all kinds of users’ queries, and that it used cookies to track those users extensively. Now, in places like Germany and California, a lot of the data collected via cookies is illegal if done without the user’s explicit consent. And yet. “Cookies are so integral to how the company makes money, no one had the courage to say, ‘Oh wow, the world’s changing, we need to adjust,’” Libert says.

     

    Google spokesperson Matt Bryant tells WIRED in a statement that assertions that the company disregards privacy are incorrect.

     

    “Respecting user privacy is our top priority, and to claim otherwise is wrong,” Bryant says.

     

    Regardless, one day, Libert’s exasperation boiled over. “I was just trying to contain my frustration with a straight face while I asked the same people the same question for the 100th time, and not getting taken seriously,” Libert says. “The week I decided to quit, a blood vessel burst in my eye because I was trying to restrain my frustration as I was having a conversation with a lawyer who I disagreed with.”

    Libert left Google after almost exactly two years.

     

    “I wanna be the Henry Ford of tech lawsuits—turn this into a factory assembly line.”


    - Tim Libert

     

    After quitting, Libert returned his energies to pressuring the tech giants from the outside. He decided to turn webXray, the tool that he’d used to power his research for years, into a public-facing system that helps ordinary users understand the vast scope of the problem, and that allows activists, regulators, and lawyers to document legal violations in order to challenge them. And, potentially, to cost companies like Google billions in legal fines and penalties.

     

    Here’s how webXray works: Basically, you can either search for a term—“pregnancy” or “STD” or “furry porn” or whatever—or a specific website to get a snapshot of all the websites connected to that term that are shipping your data, and search queries, connected to your IP address, to Google, advertisers, and third-party data brokers. To nod to a famous example, if you’re pregnant but haven’t told anyone, and yet preroll digital ads are showing you pregnancy-related commercials around the web, you can use webXray to check the websites you may have visited that are siphoning that data directly to Google, show when your IP address is harvested by one of Google’s advertising services, and see how tech companies build these data profiles of you in real time.
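
For readers who want a feel for the underlying idea, here is a minimal sketch in Python of how a list of network requests made by a page could be sorted into first-party and third-party traffic and grouped by the company receiving it. This is not webXray's actual code, which the article does not describe. The function names, the tiny `DOMAIN_OWNERS` table, and the "last two labels" shortcut for finding a site's registrable domain are all assumptions made for illustration; a real tool would likely rely on the Public Suffix List and a far larger ownership database.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Illustrative ownership table (an assumption for this sketch); a real
# tool would need a much larger, curated mapping of domains to companies.
DOMAIN_OWNERS = {
    "doubleclick.net": "Google",
    "google-analytics.com": "Google",
    "facebook.net": "Meta",
}


def registrable_domain(url: str) -> str:
    """Return a rough registrable domain: the last two hostname labels.

    This shortcut is wrong for suffixes such as .co.uk; it is used here
    only to keep the sketch self-contained.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def third_party_report(page_url: str, request_urls: list[str]) -> dict[str, list[str]]:
    """Group the third-party domains contacted by a page by their owner."""
    first_party = registrable_domain(page_url)
    report: dict[str, set[str]] = defaultdict(set)
    for url in request_urls:
        domain = registrable_domain(url)
        # Anything served from a different registrable domain than the
        # page itself is treated as third-party traffic.
        if domain and domain != first_party:
            owner = DOMAIN_OWNERS.get(domain, "unknown")
            report[owner].add(domain)
    return {owner: sorted(domains) for owner, domains in report.items()}


if __name__ == "__main__":
    # Hypothetical page and requests, invented for this example.
    requests_seen = [
        "https://clinic.example/style.css",
        "https://stats.g.doubleclick.net/collect",
        "https://www.google-analytics.com/analytics.js",
        "https://cdn.tracker.example/p.gif",
    ]
    print(third_party_report("https://clinic.example/treatment", requests_seen))
```

The sketch only shows where requests go. Deciding whether any given transfer is unlawful is a separate, legal question.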

     

    “WebXray can check every cookie that comes into Google for consent,” Libert says.
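
What a consent check might look like in practice can be sketched roughly as follows. The article does not say how webXray performs it, so everything here is an assumption: the `CookieEvent` record, the idea of comparing the time a cookie was set against the time the user clicked "accept," and the rule that only third-party cookies are flagged. The output is a list of candidates for review, not a legal finding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CookieEvent:
    """One observed cookie (hypothetical record format for this sketch)."""
    name: str
    domain: str    # domain the cookie is scoped to, e.g. ".doubleclick.net"
    set_at: float  # seconds after page load when the cookie was set


def cookies_without_consent(
    events: list[CookieEvent],
    first_party_domain: str,
    consent_at: Optional[float],
) -> list[CookieEvent]:
    """Flag third-party cookies set before the user gave consent.

    consent_at is the time (seconds after page load) at which consent was
    recorded, or None if the user never consented during the visit.
    """
    flagged = []
    for event in events:
        domain = event.domain.lstrip(".")
        is_first_party = (
            domain == first_party_domain
            or domain.endswith("." + first_party_domain)
        )
        before_consent = consent_at is None or event.set_at < consent_at
        if not is_first_party and before_consent:
            flagged.append(event)
    return flagged


if __name__ == "__main__":
    # Hypothetical observations from a single page visit.
    observed = [
        CookieEvent("session", "clinic.example", 0.1),
        CookieEvent("IDE", ".doubleclick.net", 0.4),
        CookieEvent("_fbp", ".facebook.com", 5.0),
    ]
    for cookie in cookies_without_consent(observed, "clinic.example", consent_at=2.0):
        print(cookie.name, cookie.domain)
```

Whether a flagged cookie actually violates a law depends on the jurisdiction and on what the cookie is used for, which a timing check alone cannot establish.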

     

    Most web users likely don't realize how large a data profile these companies are creating by tracking their online activity—searching for or visiting websites related to information about sexual identity, health conditions, things like addiction treatment; all that gets hoovered up.

     

    Libert estimates that every day, there are likely trillions of illicitly transmitted cookies on the web. Many cookies are legal and innocuous; many users have explicitly agreed to sending companies their data—it’s the vast, nearly incomprehensible stream of those that are collected without a user’s knowledge or consent that runs afoul of the law. And part of the problem is that massive companies like Microsoft, Meta, and Google may find it easier and more cost effective to ignore potential privacy violations than proactively work to address them, and just eat the occasional fine when those alleged violations are discovered instead.

     

    This is, incidentally, how he plans to fund the operation—the basic version of webXray will be available to all, but Libert will offer a specialized tier for litigators, regulators, and businesses looking to keep their digital presences compliant with the law. He will also offer consulting services and serve as an expert witness in lawsuits.

     

    I gave the keys to the site to digital rights activist Cory Doctorow, who took a quick look under the hood, and gave the idea a thumbs up. “I think the way to go here is class action,” Doctorow says, noting that this could lead to a trove of class action lawsuits against big tech companies. “So long as this is just exposing the API calls that produces evidence that Google is getting data that it doesn’t have lawful consent to receive or hold, this is the right move. I think it’s really a smoking gun,” he says.

     

    Libert, for his part, concurs. “Yeah, I wanna be the Henry Ford of tech lawsuits—turn this into a factory assembly line.”

     

    He’s already started. Three months after leaving Google, Libert served as an expert witness in a trial, testifying that websites were allegedly leaking data in violation of the law—against Google. His former employer tried to have him disqualified, arguing, somewhat ironically, that he knew too much. On Google’s policy and internal standards team, the company’s court records say, “Dr. Libert became the go-to person for all things related to cookies.” (On Monday, a judge dismissed that lawsuit, pending appeal.)

     

    “When I did that first lawsuit, and used webXray for that, they lost it,” Libert says of Google’s reaction. “When you look at those legal filings, there’s one thing that’s driving that—fear. They’re afraid of this data being available, because they know it affects the bottom line. And it scares them.”

     

    “One of the tragedies of Google is they used to lead by example in a positive way, and I think especially in the past three to five years, they’re not leading by positive example, they’re systematically leading by negative example,” Libert says. “And I think that’s burning down the web—the most powerful company doing things like recommending you put glue on your pizza. It’s not just that a website is doing that, it’s that the website, the advertising platform is doing that, and that was part of my frustration.”

     

    Google of course disagrees with this characterization of its tools and operations. “We design and build our products with strong security and privacy protections, including easy-to-use controls for managing and deleting data,” Bryant, the company spokesperson, says. “When it comes to advertising, Google was the first company to build a tool that lets people see and adjust their ads settings and even opt out of personalized ads entirely.”

     

    Despite Libert’s gloomy view of the current state of online privacy, he is actually an optimist. He believes webXray will help speed up a shift to a better, more private, more secure web—the path to which Google and the other tech giants are currently blocking. And it’s no coincidence, perhaps, that there’s been an exodus from Google’s privacy teams in the last few months: The announcement of Keith Enright, Google’s privacy chief, exiting the company came in June, and the position “will not be replaced.” Libert says his colleagues are getting fired en masse. To Libert, it seems that Google is deprioritizing privacy at the very moment when users are calling for stronger policies.

     

    “The problem we had 10 to 15 years ago is that there weren’t any laws. Now lots of countries have passed laws—the vast majority of people on the planet are protected by data privacy laws, but enforcement hasn’t caught up,” he says. “It’s going to catch up. I think we can speed it up.” Because people want privacy; it’s that simple. It’s why he imagines law offices, government offices, and businesses turning to his new search engine to help root out the scourge of privacy violations across the web.

     

    It’s why, perhaps, webXray’s tagline is simple and idealistic: “Privacy is inevitable.”

     

    I guess we’ll find out.

     

    Updated 7/24/2024, 1:50 pm: Clarified the launch date of webXray, which was officially released publicly on Wednesday.

     
