Testing suggests Google’s AI Overviews tell millions of lies per hour

Is 90 percent accuracy good enough for a search robot?

Looking up information on Google today means confronting AI Overviews, the Gemini-powered search robot that appears at the top of the results page. AI Overviews has had a rough time since its 2024 launch, attracting user ire over its scattershot accuracy, but it’s getting better and usually provides the right answer. That’s a low bar, though. A new analysis from The New York Times attempted to assess the accuracy of AI Overviews, finding it’s right about 90 percent of the time. The flip side is that roughly 1 in 10 AI answers is wrong, and for Google, that means hundreds of thousands of lies going out every minute of the day.

The Times conducted this analysis with the help of a startup called Oumi, which itself is deeply involved in developing AI models. The company used AI tools to probe AI Overviews with the SimpleQA evaluation, a common test to rank the factuality of generative models like Gemini. Released by OpenAI in 2024, SimpleQA is essentially a list of more than 4,000 questions with verifiable answers that can be fed into an AI.

Oumi began running its test last year when Gemini 2.5 was still Google’s best model. At the time, the benchmark showed an 85 percent accuracy rate. When the test was rerun following the Gemini 3 update, AI Overviews answered 91 percent of the questions correctly. If you extrapolate this miss rate out to all Google searches, AI Overviews is generating tens of millions of incorrect answers per day.
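The extrapolation above is plain arithmetic, and a rough sketch makes the moving parts visible. Only the 91 percent accuracy figure comes from the rerun described here. The daily query volume and the share of searches that show an AI Overview are hypothetical placeholders, not numbers from the report, so the totals illustrate the method rather than reproduce the Times’ figures.

```python
def wrong_answers(queries_per_day: float, overview_share: float, accuracy: float) -> dict:
    """Scale a benchmark miss rate up to a daily query volume."""
    if not 0.0 <= accuracy <= 1.0 or not 0.0 <= overview_share <= 1.0:
        raise ValueError("accuracy and overview_share must be between 0 and 1")
    # Queries that show an overview, times the fraction answered incorrectly.
    per_day = queries_per_day * overview_share * (1.0 - accuracy)
    return {
        "per_day": per_day,
        "per_hour": per_day / 24,
        "per_minute": per_day / (24 * 60),
    }


# ASSUMPTIONS: 1 billion queries per day and a 50 percent overview share are
# made-up placeholders. Only accuracy=0.91 is taken from the article.
estimate = wrong_answers(queries_per_day=1_000_000_000, overview_share=0.5, accuracy=0.91)
for unit, value in estimate.items():
    print(f"{unit}: {value:,.0f}")
```

The result scales linearly with every input, so the headline number depends as much on the assumed volume as on the measured accuracy. It also assumes benchmark questions fail at the same rate as real searches, which is the point Google disputes.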

The report includes several examples of where AI Overviews went wrong. When asked for the date on which Bob Marley’s former home became a museum, AI Overviews cited three pages, two of which didn’t discuss the date at all. The final one, Wikipedia, listed two contradictory years, and AI Overviews confidently chose the wrong one. The benchmark also prompts models to produce the date on which Yo-Yo Ma was inducted into the Classical Music Hall of Fame. While AI Overviews cited the organization’s website that listed Ma’s induction, it claimed there’s no such thing as the Classical Music Hall of Fame.

Google doesn’t much like this test. Google spokesperson Ned Adriance told the Times that the company believes SimpleQA contains incorrect information. Google’s own model evaluations often rely on a similar test called SimpleQA Verified, which uses a smaller set of questions that have been more thoroughly vetted. “This study has serious holes,” Adriance told the Times. “It doesn’t reflect what people are actually searching on Google.”

Benchmark problems

Evaluating new AI models sometimes feels more like art than science, which is part of the problem. Every company has its own preferred way of demonstrating what a model can do, and the non-deterministic nature of gen AI can make it hard to verify anything. These robots can get a factual question right and then completely miss it if you rerun the query immediately. Oumi even uses AI tools to run its assessments, and those models can hallucinate, too.
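One piece of that uncertainty can at least be estimated: the sampling error that comes from grading a finite list of questions. The sketch below assumes exactly 4,000 independently graded questions (the article says only "more than 4,000") and uses the textbook normal-approximation interval. That interval is an illustrative choice, not Oumi’s stated method, and it says nothing about grader hallucinations or flawed questions.

```python
import math


def accuracy_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a pass rate."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    # Standard error of a binomial proportion, scaled by z (1.96 ~ 95 percent).
    margin = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - margin), min(1.0, p + margin)


# ASSUMPTION: 3,640 of 4,000 correct stands in for the reported 91 percent.
low, high = accuracy_interval(correct=3640, total=4000)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions, the margin is a little under one percentage point either way, so sampling noise alone would not account for a gap as large as 85 versus 91 percent. Other error sources, such as an AI grader marking answers incorrectly, sit outside this calculation.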

The other wrinkle is that AI Overviews isn’t a single monolithic model. Google told Ars Technica that it uses the “right model” for each query. While AI Overviews would get the best answers from always running Gemini 3.1 Pro, that’s slow and expensive. To load things promptly on a search page, the overview uses faster Gemini Flash models when possible (which appears to be most of the time).

Google’s response to this report is telling. In the realm of AI factuality, 9 out of 10 isn’t even that bad. Google has recently published benchmarks for new model releases featuring measurements of factuality in the range of 60 to 80 percent—these tests are run without tools like web search. Grounding an AI with more data, like the wealth of human knowledge on the Internet, does make it more accurate than the naked model itself. However, the truth is in the blue links somewhere, and AI Overviews encourages people to accept its sometimes inaccurate summaries instead of checking those sources manually.

While Google says the Times’ results don’t match what people see, you have to wonder how the company could even know that. You’ve probably seen mistakes in AI Overviews—we all have because that’s just how generative AI works. As Google itself reminds you at the bottom of every overview: “AI can make mistakes, so double-check responses.”


Posted Wednesday 8 April 2026 at 5:24 am AEST (my time).


User Feedback


Mutton 445

Posted April 7 (edited)


Quote

You’ve probably seen mistakes in AI Overviews

Yes, yes I have. I've quickly learned to completely ignore the AI answer on gurgle because, when I have looked at it, it's been blatantly either wrong or ~~irreverential~~ irrelevant (darn AI auto-correct 🤣) so many times.

I was on a service provider's site a few weeks back, cancelling my account with them. I couldn't for the life of me find a way to turn something off on my control panel, so I went on the chat help. AI, of course... it gave me the answer. It was so wrong it wasn't even funny. It got my question correctly, but directed me to menu options that did not even exist, not even close. I told it so, very directly, and it apologised and tried again. That time it only made one mistake but it was close enough that I found the (well hidden) option.

So far I'm not impressed. I'm sure AI is amazing at some things*, but without human intervention to check the sanity of the results*2, it's untrustworthy at best and potentially outright dangerous at the moment.

*Some years back I had a PCB full of FPGAs that were programmed with an early AI-based image recognition system. That was impressive and effective at what it did (intrusive surveillance of ordinary people going about their business not realising they were being monitored by AI)

Edit *2: Unfortunately, so many Internet denizens seem to leave their sanity and common sense in the drawer when they turn on their PCs, and swallow whatever they read as if it's gospel.

Edited April 7 by Mutton
