AI’s Dirty Secret: The Smarter It Gets, the More It Hallucinates

- OpenAI’s internal testing revealed that its o3 model hallucinated in 33% of PersonQA benchmark tasks, more than double the rate of the earlier o1 version. Similar increases in false output have been observed in competing models from Google and Anthropic, particularly as model complexity has increased.
- From fabricated legal citations in courtroom briefs to invented software packages exploited by cybercriminals, AI hallucinations are no longer theoretical risks. In high-stakes domains like healthcare, finance, and customer service, these errors are already undermining brand trust and creating reputational exposure.
It started with small things.
A wrong footnote. A misquoted statistic. A confidently written summary of a court ruling that never existed.
These weren’t human errors. They came from AI systems—some of the most advanced ever released.
Across the tech world, a quiet contradiction is unfolding. The more powerful artificial intelligence becomes, the more often it seems to make things up. These made-up facts, known as “hallucinations”, aren’t always easy to catch. But they’re becoming more common and, for many brands, more costly.
OpenAI’s latest model, known as “o3”, has been benchmarked with a 33% hallucination rate on some reasoning tests. That’s more than double the error rate of its earlier version. Google’s Gemini and Anthropic’s Claude are also showing increased rates of fabricated answers in more complex tasks. It’s not because these models are worse. It’s because they’re more ambitious. They aim to solve harder problems—and sometimes, when they can’t, they improvise.
The result is output that looks convincing but is completely false.
Why It Matters
At first glance, a few incorrect facts may not seem like a serious issue. But consider where these hallucinations are showing up.
In healthcare settings, transcription models have inserted phantom symptoms into patient records. In courtrooms, lawyers have filed briefs with citations to cases that don’t exist. In tech, AI coding assistants are recommending software packages that were never developed, and attackers are registering those invented names to distribute malicious code, a growing practice known as “slopsquatting”.
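One practical guardrail for teams whose AI tools suggest dependencies is to confirm that a recommended package name has actually been published before anyone installs it. The sketch below is a minimal illustration, assuming Python packages, the third-party `requests` library, and PyPI’s public JSON endpoint (`https://pypi.org/pypi/<name>/json`), which returns a 404 for names that have never been published; other ecosystems have equivalent registry checks.

```python
# check_packages.py -- sketch of a pre-install guardrail for AI-suggested dependencies.
import sys
import requests

def pypi_status(package: str) -> str:
    """Return a rough verdict for an AI-suggested package name."""
    resp = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    if resp.status_code == 404:
        # The name has never been published: likely a hallucinated dependency.
        return "not found on PyPI - possible hallucination, do not install"
    if resp.status_code != 200:
        return f"registry returned HTTP {resp.status_code} - check manually"
    info = resp.json().get("info", {})
    # Existence alone proves little: a squatter may already have registered the
    # hallucinated name, so surface basic metadata for human review.
    return f"exists - author: {info.get('author') or 'unknown'}; summary: {info.get('summary') or 'none'}"

if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}: {pypi_status(name)}")
```

Even a hit is not a clean bill of health: squatters register hallucinated names precisely because they expect them to be requested, so anything unfamiliar still deserves a human look.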
And in business, the risks go deeper. Companies are using AI to write emails, generate web copy, and even assist customer support. Each hallucination that slips through can undermine trust, create legal exposure, or damage brand reputation.
Consumers may not know which model is powering a chatbot. But they know your company name is on the response.
What’s Causing This?
Hallucinations happen when a language model fills in the gaps with its best guess. Unlike a human expert, it doesn’t consult a database of facts. It predicts the next word based on patterns it has seen across billions of pages of text. When prompted for unfamiliar or ambiguous information, it creates plausible-sounding but false answers.
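A toy sketch makes the mechanism concrete. The words and probabilities below are invented for illustration only; a real model learns scores like these from its training text. The key point is the same: nothing in the loop checks the chosen word against a source of truth.

```python
import random

# Toy next-word predictor. A real model scores continuations over a huge
# vocabulary; here a single hand-written table stands in for those scores.
next_word_probs = {
    ("the", "ruling", "was", "issued", "in"): {
        "2019": 0.41,    # plausible, and happens to be checkable
        "2021": 0.33,    # equally plausible, equally unchecked
        "Geneva": 0.26,  # fluent, but may describe a case that never existed
    },
}

def predict(context, temperature=1.0):
    """Sample the next word from the model's plausibility scores, not from facts."""
    candidates = next_word_probs[context]
    words = list(candidates)
    weights = [p ** (1.0 / temperature) for p in candidates.values()]
    return random.choices(words, weights=weights, k=1)[0]

print(predict(("the", "ruling", "was", "issued", "in")))
```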
The problem intensifies as models grow more complex. As AI becomes more capable of multi-step reasoning, it also becomes more confident in its outputs. And when that confidence is misplaced, the hallucinations are not just more frequent; they are more convincing.
You might not notice the error until it’s too late.
A Growing Challenge for Brands
If your company is using generative AI in any form, hallucinations are now your concern.
They might slip into:
- Knowledge bases or help centre articles
- Product descriptions
- Internal training materials
- Marketing assets
- User-facing chat interactions
These errors can confuse customers, misinform employees, or spread misinformation. And they can do it quietly, without setting off alarms—unless you’re actively checking for them.
In regulated industries like finance, healthcare, and legal services, this becomes even more critical. One hallucinated sentence could lead to a compliance failure or a regulatory inquiry.
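Active checking does not have to be elaborate to start. The sketch below, which assumes the `requests` library and a hypothetical draft, flags one obvious failure mode: AI-drafted copy whose links do not resolve. It will not catch fabricated statistics or invented case law; those still need a human reviewer, especially in the regulated settings above.

```python
# flag_ai_copy.py -- sketch of a lightweight pre-publication check for AI-drafted text.
import re
import requests

URL_PATTERN = re.compile(r"https?://[^\s)>\"']+")

def broken_links(text: str) -> list[str]:
    """Return URLs in the draft that do not respond with a success status."""
    bad = []
    for url in URL_PATTERN.findall(text):
        try:
            resp = requests.head(url, allow_redirects=True, timeout=10)
            if resp.status_code >= 400:
                bad.append(url)
        except requests.RequestException:
            bad.append(url)
    return bad

draft = "Our warranty policy is described at https://example.com/no-such-page."
for url in broken_links(draft):
    print(f"Flag for human review: {url}")
```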
What AI Companies Are Saying
At the AI Expo Europe earlier this year, OpenAI’s CTO Mira Murati put it bluntly: “The challenge isn’t just reducing hallucinations—it’s knowing when they’re happening.”
Google and Anthropic have echoed similar concerns. While all major AI companies are working on methods to make outputs more factually grounded, none claim to have solved the problem. Some are exploring techniques like retrieval-augmented generation (RAG), which allows models to consult real-time databases before answering. Others are trying to teach models to admit when they don’t know something.
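In outline, RAG looks like the sketch below: before the model answers, the system pulls relevant passages from a trusted source and puts them in front of it, with instructions to answer only from that material. Everything here is a placeholder, including the keyword retriever and the `generate()` stub, which stand in for a real vector search and a real model call rather than any vendor’s API.

```python
# rag_sketch.py -- minimal illustration of retrieval-augmented generation (RAG).
DOCUMENTS = [
    "Refunds are available within 30 days of purchase with a valid receipt.",
    "Support lines are open Monday to Friday, 9am to 5pm UK time.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Crude keyword-overlap retrieval; production systems use vector search."""
    q_words = set(question.lower().split())
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stand-in for a call to a language model."""
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)

print(answer("How long do customers have to request a refund?"))
```

The grounding is only as good as the documents behind it, which is one reason RAG reduces hallucinations rather than eliminating them.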
But for now, hallucinations remain a known limitation.
And that means the burden of quality control often falls to the companies using the AI, not the ones building it.
Where Regulation Is Headed
Governments are beginning to take notice. In the EU, the AI Act places transparency obligations on providers of general-purpose AI models, and deployments that significantly affect consumers can fall into its high-risk category. For those uses, hallucination monitoring may become a formal compliance issue.
In the UK, the Office for AI is reviewing trust and safety measures. There’s growing pressure on platforms and service providers to ensure that AI-generated content is accurate and clearly labelled.
Brands that use generative AI at scale will need to prove they’re using it responsibly—and that includes addressing hallucinations.
Can Hallucinations Be Useful?
Not all hallucinations are bad. In creative tasks—like fiction writing, brainstorming, or branding—they can be helpful. The ability of AI to generate unexpected, off-script responses can inspire ideas that a rule-based system would never surface.
But there’s a difference between generating story ideas and writing a company’s legal disclaimer.
The key is context. In any setting where accuracy matters, hallucinations need to be caught before they reach the public.
Final Take
Hallucinations in AI are not rare glitches. They are part of how the systems work—and for now, part of what you’re agreeing to when you use them.
If your brand is adopting AI, you need a strategy to manage this. That means building human review into your workflow. It means knowing your tools, understanding their limits, and tracking how they perform in the real world.
The promise of AI is real. But so is the risk. And trust in your brand depends on how seriously you treat both.
Deploy carefully. Monitor constantly. And own every word your AI creates.