OpenAI’s hallucination fix could make ChatGPT less useful for users

Written by Michael Anthony Bitoon

Published 15 Sep 2025

Fact checked by Sophia Feona Cantiller

Why trust Greenbot

We maintain a strict editorial policy dedicated to factual accuracy, relevance, and impartiality. Our content is written and edited by top industry professionals with first-hand experience. The content undergoes thorough review by experienced editors to guarantee adherence to the highest standards of reporting and publishing.

OpenAI researchers have proposed a way to stop ChatGPT from making things up, but their fix could force the artificial intelligence (AI) to admit ignorance on nearly one-third of user questions.

The September 2025 research paper explains why large language models confidently generate false information. Current testing methods reward AI systems for guessing rather than saying “I don’t know,” because a wrong guess costs no more points than an honest admission of uncertainty.

“Language models are optimized to be good test-takers, and guessing when uncertain improves test performance,” the OpenAI researchers wrote.

It works like a multiple-choice test: students who guess might get lucky, while those who leave answers blank are guaranteed zero points. AI models face the same pressure.
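
The arithmetic behind that pressure is easy to check. The sketch below assumes a hypothetical four-option question scored the binary way: 1 point for a correct answer, 0 for a wrong answer or a blank. Under that rule a guess can never score worse than a blank, so any nonzero chance of being right makes guessing the better strategy. The function name `expected_score_binary` is purely illustrative.

```python
from fractions import Fraction


def expected_score_binary(p_correct: Fraction, answer: bool) -> Fraction:
    """Expected points under binary grading: 1 if correct, 0 if wrong or blank."""
    # A wrong answer and a blank both score 0, so only answering can earn points.
    return p_correct if answer else Fraction(0)


# Hypothetical example: a blind guess on a four-option question is right one time in four.
p = Fraction(1, 4)
guess = expected_score_binary(p, answer=True)
blank = expected_score_binary(p, answer=False)
print("guess:", guess, "blank:", blank)  # guess: 1/4 blank: 0
```

Because the blank is pinned at zero, the same comparison holds for any probability above zero, which is the incentive the researchers say benchmarks pass on to models.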

When researchers asked various AI models for co-author Adam Kalai’s birthday, one system gave three different wrong dates across separate attempts. None came close to his actual birthday, which falls in autumn.

The proposed fix changes how AI systems are graded. Instead of measuring only accuracy, evaluations should penalize confident mistakes more heavily than expressions of uncertainty, so that saying “I don’t know” scores better than a wrong guess.
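
A scoring rule that penalizes confident mistakes can be sketched in a few lines. This example assumes a rule of one particular shape for illustration, not necessarily the paper’s exact formulation: a correct answer earns 1 point, “I don’t know” earns 0, and a wrong answer costs t / (1 - t) points, where t is a hypothetical confidence target. With that penalty, answering pays off in expectation only when the model’s chance of being right exceeds t; setting t = 0 removes the penalty and collapses the rule back to binary grading. The names `expected_score` and `should_answer` and the value t = 3/4 are assumptions.

```python
from fractions import Fraction


def expected_score(p_correct: Fraction, threshold: Fraction, answer: bool) -> Fraction:
    """Expected points under an assumed rule: +1 correct, 0 abstain,
    -threshold / (1 - threshold) wrong. Requires 0 <= threshold < 1."""
    if not answer:
        return Fraction(0)  # "I don't know" neither gains nor loses points
    penalty = threshold / (1 - threshold)
    return p_correct - (1 - p_correct) * penalty


def should_answer(p_correct: Fraction, threshold: Fraction) -> bool:
    """Answer only when answering beats abstaining in expectation."""
    return expected_score(p_correct, threshold, True) > expected_score(p_correct, threshold, False)


t = Fraction(3, 4)  # hypothetical confidence target: a wrong answer costs 3 points
for p in (Fraction(1, 4), Fraction(3, 4), Fraction(9, 10)):
    print(p, expected_score(p, t, True), should_answer(p, t))
# 1/4 -2 False
# 3/4 0 False
# 9/10 3/5 True
```

With t = 3/4, a blind guess now loses points on average, a 75%-confident answer only breaks even, and only the 90%-confident answer is worth giving.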

But this solution creates a serious problem. OpenAI’s analysis suggests models would need to abstain from answering up to 30% of queries to avoid hallucinations.

Users expect instant, authoritative responses from ChatGPT. An AI that frequently admits ignorance might drive people toward competitors that prioritize confidence over accuracy.

The trade-off has precedents outside AI. Wei Xing, who studies AI at the University of Sheffield, compared it to air-quality monitoring in Salt Lake City: when systems flag measurement uncertainties, user engagement drops noticeably compared with displays showing confident readings.

GPT-5 already shows reduced hallucination rates, especially when allowed to browse the web. On one benchmark testing citation accuracy, GPT-5 made errors 39% of the time without internet access but only 0.8% of the time with web browsing enabled.

“For most cases of hallucination, the rate has dropped to a level” that seems “acceptable to users,” said Tianyang Xu, an AI researcher at Purdue University. However, technical fields such as law and mathematics still trip up GPT-5.

Economics complicate matters further. Uncertainty-aware models require significantly more computational power, because they must evaluate multiple possible responses and estimate a confidence level for each query.
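
Why that costs more can be seen in one common way of estimating confidence, sketched here as an illustration rather than as OpenAI’s method: sample the same question several times and treat the share of matching answers as a confidence score (an approach often called self-consistency). The `sample_answer` callable stands in for a real model API, and the five-sample default, the 0.6 agreement cutoff, and the stub answers are all hypothetical. The point is that each query costs `n_samples` model calls instead of one.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Optional, Tuple


def answer_with_confidence(
    sample_answer: Callable[[str], str],
    question: str,
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> Tuple[Optional[str], float]:
    """Sample several answers and use agreement on the most common one as confidence.

    Returns (None, confidence), meaning "I don't know", when agreement is below min_agreement.
    """
    # Each sample is a separate model call, so cost grows linearly with n_samples.
    answers = [sample_answer(question) for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    confidence = count / n_samples
    if confidence < min_agreement:
        return None, confidence
    return top_answer, confidence


# Hypothetical stand-ins for a real model: one always gives the same answer,
# the other rotates through three different answers.
steady = cycle(["answer A"])
erratic = cycle(["answer A", "answer B", "answer C"])

print(answer_with_confidence(lambda q: next(steady), "hypothetical question"))   # ('answer A', 1.0)
print(answer_with_confidence(lambda q: next(erratic), "hypothetical question"))  # (None, 0.4)
```

Raising `n_samples` gives a finer-grained confidence estimate but multiplies the compute bill per question, which is the cost trade-off described above.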

Such costs make sense for high-stakes applications, such as medical diagnosis or financial trading, where mistakes can cost millions. For everyday consumer use, the economics become prohibitive.

The research team argues that widespread adoption requires changing industry evaluation standards.

Major AI benchmarks, including those from Google and OpenAI and those behind popular leaderboards, use binary grading: full credit for a correct answer, nothing otherwise. Nine out of ten benchmarks examined by the researchers award zero points when models express doubt, creating an “epidemic” of penalizing honest responses.

OpenAI’s proposed reforms could reshape how the entire AI industry develops and tests language models. The question remains whether users will accept AI assistants that are more honest but less confident.
