OpenAI says GPT-5 hallucinates less – what does the data say?

OpenAI has officially launched GPT-5, promising a faster, more capable AI model to power ChatGPT.
The company claims state-of-the-art performance in mathematics, coding, writing, and health advice. OpenAI also proudly stated that GPT-5's hallucination rate has decreased compared to earlier models.
Specifically, GPT-5's rate is 9.6 percent, while GPT-4o's is 12.9 percent. According to the GPT-5 system card, the new model's hallucination rate is 26 percent lower than GPT-4o's. Furthermore, GPT-5's responses were 44 percent less likely to contain "at least one major factual error."
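As a quick sanity check, the absolute and relative figures line up; here is a minimal Python sketch using only the rates quoted above (reading "26 percent lower" as a relative reduction is our interpretation):

```python
# Hallucination rates quoted above from the GPT-5 system card
gpt5_rate = 0.096   # GPT-5: 9.6%
gpt4o_rate = 0.129  # GPT-4o: 12.9%

# Relative reduction: how much lower GPT-5's rate is than GPT-4o's
relative_reduction = (gpt4o_rate - gpt5_rate) / gpt4o_rate
print(f"{relative_reduction:.1%}")  # prints 25.6%, in line with the ~26% claim
```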
While this is a real improvement, it also means that roughly one in ten GPT-5 answers may contain a hallucination. That's worrying, especially since OpenAI touts healthcare as a promising use case for the new model.
How GPT-5 reduces hallucinations
For AI researchers, hallucinations are a vexing problem. Large language models (LLMs) are trained to generate the next most likely word, guided by the huge amounts of data they were trained on. That means an LLM can sometimes confidently produce a sentence that is inaccurate or pure nonsense. One might assume that hallucination rates would decrease as models improve through factors such as better data, training, and computing power. But OpenAI's reasoning models o3 and o4-mini showed an unsettling trend that even its researchers can't fully explain: they hallucinated more than earlier models o1, GPT-4o, and GPT-4.5. Some researchers believe hallucinations are inherent to LLMs rather than a bug that can be fixed.
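To make that concrete, here is a toy sketch (not OpenAI's code; the vocabulary and probabilities are invented purely for illustration) of why next-word prediction can produce confident errors: the model samples whatever continuation its training data made statistically likely, with no built-in check that the resulting sentence is true.

```python
import random

# Invented next-word probabilities standing in for what a model learns from training data
next_word_probs = {
    ("the", "capital", "of", "australia", "is"): {
        "canberra": 0.6,  # correct answer
        "sydney": 0.4,    # common misconception, but still statistically plausible
    }
}

def sample_next_word(context):
    """Pick the next word according to the learned probabilities."""
    candidates = next_word_probs[context]
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return random.choices(words, weights=weights)[0]

# Roughly 4 times in 10, this toy "model" confidently completes the sentence incorrectly.
print("the capital of australia is", sample_next_word(("the", "capital", "of", "australia", "is")))
```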
That said, according to its system card, GPT-5 hallucinates less than previous models. OpenAI evaluated GPT-5 and a version of GPT-5 with additional reasoning ability, called GPT-5 thinking, against its reasoning model o3 and the more traditional model GPT-4o. An important part of evaluating hallucination rates is whether the model has access to the web. Generally speaking, a model is more accurate when it can pull answers from accurate online data instead of relying solely on its training data (more on that below). OpenAI's system card reports each model's hallucination rate when it is given web browsing access.
In the system card, OpenAI also evaluated various versions of GPT-5 on more open-ended and complex prompts. Here, GPT-5 with reasoning hallucinated far less than the earlier reasoning models o3 and o4-mini. Reasoning models are believed to be more accurate, with fewer hallucinations, because they apply more computing power to a problem, which is why o3 and o4-mini's hallucination rates were somewhat confusing in the first place.
Overall, GPT-5 performs quite well when it is connected to the web. But the results of another evaluation tell a different story. OpenAI tested GPT-5 on its internal benchmark SimpleQA, described in the system card as a set of fact-seeking questions with short answers that measures accuracy on attempted answers. For this evaluation, GPT-5 had no web access, and it shows: hallucination rates on this test were much higher.
GPT-5 thinking did slightly better than o3, while regular GPT-5's hallucination rate was a percentage point higher than o3's and a few percentage points below GPT-4o's. To be fair, hallucination rates on the SimpleQA evaluation were high across all models. But that's not much comfort. Users without web search will face a higher risk of hallucinations and errors. So if you're using ChatGPT for something that really matters, make sure it's searching the web. Or you could just search the web yourself.
Users will soon find GPT-5 hallucinations
But despite the reportedly lower inaccuracy overall, one of the launch demonstrations contained an embarrassing error. Beth Barnes, founder and CEO of the AI research nonprofit METR, spotted an inaccuracy in a GPT-5 demo explaining how airplanes work. Barnes said GPT-5 repeated a common misconception about the Bernoulli effect, which describes how air flows around an airplane's wings. GPT-5's explanation was wrong.