OpenAI has finally launched GPT-5. Here’s everything you need to know

OpenAI’s blog post claims that GPT-5 beats its previous models on several coding benchmarks, including SWE-bench Verified (where it scored 74.9 percent), SWE-Lancer (where GPT-5 thinking scored 55 percent), and Aider Polyglot (where it scored 88 percent), which test a model’s ability to fix bugs, complete freelance-style coding tasks, and work across multiple programming languages.
At a press conference on Wednesday, Yann Dubois, a post-training lead at OpenAI, prompted GPT-5 to “create a beautiful, highly interactive web application for my English-speaking partner to learn French.” He asked the AI to include features such as flashcards, quizzes, and daily progress tracking, noting that he hoped the app would be wrapped in a “highly engaging theme.” About a minute later, the AI-generated application popped up. While this was just a staged demonstration, the result was a sleek app that appeared to deliver on Dubois’ request.
“It’s a great coding collaborator, and it’s great at agentic tasks,” said Michelle Pokrass, another post-training lead at OpenAI. “It’s very good at executing long chains of tool calls [which means it better understands when and how to use functions like web browsers or external APIs], following detailed instructions, and explaining its behavior up front.”
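For developers, a “tool call” is a structured request from the model to run a specific function instead of (or before) replying in plain text. Below is a minimal sketch of what that looks like with the OpenAI Python SDK’s chat-completions tool-calling interface; the gpt-5 model identifier and the get_weather tool are illustrative assumptions, not details confirmed in OpenAI’s announcement.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Describe a function the model is allowed to call (hypothetical example).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-5",  # assumed model identifier, for illustration only
    messages=[{"role": "user", "content": "Should I pack an umbrella for Paris today?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model chose to call a tool: inspect which function and with what arguments.
    for call in message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    # The model answered directly in text.
    print(message.content)
```

The model decides for itself whether a prompt warrants calling the tool or a direct text answer, which is the kind of judgment Pokrass is describing.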
OpenAI also said in its blog post that GPT-5 is “our best model yet for health-related questions.” On three of OpenAI’s health-related LLM benchmarks (HealthBench, HealthBench Hard, and HealthBench Consensus), the system card (a document describing a model’s technical capabilities and other research findings) indicates that GPT-5 thinking performs better than previous models. The thinking version of GPT-5 scored 25.5 percent on HealthBench, while o3 scored 31.6 percent. According to the system card, these scores were validated by two or more physicians.
According to Pokrass, the model is also less prone to hallucination, a common problem in which AI models confidently present false information. Alex Beutel, OpenAI’s head of safety research, added that the company has “significantly reduced” GPT-5’s rate of deception.
“We have taken steps to reduce GPT-5 thinking’s propensity to deceive, cheat, or hack problems, though our mitigations are not perfect and more research is needed,” the system card says. “In particular, we have trained the model to fail gracefully when posed with tasks that it cannot solve.”
After testing GPT-5 models that did not have access to web browsing, the company says in its system card, researchers found that GPT-5’s hallucination rate (which they define as “the percentage of factual claims containing minor or significant errors”) was 26 percent lower than GPT-4o’s. Compared with o3, GPT-5 thinking’s hallucination rate was 65 percent lower.
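Note that both figures are relative reductions, not absolute rates. A quick back-of-the-envelope sketch in Python, using purely hypothetical baseline rates (the article does not report absolute numbers), shows how they translate:

```python
# Hypothetical baseline rates, chosen only to illustrate what a relative
# reduction means; the article does not report absolute hallucination rates.
gpt4o_rate = 0.20                          # assume GPT-4o errs on 20% of factual claims
gpt5_rate = gpt4o_rate * (1 - 0.26)        # 26% relative reduction -> 14.8%

o3_rate = 0.20                             # assume the same baseline for o3
gpt5_thinking_rate = o3_rate * (1 - 0.65)  # 65% relative reduction -> 7.0%

print(f"GPT-5: {gpt5_rate:.1%}, GPT-5 thinking: {gpt5_thinking_rate:.1%}")
```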
For prompts that might be dual use (that is, either harmful or benign), Beutel says GPT-5 performs “safe completions,” which prompt the model to be “as helpful as possible” while staying within the limits of safety. According to Beutel, OpenAI conducted more than 5,000 hours of red teaming and testing with external organizations to ensure the system is robust.
OpenAI said it now has nearly 700 million weekly active ChatGPT users, 5 million paid business users, and 4 million developers using its API.
“The vibes of this model are really good, and I think people are really going to feel that,” said Nick Turley, the head of ChatGPT. “Especially the average person, who doesn’t spend their time thinking about models.”