OpenAI’s GPT-5 is unlikely to offer much of an upgrade to AI agents

As 2025 dawned, OpenAI CEO Sam Altman promoted two developments that he insisted would change our lives. One of them, of course, is GPT-5 – a major upgrade to the large language model (LLM) that propelled ChatGPT to tech-world superstardom.
The other? AI agents that can not only answer your queries but also do work for you. “We believe that, in 2025, we may see the first AI agents join the workforce and materially change the output of companies,” Altman wrote in January.
Well, eight months in, Altman’s prediction already requires a big old asterisk. Companies are certainly eager to adopt AI agents such as OpenAI’s ChatGPT Agent. In a May 2025 report, consulting giant PwC found that half of all companies surveyed planned to implement some kind of AI agent by the end of the year, and about 88% of executives said they wanted to increase their teams’ AI budgets because of agentic AI.
But what about the actual experience of using AI agents? With apologies to all those hopeful executives, the reviews are nearly unanimous.
If “AI agents” were a high-tech new James Bond movie, here’s the blurb you’d see on Rotten Tomatoes: “glitchy … inconsistent” (Wired); “like a clueless internet newbie” (Fast Company); “the reality doesn’t match the hype” (Fortune); “not living up to the buzzwords” (Bloomberg); “the new vaporware … the over-promotion is worse than ever” (Forbes).
Research finds OpenAI’s entry fails almost every time
A May 2025 Carnegie Mellon University study (PDF) found that Google’s Gemini 2.5 Pro failed at its assigned tasks 70% of the time – and that was the best-performing agent. OpenAI’s entry, powered by GPT-4o, failed more than 90% of the time.
GPT-5 may improve those numbers … but likely not by much, and not only because earlier reports said OpenAI struggled to pack enough improvements into GPT-5 to make it worthy of the version number.
Indeed, it’s starting to look like disappointing results like these are baked into the whole business of LLMs learning to do things for you. As this analysis by an AI agent engineer makes clear, the problem is simple math: the more tasks an agent strings together over time, the worse it does. Like all AI, agents performing multiple complex tasks are prone to hallucination.
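To see why, here’s a rough back-of-the-envelope sketch of that math (the 95% per-step figure is an assumption for illustration, not a number from the engineer’s analysis): even a high per-step success rate collapses once an agent has to chain dozens of steps together.

```python
# Back-of-the-envelope illustration (assumed numbers, not from the cited study):
# if an agent nails each individual step with probability p, the odds of
# completing an n-step task with zero mistakes are p ** n.

def task_success_rate(per_step_success: float, steps: int) -> float:
    """Probability of finishing all steps without a single error."""
    return per_step_success ** steps

for steps in (1, 5, 20, 50):
    rate = task_success_rate(0.95, steps)  # assume 95% reliability per step
    print(f"{steps:>2} steps -> {rate:.0%} chance of a flawless run")

# Prints roughly 95%, 77%, 36%, 8% -- even a "pretty reliable" agent
# becomes a coin flip (or worse) on longer workflows.
```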
Eventually, some agents just “panic” and make “a catastrophic error in judgment” – to quote the apology of the Replit AI agent that literally deleted a client’s database nine days into a coding task. (Replit’s CEO called the failure “unacceptable.”)
And that’s arguably not even the only AI-agent-wipes-code story of 2025 – which helps explain why one enterprising startup now offers insurance on your AI agents, and why Walmart had to introduce four “super agents” to wrangle its AI agents.
No wonder a recent Gartner paper predicts that 40% of all the AI agent projects companies are currently launching will be canceled within two years. “Most agentic AI projects are driven by hype and are often misapplied … This can blind organizations to the real cost and complexity of deploying AI agents at scale,” wrote senior analyst Anushree Verma.
What can GPT-5 do for AI agents?
Once it’s powered by GPT-5, the ChatGPT Agent may well top the reliability charts. (Again, not the highest of bars.) But the new version is unlikely to fix what is really holding the agent world back.
That’s because companies and regulators are setting up guardrails on – and even shutting down – what even the most reliable AI agents can do for you.
Take Amazon. Like most tech giants, the world’s largest retailer is talking a big game about AI agents (as seen at an agentic AI expo in Shanghai in July, pictured above). Meanwhile, Amazon has shut down the ability of AI agents to browse and buy anywhere on its website.
That makes sense for Amazon, which has always wanted to control the customer experience – not to mention its desire to serve ads and sponsored results to actual human eyeballs. But it also cuts off a lot of potential agent activity then and there. (On the plus side, no “catastrophic errors” involving massive deliveries at your doorstep.)
Besides, should we trust AI agents to shop for us online at all? It’s not that they’re evil and want to steal your credit card data; it’s that they’re naive and vulnerable to bad actors who do want your card.
Even GPT-5 may not fix a vulnerability researchers have already seen: data embedded in images can instruct AI agents to reveal any credit card information they hold, with the user none the wiser.
If that vulnerability is ever exploited at company scale, Altman might turn out to be right about AI agents “materially changing the output” of companies – just not in the way he meant.
Topics: Artificial Intelligence, OpenAI