GPT-5 Doesn’t Dislike You: It Might Just Need a Benchmark for Emotional Intelligence

The new version of ChatGPT launched Thursday, with some users mourning the disappearance of a warm and encouraging personality in favor of a colder, more businesslike one (a change seemingly intended to reduce unhealthy user behavior). The strong backlash shows the challenge of building AI systems that exhibit anything like real emotional intelligence.
MIT researchers have proposed a new kind of AI benchmark to measure how AI systems can manipulate and influence their users, in both positive and negative ways, a move that could help AI builders avoid similar backlashes in the future while also keeping vulnerable users safe.
Most benchmarks try to gauge intelligence by testing a model’s ability to answer exam questions, solve logical puzzles, or come up with novel answers to knotty math problems. As the psychological impact of AI use becomes more apparent, we may see more benchmarks like MIT’s, designed to measure subtler aspects of intelligence as well as machine-to-human interactions.
The MIT paper, shared with WIRED, outlines several measures the new benchmark will look for, including encouraging healthy social habits in users; spurring them to develop critical thinking and reasoning skills; fostering creativity; and stimulating a sense of purpose. The idea is to encourage the development of AI systems that understand how to discourage users from becoming overly reliant on their outputs, or that recognize when someone is addicted to artificial romantic relationships and help them build real ones.
ChatGPT and other chatbots are adept at mimicking engaging human communication, but this can also produce surprising and undesirable results. In April, OpenAI tweaked its models to make them less sycophantic, or inclined to go along with everything a user says. Some users appear to spiral into harmful delusional thinking after conversing with chatbots that role-play fantastical scenarios. Anthropic has also updated Claude to avoid reinforcing “mania, psychosis, dissociation or loss of attachment with reality.”
The MIT researchers, led by Pattie Maes, a professor at the institute’s Media Lab, say they hope the new benchmark could help AI developers build systems that better understand how to inspire healthier behavior among users. The researchers previously conducted a study with OpenAI showing that users who view ChatGPT as a friend may develop higher emotional dependence and experience “problematic use.”
Valdemar Danry, a researcher at MIT’s Media Lab who worked on that study and helped devise the new benchmark, notes that AI models can sometimes provide valuable emotional support to users. “You can have the smartest reasoning model in the world, but if it’s incapable of delivering this emotional support, which is what many users are likely using these LLMs for, then more reasoning is not necessarily a good thing for that particular task,” he says.
Danry says a sufficiently smart model should ideally recognize when it is having a negative psychological effect on a user and be optimized for healthier outcomes. “What you want is a model that says, ‘I’m here to listen, but maybe you should go and talk to your dad about these issues.’”