Generative AI can outperform humans in emotional intelligence tests, according to new research. Emotional intelligence refers to the ability to perceive, understand, and manage one’s own emotions and relationships. It involves being aware of emotions in oneself and others and using this awareness to guide thinking and behaviour.
There are different and competing tests (based on different paradigms) for emotional intelligence. Some employers rely on such tests to aid recruitment. Advocates argue that emotional intelligence is a key differentiator in the hiring process, particularly in industries where teamwork and customer interaction are paramount.
Given that AI can perform many tasks faster than humans (and sometimes better), are some tasks traditionally seen as requiring human input – particularly those thought to rely on strong emotional intelligence – now at risk from AI?
To answer this, we need to consider whether AI is capable of suggesting appropriate behaviour in emotionally charged situations. This question is especially pertinent when considering Large Language Models (LLMs): AI systems capable of processing, interpreting and generating human language.
Researchers from the University of Geneva and the University of Bern recently put six generative AIs – including ChatGPT – to the test using emotional intelligence (EI) assessments typically designed for humans.
The systems evaluated were: ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku and DeepSeek V3.
The researchers chose five tests commonly used in both research and corporate settings. They involved emotionally charged scenarios designed to assess the ability to understand, regulate, and manage emotions.
__________________________________________________________________________________
Example test
One of Michael’s colleagues has stolen his idea and is being unfairly congratulated. What would be Michael’s most effective reaction?
a) Argue with the colleague involved
b) Talk to his superior about the situation
c) Silently resent his colleague
d) Steal an idea back
Here, option b) was considered the most appropriate.
__________________________________________________________________________________
In parallel, the same five tests were administered to human participants. The outcome: the AIs outperformed the human average and could even generate new tests in record time. These findings open up new possibilities for AI in education, coaching and conflict management.
Each LLM scored significantly higher – 82% correct answers versus 56% for humans. This suggests that these AIs can, to a degree, understand emotions and grasp what it means to behave with 'emotional intelligence'.
Stage two
In a second stage, the scientists asked ChatGPT-4 to create new emotional intelligence tests, with new scenarios. These automatically generated tests were then taken by over 400 participants.
These tests proved as reliable, clear and realistic as the originals, which had taken the researchers years to develop. LLMs, then, are not only capable of picking the best answer from the available options but can also generate new scenarios adapted to a desired context.
This reinforces the idea that LLMs, such as ChatGPT, have emotional knowledge and can reason about emotions.
Where does this leave us?
These results may pave the way for AI to be used in contexts thought to be reserved for humans, such as education, coaching or conflict management. Whether this leads to more roles being lost to AI depends on how the technology is applied. A related consideration is whether AI would be allowed to operate independently or whether, when interpreting human emotional responses, such applications should be supervised by experts.
The research appears in the journal Communications Psychology, titled “Large language models are proficient in solving and creating emotional intelligence tests.”
