EQ-Bench


Progress Over Time

[Interactive timeline: model performance on EQ-Bench over time, marking the state-of-the-art frontier and distinguishing open from proprietary models.]

EQ-Bench Leaderboard

2 models
About this benchmark

What is EQ-Bench?

EQ-Bench is an LLM-judged test that evaluates active emotional intelligence abilities, understanding, insight, empathy, and interpersonal skills. The test set contains 45 challenging roleplay scenarios, most of which are pre-written prompts spanning 3 turns. Each model's responses are scored against several criteria and compared pairwise with other models' responses, and the outcome is reported as a normalized Elo score for each model.
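To make the pairwise-to-Elo step concrete, here is a minimal sketch of a standard sequential Elo update driven by judge preferences. It is not the EQ-Bench implementation: the starting rating, the K factor, the 400-point logistic scale, the model names, and the function names are all assumptions chosen for illustration, and the benchmark's normalization step is not reproduced.

```python
from typing import Dict, Iterable, Tuple

K_FACTOR = 16.0       # assumed update step, not taken from EQ-Bench
BASE_RATING = 1500.0  # assumed starting rating for every model


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_from_pairs(results: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Fold (model_a, model_b, score_a) judgments into ratings.

    score_a is 1.0 if the judge preferred A, 0.0 if it preferred B,
    and 0.5 for a tie.
    """
    ratings: Dict[str, float] = {}
    for model_a, model_b, score_a in results:
        ra = ratings.setdefault(model_a, BASE_RATING)
        rb = ratings.setdefault(model_b, BASE_RATING)
        # Move both ratings by the gap between the actual and expected result.
        delta = K_FACTOR * (score_a - expected_score(ra, rb))
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return ratings


# Hypothetical judgments: model_x is preferred in two of three comparisons.
judgments = [
    ("model_x", "model_y", 1.0),
    ("model_x", "model_y", 1.0),
    ("model_x", "model_y", 0.0),
]
ratings = elo_from_pairs(judgments)
```

Because each update adds to one rating exactly what it subtracts from the other, the total rating mass stays constant, so scores are only meaningful relative to the other models in the pool.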

EQ-Bench is a text benchmark categorized under reasoning, roleplay, general, creativity, and writing. LLM Stats tracks 2 models on this benchmark, scored on a 0–2000 scale. The current average score is 1585.5, and the leader scores 1586.0.


Current leaders

Of the 2 evaluated AI models, Grok-4.1 Thinking from xAI currently leads the EQ-Bench leaderboard with a score of 1586.000.

1. Grok-4.1 Thinking (xAI): 1586.000
2. Grok-4.1 (xAI): 1585.000

FAQ

Common questions about the EQ-Bench benchmark and leaderboard.

What is the EQ-Bench benchmark?

EQ-Bench is an LLM-judged test that evaluates active emotional intelligence abilities, understanding, insight, empathy, and interpersonal skills. The test set contains 45 challenging roleplay scenarios, most of which are pre-written prompts spanning 3 turns. Each model's responses are scored against several criteria and compared pairwise with other models' responses, and the outcome is reported as a normalized Elo score for each model.

What is the EQ-Bench leaderboard?

The EQ-Bench leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Grok-4.1 Thinking by xAI leads with a score of 1586.000. The average score across all models is 1585.500.
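The leader and average quoted above follow directly from the two listed scores. A small sketch of that arithmetic, where the `scores` dictionary is just a hand-copied stand-in for the leaderboard and not an LLM Stats data format:

```python
from statistics import mean

# Scores copied from the leaderboard entries on this page.
scores = {
    "Grok-4.1 Thinking": 1586.000,
    "Grok-4.1": 1585.000,
}

leader = max(scores, key=scores.get)  # model with the highest score
average = mean(scores.values())       # arithmetic mean over all models

print(f"{leader}: {scores[leader]:.3f}, average {average:.3f}")
# prints "Grok-4.1 Thinking: 1586.000, average 1585.500"
```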

What is the highest EQ-Bench score?

The highest EQ-Bench score is 1586.000, achieved by Grok-4.1 Thinking from xAI.

How many models are evaluated on EQ-Bench?

2 models have been evaluated on the EQ-Bench benchmark. Both results are self-reported; none has been verified.

Where can I find the EQ-Bench paper?

The paper link for EQ-Bench points to the project's GitHub repository at https://github.com/EQ-Bench/EQ-Bench, which is the listed source for the methodology, dataset construction, and evaluation criteria.

Where can I find the EQ-Bench dataset?

The EQ-Bench dataset is available at https://github.com/EQ-Bench/EQ-Bench.

What categories does EQ-Bench cover?

EQ-Bench is categorized under reasoning, roleplay, general, creativity, and writing. The benchmark evaluates text models.

How recent are the EQ-Bench leaderboard results?

The EQ-Bench leaderboard was last updated in July 2026 and currently includes 2 evaluated models.