Creative Writing v3

Creative Writing v3 Leaderboard

13 models

Rank  Model                          Organization               Parameters  Context  Cost
1     Grok-4.1 Thinking              xAI                        -           -        -
2     Grok-4.1                       xAI                        -           -        -
3     Qwen3-235B-A22B-Instruct-2507  Alibaba Cloud / Qwen Team  235B        -        -
4     -                              Alibaba Cloud / Qwen Team  236B        -        -
5     -                              Alibaba Cloud / Qwen Team  235B        -        -
6     -                              Alibaba Cloud / Qwen Team  236B        -        -
7     -                              Alibaba Cloud / Qwen Team  33B         -        -
8     -                              Alibaba Cloud / Qwen Team  80B         -        -
9     -                              Alibaba Cloud / Qwen Team  31B         -        -
10    -                              Alibaba Cloud / Qwen Team  33B         -        -
11    -                              Alibaba Cloud / Qwen Team  31B         -        -
12    -                              Alibaba Cloud / Qwen Team  9B          262K     $0.18 / $2.09
13    -                              Alibaba Cloud / Qwen Team  4B          262K     $0.10 / $1.00

About this benchmark

What is Creative Writing v3?

EQ-Bench Creative Writing v3 is an LLM-judged creative writing benchmark that evaluates models across 32 writing prompts, with 3 iterations per prompt. It uses a hybrid scoring system that combines rubric assessment with Elo ratings derived from pairwise comparisons. The prompts cover areas such as humor, romance, spatial awareness, and unique perspectives, and are intended to assess emotional intelligence and creative writing capabilities.
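
To make the description above concrete, here is a minimal sketch of the run size implied by 32 prompts at 3 iterations each, together with a textbook Elo update for one pairwise comparison. The function names, the starting rating of 1500, and the K-factor of 32 are illustrative assumptions. The benchmark's actual rating procedure is not described on this page and may differ.

```python
from __future__ import annotations

# Illustrative sketch only: textbook Elo, not necessarily the benchmark's
# exact rating procedure. Names and the K-factor are assumptions.

PROMPTS = 32      # from the benchmark description
ITERATIONS = 3    # from the benchmark description


def generations_per_model(prompts: int = PROMPTS, iterations: int = ITERATIONS) -> int:
    """Number of pieces one model writes in a full run."""
    return prompts * iterations


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated ratings after one comparison.

    score_a is 1.0 if the judge prefers A, 0.0 if it prefers B, 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    print(generations_per_model())          # 96
    print(elo_update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Because each update moves the two ratings by equal and opposite amounts, Elo ratings are only meaningful relative to the other models in the same pool, unlike a rubric score on a fixed scale.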

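Creative Writing v3 is a text benchmark evaluating models on creativity and writing tasks. LLM Stats tracks 13 models on this benchmark. The page describes scores as being on a 0–1 scale, but the reported values mix two scales: the leader's 1721.9 appears to be an Elo-style rating, while the top open-source entry scores 0.875. The reported average of 264.6 blends both scales, so it does not describe a typical model.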
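
A quick arithmetic check shows how two large values dominate the reported average. The sketch below uses the 13-model count, the 264.597 average, and the 1721.9 leader score from this page. The runner-up value of 1708.6 is an assumption, read from the "170860.0%" shown in the leaders list, and the helper name is hypothetical.

```python
# Back-of-the-envelope check of the reported average.
# Assumption: the runner-up score is 1708.6 (read from "170860.0%").

MODEL_COUNT = 13
REPORTED_AVERAGE = 264.597
LEADER_SCORE = 1721.9
RUNNER_UP_SCORE = 1708.6  # assumed, see note above


def implied_mean_of_rest(average: float, count: int, large_scores: list[float]) -> float:
    """Mean of the remaining scores once the listed large scores are removed."""
    total = average * count
    remaining_total = total - sum(large_scores)
    return remaining_total / (count - len(large_scores))


rest_mean = implied_mean_of_rest(
    REPORTED_AVERAGE, MODEL_COUNT, [LEADER_SCORE, RUNNER_UP_SCORE]
)

if __name__ == "__main__":
    print(round(rest_mean, 3))  # 0.842
```

Under that assumption, the other 11 models average roughly 0.84, which is consistent with 0–1 scores such as the 0.875 reported for the third-ranked model.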


Current leaders

Grok-4.1 Thinking from xAI currently leads the Creative Writing v3 leaderboard with a score of 1721.900 across 13 evaluated AI models.

1. Grok-4.1 Thinking (xAI): 1721.9
2. Grok-4.1 (xAI): 1708.6
3. Qwen3-235B-A22B-Instruct-2507 (Alibaba Cloud / Qwen Team): 0.875 (87.5%)

Source paper

Title: EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models
Authors: Samuel J. Paech
Abstract

We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of emotional states of characters in a dialogue. The benchmark is able to discriminate effectively between a wide range of models. We find that EQ-Bench correlates strongly with comprehensive multi-domain benchmarks like MMLU (Hendrycks et al., 2020) (r=0.97), indicating that we may be capturing similar aspects of broad intelligence. Our benchmark produces highly repeatable results using a set of 60 English-language questions. We also provide open-source code for an automated benchmarking pipeline at https://github.com/EQ-bench/EQ-Bench and a leaderboard at https://eqbench.com

FAQ

Common questions about the Creative Writing v3 benchmark and leaderboard.

What is the Creative Writing v3 benchmark?

EQ-Bench Creative Writing v3 is an LLM-judged creative writing benchmark that evaluates models across 32 writing prompts, with 3 iterations per prompt. It uses a hybrid scoring system that combines rubric assessment with Elo ratings derived from pairwise comparisons. The prompts cover areas such as humor, romance, spatial awareness, and unique perspectives, and are intended to assess emotional intelligence and creative writing capabilities.

What is the Creative Writing v3 leaderboard?

The Creative Writing v3 leaderboard ranks 13 AI models based on their performance on this benchmark. Currently, Grok-4.1 Thinking by xAI leads with a score of 1721.900. The average score across all models is 264.597.

What is the highest Creative Writing v3 score?

The highest Creative Writing v3 score is 1721.900, achieved by Grok-4.1 Thinking from xAI.

How many models are evaluated on Creative Writing v3?

13 models have been evaluated on the Creative Writing v3 benchmark, with 0 verified results and 13 self-reported results.

Where can I find the Creative Writing v3 paper?

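The paper linked for Creative Writing v3 is the original EQ-Bench paper, available at https://arxiv.org/abs/2312.06281. Its abstract describes the emotional-intelligence benchmark (predicting the intensity of characters' emotional states across 60 English-language questions) rather than the creative writing prompts or scoring rubric.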

What categories does Creative Writing v3 cover?

Creative Writing v3 is categorized under creativity and writing. The benchmark evaluates text models.

What is the best open-source model on Creative Writing v3?

Qwen3-235B-A22B-Instruct-2507 by Alibaba Cloud / Qwen Team is the top-ranked open-source model on Creative Writing v3, with a score of 0.875 (rank #3).

How recent are the Creative Writing v3 leaderboard results?

The Creative Writing v3 leaderboard was last updated in June 2026 and currently includes 13 evaluated models.