GDPval-AA

Name: GDPval-AA Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

Interactive timeline showing model performance evolution on GDPval-AA

State-of-the-art frontier

Open

Proprietary

GDPval-AA Leaderboard

33 models

			Context	Cost
1	Claude Fable 5 Anthropic	—	—	—
2	Claude Opus 4.8 Anthropic	—	1.0M	$5.00 / $25.00
3	Claude Opus 4.6 Anthropic	—	1.0M	$5.00 / $25.00
4	Claude Opus 4.7 Anthropic	—	1.0M	$5.00 / $25.00
5	MiniMax M3 MiniMax	—	1.0M	$0.30 / $1.20
6	GPT-5.4 OpenAI	—	1.0M	$2.50 / $15.00
7	MiMo-V2-Pro Xiaomi	1.0T	—	—
8	Claude Sonnet 4.6 Anthropic	—	200K	$3.00 / $15.00
9	MiMo-V2-Omni Xiaomi	—	—	—
10	Gemini 3.5 Flash Google	—	1.0M	$1.50 / $9.00
11	DeepSeek-V4-Pro-Max DeepSeek	1.6T	1.0M	$1.60 / $3.20
12	Qwen3.7 Max Alibaba Cloud / Qwen Team	—	1.0M	$1.25 / $3.75
13	MiMo-V2.5-Pro Xiaomi	1.0T	1.0M	$0.43 / $0.87
14	GLM-5.1 Zhipu AI	754B	200K	$1.40 / $4.40
15	DeepSeek-V4-Flash-Max DeepSeek	284B	1.0M	$0.10 / $0.20
16	Kimi K2.6 Moonshot AI	1.0T	262K	$0.75 / $3.50
17	GPT-5.4 mini OpenAI	—	400K	$0.75 / $4.50
18	Nemotron 3 Ultra (550B A55B) NVIDIA	550B	—	—
19	MiniMax M2.7 MiniMax	—	205K	$0.30 / $1.20
20	Muse Spark Meta	—	—	—
21	Qwen3.6 Plus Alibaba Cloud / Qwen Team	—	1.0M	$0.50 / $3.00
22	Qwen3.6-27B Alibaba Cloud / Qwen Team	28B	262K	$0.60 / $3.60
23	GPT-5.5 OpenAI	—	1.1M	$5.00 / $30.00
24	GPT-5.4 nano OpenAI	—	400K	$0.20 / $1.25
25	Grok 4.3 xAI	—	1.0M	$1.25 / $2.50
26	Qwen3.6-35B-A3B Alibaba Cloud / Qwen Team	35B	—	—
27	Qwen3.5-122B-A10B Alibaba Cloud / Qwen Team	122B	—	—
28	Gemini 3.1 Pro Google	—	1.0M	$2.50 / $15.00
29	Qwen3.5-397B-A17B Alibaba Cloud / Qwen Team	397B	—	—
30	Qwen3.7-Plus Alibaba Cloud / Qwen Team	—	1.0M	$0.32 / $1.28
31	Mistral Medium 3.5 Mistral AI	128B	256K	$1.50 / $7.50
32	Claude Haiku 4.5 Anthropic	—	200K	$1.00 / $5.00
33	Gemma 4 31B Google	31B	262K	$0.13 / $0.38

Notice missing or incorrect data?

Sub-benchmarks

GDPval-MM

GDPval-MM is the multimodal variant of the GDPval benchmark, evaluating AI model performance on real-world economically valuable tasks that require processing and generating multimodal content including documents, slides, diagrams, spreadsheets, images, and other professional deliverables across diverse industries.

multimodal•Max 1

About this benchmark

What is GDPval-AA?

GDPval-AA is an evaluation of AI model performance on economically valuable knowledge work tasks across professional domains including finance, legal, and other sectors. Run independently by Artificial Analysis, it uses Elo scoring to rank models on real-world work task performance.

GDPval-AA is a text benchmark evaluating models on legal, reasoning, finance, general, and agents tasks. LLM Stats tracks 33 models on this benchmark, scored on a 0–3000 scale. The current average is 1230.7, with the leader at 1815.0.

Compare leaders on the best AI for legal, best AI for reasoning, best AI for finance, best AI for general and best AI for agents leaderboards.

Current leaders

Claude Fable 5 from Anthropic currently leads the GDPval-AA leaderboard with a score of 1815.000 across 33 evaluated AI models.

Claude Fable 5Anthropic1815.000

Claude Opus 4.8Anthropic1638.000

Claude Opus 4.6Anthropic1606.000

OSS

MiniMax M3#5 open-weight1431.000

FAQ

Common questions about the GDPval-AA benchmark and leaderboard.

What is the GDPval-AA benchmark?

What is the GDPval-AA leaderboard?

The GDPval-AA leaderboard ranks 33 AI models based on their performance on this benchmark. Currently, Claude Fable 5 by Anthropic leads with a score of 1815.000. The average score across all models is 1230.727.

What is the highest GDPval-AA score?

The highest GDPval-AA score is 1815.000, achieved by Claude Fable 5 from Anthropic.

How many models are evaluated on GDPval-AA?

33 models have been evaluated on the GDPval-AA benchmark, with 0 verified results and 3 self-reported results.

What categories does GDPval-AA cover?

GDPval-AA is categorized under legal, reasoning, finance, general, and agents. The benchmark evaluates text models.

Are there variants of GDPval-AA?

Yes. GDPval-AA has 1 related variant: GDPval-MM.

What is the best open-source model on GDPval-AA?

MiniMax M3 by MiniMax is the top-ranked open-source model on GDPval-AA, with a score of 1431.000 (rank #5).

Which model offers the best value on GDPval-AA?

Among models scoring within 10% of the leader, Claude Opus 4.8 from Anthropic is the cheapest, at $5.00 per million input tokens with a score of 1638.000.

How recent are the GDPval-AA leaderboard results?

The GDPval-AA leaderboard was last updated in June 2026 and currently includes 33 evaluated models.