GDPval-AA

Progress Over Time

Figure: timeline of model performance on GDPval-AA, marking the state-of-the-art frontier and distinguishing open from proprietary models.

GDPval-AA Leaderboard

33 models
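Rank | Model | Organization | Score | Parameters | Context | Cost per 1M tokens (input / output)
1 | Claude Fable 5 | Anthropic | 1815 | - | - | -
2 | Claude Opus 4.8 | Anthropic | 1638 | - | 1.0M | $5.00 / $25.00
3 | Claude Opus 4.6 | Anthropic | 1606 | - | 1.0M | $5.00 / $25.00
4 | - | - | - | - | 1.0M | $5.00 / $25.00
5 | MiniMax M3 | MiniMax | 1431 | - | 1.0M | $0.30 / $1.20
6 | - | OpenAI | - | - | 1.0M | $2.50 / $15.00
7 | - | - | - | 1.0T | - | -
8 | - | - | - | - | 200K | $3.00 / $15.00
9 | - | - | - | - | - | -
10 | - | - | - | - | 1.0M | $1.50 / $9.00
11 | - | - | - | 1.6T | 1.0M | $1.60 / $3.20
12 | - | Alibaba Cloud / Qwen Team | - | - | 1.0M | $1.25 / $3.75
13 | - | - | - | 1.0T | 1.0M | $0.43 / $0.87
14 | - | Zhipu AI | - | 754B | 200K | $1.40 / $4.40
15 | - | - | - | 284B | 1.0M | $0.10 / $0.20
16 | - | Moonshot AI | - | 1.0T | 262K | $0.75 / $3.50
17 | - | - | - | - | 400K | $0.75 / $4.50
18 | - | - | - | 550B | - | -
19 | - | - | - | - | 205K | $0.30 / $1.20
20 | - | - | - | - | - | -
21 | - | Alibaba Cloud / Qwen Team | - | - | 1.0M | $0.50 / $3.00
22 | - | Alibaba Cloud / Qwen Team | - | 28B | 262K | $0.60 / $3.60
23 | - | OpenAI | - | - | 1.1M | $5.00 / $30.00
24 | - | - | - | - | 400K | $0.20 / $1.25
25 | - | - | - | - | 1.0M | $1.25 / $2.50
26 | - | Alibaba Cloud / Qwen Team | - | 35B | - | -
27 | - | Alibaba Cloud / Qwen Team | - | 122B | - | -
28 | - | - | - | - | 1.0M | $2.50 / $15.00
29 | - | Alibaba Cloud / Qwen Team | - | 397B | - | -
30 | - | Alibaba Cloud / Qwen Team | - | - | 1.0M | $0.32 / $1.28
31 | - | - | - | 128B | 256K | $1.50 / $7.50
32 | - | - | - | - | 200K | $1.00 / $5.00
33 | - | - | - | 31B | 262K | $0.13 / $0.38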
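The Cost column appears to list USD prices per million input and output tokens (the best-value answer in the FAQ quotes $5.00 per million input tokens for a model listed at $5.00 / $25.00). Under that assumption, the cost of a request is input tokens times the input price plus output tokens times the output price, divided by one million. The minimal sketch below shows that arithmetic; the helper names and token counts are hypothetical.

```python
"""Estimate the USD cost of one request from a leaderboard Cost cell.

Assumption: a cell such as "$5.00 / $25.00" means (input, output) prices in
USD per 1M tokens. Helper names and token counts are hypothetical.
"""


def parse_price_pair(cell: str) -> tuple[float, float]:
    """Split "$IN / $OUT" into (input_price, output_price) as floats."""
    input_part, output_part = (part.strip().lstrip("$") for part in cell.split("/"))
    return float(input_part), float(output_part)


def request_cost(cell: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, given per-1M-token prices."""
    in_price, out_price = parse_price_pair(cell)
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200K input tokens, 20K output tokens.
    for cell in ("$5.00 / $25.00", "$0.30 / $1.20"):
        print(cell, f"${request_cost(cell, 200_000, 20_000):.3f}")
```

For example, 200,000 input tokens and 20,000 output tokens come to $1.50 at $5.00 / $25.00 and about $0.08 at $0.30 / $1.20, so output pricing can matter as much as input pricing for generation-heavy work.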

About this benchmark

What is GDPval-AA?

GDPval-AA is an evaluation of AI model performance on economically valuable knowledge work tasks across professional domains including finance, legal, and other sectors. Run independently by Artificial Analysis, it uses Elo scoring to rank models on real-world work task performance.
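This page does not describe how Artificial Analysis converts individual task comparisons into ratings. As a rough guide to what an Elo gap means, here is a minimal sketch of the textbook logistic Elo model; the 400-point scale and the K-factor of 32 are assumptions, not published GDPval-AA settings, and the function names are illustrative.

```python
"""Textbook logistic Elo (assumed 400-point scale, assumed K = 32).

Not GDPval-AA's published procedure; used only to interpret rating gaps.
"""


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B (0..1) under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one head-to-head comparison.

    score_a is 1.0 if A's output is preferred, 0.0 if B's is, 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum: the points A gains are exactly the points B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Leader (1815) vs. second place (1638), scores taken from this page.
    print(f"{expected_score(1815.0, 1638.0):.3f}")
    print(update(1815.0, 1638.0, score_a=0.0))
```

Under these assumptions, the leader's 177-point margin over second place corresponds to an expected score of roughly 0.73 per head-to-head comparison, and an upset loss would move about 23.5 points between the two models.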

GDPval-AA is a text benchmark covering the legal, reasoning, finance, general, and agents categories. LLM Stats tracks 33 models on this benchmark, scored on a 0–3000 scale. The current average is 1230.7, and the leader scores 1815.0.

For per-category rankings, see the best-AI leaderboards for legal, reasoning, finance, general, and agents.

Current leaders

Claude Fable 5 from Anthropic currently leads the GDPval-AA leaderboard of 33 evaluated models, with a score of 1815.

1. Claude Fable 5 (Anthropic): 1815
2. Claude Opus 4.8 (Anthropic): 1638
3. Claude Opus 4.6 (Anthropic): 1606
Top open-weight model: MiniMax M3 (MiniMax), overall rank #5: 1431

FAQ

Common questions about the GDPval-AA benchmark and leaderboard.

What is the GDPval-AA leaderboard?

The GDPval-AA leaderboard ranks 33 AI models based on their performance on this benchmark. Currently, Claude Fable 5 by Anthropic leads with a score of 1815.000. The average score across all models is 1230.727.

What is the highest GDPval-AA score?

The highest GDPval-AA score is 1815.000, achieved by Claude Fable 5 from Anthropic.

How many models are evaluated on GDPval-AA?

Thirty-three models have been evaluated on GDPval-AA. Of these results, 0 are verified and 3 are self-reported.

What categories does GDPval-AA cover?

GDPval-AA is categorized under legal, reasoning, finance, general, and agents. The benchmark evaluates text models.

Are there variants of GDPval-AA?

Yes. GDPval-AA has one related variant, GDPval-MM.

What is the best open-source model on GDPval-AA?

MiniMax M3 by MiniMax is the top-ranked open-source model on GDPval-AA, with a score of 1431.000 (rank #5).

Which model offers the best value on GDPval-AA?

Among models scoring within 10% of the leader, Claude Opus 4.8 from Anthropic is the cheapest, at $5.00 per million input tokens with a score of 1638.000.
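The best-value answer follows a simple selection rule. Here is a minimal sketch of it, assuming "within 10% of the leader" means a score of at least 0.9 × 1815 = 1633.5 and that "cheapest" compares input-token prices. The leader's own price is not shown on this page, so it is treated as unknown and skipped; the function name best_value is hypothetical.

```python
"""Cheapest model scoring within a tolerance of the leader (hypothetical helper).

Assumptions: "within 10%" means score >= 0.9 * top score; price is USD per
1M input tokens; models with an unknown price (None) are skipped.
"""

from typing import Optional


def best_value(models: dict[str, tuple[float, Optional[float]]],
               tolerance: float = 0.10) -> Optional[str]:
    """Return the cheapest priced model whose score clears the threshold."""
    top = max(score for score, _ in models.values())
    threshold = (1.0 - tolerance) * top
    candidates = [
        (price, name)
        for name, (score, price) in models.items()
        if score >= threshold and price is not None
    ]
    # min() on (price, name) tuples picks the lowest price, ties broken by name.
    return min(candidates)[1] if candidates else None


# Scores and input prices as listed on this page.
models = {
    "Claude Fable 5": (1815.0, None),  # price not shown on this page
    "Claude Opus 4.8": (1638.0, 5.00),
    "Claude Opus 4.6": (1606.0, 5.00),
    "MiniMax M3": (1431.0, 0.30),
}

if __name__ == "__main__":
    print(best_value(models))
```

Under these assumptions, Claude Opus 4.8 is the only priced model above the 1633.5 threshold (Claude Opus 4.6, at 1606, falls just short), which is consistent with the answer above. Loosening the tolerance to 25% would admit MiniMax M3 at $0.30.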

How recent are the GDPval-AA leaderboard results?

The GDPval-AA leaderboard was last updated in June 2026 and currently includes 33 evaluated models.