GDPval-AA
Progress Over Time
Figure: timeline of model performance on GDPval-AA.
GDPval-AA Leaderboard
| Rank | Organization | Parameters | Context | Cost | License | |
|---|---|---|---|---|---|---|
| 1 | Anthropic | — | — | — | ||
| 2 | Anthropic | — | 1.0M | $5.00 / $25.00 | ||
| 3 | Anthropic | — | 1.0M | $5.00 / $25.00 | ||
| 4 | Anthropic | — | 1.0M | $5.00 / $25.00 | ||
| 5 | MiniMax | — | 1.0M | $0.30 / $1.20 | ||
| 6 | OpenAI | — | 1.0M | $2.50 / $15.00 | ||
| 7 | Xiaomi | 1.0T | — | — | ||
| 8 | Anthropic | — | 200K | $3.00 / $15.00 | ||
| 9 | Xiaomi | — | — | — | ||
| 10 | Google | — | 1.0M | $1.50 / $9.00 | ||
| 11 | DeepSeek | 1.6T | 1.0M | $1.60 / $3.20 | ||
| 12 | Alibaba Cloud / Qwen Team | — | 1.0M | $1.25 / $3.75 | ||
| 13 | Xiaomi | 1.0T | 1.0M | $0.43 / $0.87 | ||
| 14 | Zhipu AI | 754B | 200K | $1.40 / $4.40 | ||
| 15 | DeepSeek | 284B | 1.0M | $0.10 / $0.20 | ||
| 16 | Moonshot AI | 1.0T | 262K | $0.75 / $3.50 | ||
| 17 | OpenAI | — | 400K | $0.75 / $4.50 | ||
| 18 | | 550B | — | — | ||
| 19 | MiniMax | — | 205K | $0.30 / $1.20 | ||
| 20 | Meta | — | — | — | ||
| 21 | Alibaba Cloud / Qwen Team | — | 1.0M | $0.50 / $3.00 | ||
| 22 | Alibaba Cloud / Qwen Team | 28B | 262K | $0.60 / $3.60 | ||
| 23 | OpenAI | — | 1.1M | $5.00 / $30.00 | ||
| 24 | OpenAI | — | 400K | $0.20 / $1.25 | ||
| 25 | xAI | — | 1.0M | $1.25 / $2.50 | ||
| 26 | Alibaba Cloud / Qwen Team | 35B | — | — | ||
| 27 | Alibaba Cloud / Qwen Team | 122B | — | — | ||
| 28 | Google | — | 1.0M | $2.50 / $15.00 | ||
| 29 | Alibaba Cloud / Qwen Team | 397B | — | — | ||
| 30 | Alibaba Cloud / Qwen Team | — | 1.0M | $0.32 / $1.28 | ||
| 31 | Mistral AI | 128B | 256K | $1.50 / $7.50 | ||
| 32 | Anthropic | — | 200K | $1.00 / $5.00 | ||
| 33 | Google | 31B | 262K | $0.13 / $0.38 | ||
What is GDPval-AA?
GDPval-AA evaluates AI models on economically valuable knowledge-work tasks across professional domains such as finance and legal. Artificial Analysis runs the evaluation independently and uses Elo scoring to rank models by their performance on real-world work tasks.
GDPval-AA is a text benchmark covering legal, reasoning, finance, general, and agent tasks. LLM Stats tracks 33 models on it, with scores reported on a 0–3000 scale. The current average score is 1230.7, and the leader scores 1815.0.
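To give a feel for what a gap between two Elo scores means, here is a minimal sketch of the textbook Elo model. It assumes the conventional logistic curve with a 400-point scale and a K-factor of 32. The page does not state which Elo variant or parameters Artificial Analysis uses, so the function names and constants below are illustrative only. Treating the leaderboard average as a hypothetical opponent is also an assumption made for the example.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (roughly, win probability) of A against B.

    Uses the standard logistic Elo formula; the 400-point scale is the
    conventional default and is an assumption here, not a GDPval-AA setting.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0, scale: float = 400.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A's output was preferred, 0.0 if B's was, 0.5 for a tie.
    The K-factor of 32 is a common textbook choice, assumed for illustration.
    """
    expected_a = elo_expected_score(rating_a, rating_b, scale)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Figures quoted on this page: leader at 1815.0, leaderboard average 1230.7.
leader, average = 1815.0, 1230.7
p_leader = elo_expected_score(leader, average)
print(f"Expected score of leader vs. an average-rated model: {p_leader:.3f}")
```

Under these assumptions, a gap of about 584 points corresponds to an expected score of roughly 0.97 for the higher-rated model, meaning its output would be preferred in the large majority of head-to-head comparisons.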
Compare leaders on the related leaderboards: best AI for legal, best AI for reasoning, best AI for finance, best AI for general, and best AI for agents.
Current leaders
Claude Fable 5 from Anthropic currently leads the GDPval-AA leaderboard with a score of 1815.0 among 33 evaluated AI models.