GDPval-AA
GDPval-AA is an evaluation of AI model performance on economically valuable knowledge work tasks across professional domains including finance, legal, and other sectors. Run independently by Artificial Analysis, it uses Elo scoring to rank models on real-world work task performance.
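To make "Elo scoring" concrete, here is a minimal sketch of the textbook Elo update for one pairwise comparison between two models. It is illustrative only: this page does not describe how Artificial Analysis computes its ratings, so the 400-point scale, the K-factor of 32, and the function names `expected_score` and `update` are all assumptions.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    The 400-point scale is the conventional default, assumed here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return both ratings after one comparison.

    outcome_a is 1.0 if A's output was preferred, 0.5 for a tie, 0.0 if B's was.
    The K-factor (k) is a hypothetical choice, not a documented GDPval-AA value.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models; A's output is preferred on one task.
    print(update(1500.0, 1500.0, outcome_a=1.0))  # (1516.0, 1484.0)
```

Under this scheme a win over a higher-rated model moves the ratings more than a win over a lower-rated one, which is why Elo-style scores are relative to the pool of models compared rather than absolute measures.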
Progress Over Time
[Interactive timeline: model performance on GDPval-AA over time, with the state-of-the-art frontier and open vs. proprietary models marked.]
GDPval-AA Leaderboard
3 models • 0 verified
| Rank | Organization | | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Anthropic | — | 200K | $3.00 $15.00 | |
| 2 | Anthropic | — | 1.0M | $5.00 $25.00 | |
| 3 | Google | — | 1.0M | $2.50 $15.00 | |
FAQ
Common questions about GDPval-AA
The GDPval-AA leaderboard ranks 3 AI models by their performance on this benchmark. Claude Sonnet 4.6 by Anthropic currently leads with an Elo score of 1633. The average score across all three models is approximately 1518.67.
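As a rough reading of these figures, the short calculation below derives what the reported top score and average imply about the other two models, and what the gap would mean under the textbook Elo formula. The 400-point scale is an assumption, since this page does not document the scale GDPval-AA uses, and the comparison is against a hypothetical model rated exactly at the average.

```python
n_models = 3
top_score = 1633.0     # leading score reported above
mean_score = 1518.667  # average score reported above

total = n_models * mean_score                     # sum of all three ratings
rest_mean = (total - top_score) / (n_models - 1)  # mean of the other two models


def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Standard logistic Elo expectation; the 400-point scale is assumed."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


# Leader versus a hypothetical model rated exactly at the average.
p = expected_score(top_score, mean_score)

print(round(total), round(rest_mean, 1), round(p, 2))  # 4556 1461.5 0.66
```

So the other two models average about 1461.5, and under these assumptions the leader would be expected to score about 0.66 against an average-rated model.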
The highest GDPval-AA score is 1633, achieved by Claude Sonnet 4.6 from Anthropic.
Three models have been evaluated on the GDPval-AA benchmark. None of the results are verified; all three are self-reported.
GDPval-AA is categorized under agents, general, and reasoning. The benchmark evaluates text models.