Aider-Polyglot
Aider-Polyglot Leaderboard
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | OpenAI | — | — | — | |
| 2 | — | — | — | | |
| 3 | OpenAI | — | — | — | |
| 4 | Google | — | 1.0M | $1.25 / $10.00 | |
| 5 | DeepSeek | 685B | — | — | |
| 6 | DeepSeek | 671B | 131K | $0.55 / $2.19 | |
| 7 | OpenAI | — | — | — | |
| 8 | DeepSeek | 671B | — | — | |
| 9 | OpenAI | — | — | — | |
| 10 | Google | — | 1.0M | $0.30 / $2.50 | |
| 11 | Alibaba Cloud / Qwen Team | 480B | — | — | |
| 12 | Moonshot AI | 1.0T | — | — | |
| 12 | Moonshot AI | 1.0T | — | — | |
| 14 | Alibaba Cloud / Qwen Team | 235B | — | — | |
| 15 | OpenAI | — | 1.0M | $2.00 / $8.00 | |
| 16 | Alibaba Cloud / Qwen Team | 80B | — | — | |
| 17 | DeepSeek | 671B | — | — | |
| 18 | Mistral AI | 24B | — | — | |
| 19 | OpenAI | — | 1.0M | $0.40 / $1.60 | |
| 20 | OpenAI | — | 128K | $2.50 / $10.00 | |
| 21 | Google | — | — | — | |
| 22 | OpenAI | — | 1.0M | $0.10 / $0.40 | |
What is Aider-Polyglot?
A coding benchmark that evaluates LLMs on 225 challenging Exercism programming exercises across C++, Go, Java, JavaScript, Python, and Rust. Models receive two attempts to solve each problem, with test error feedback provided after the first attempt if it fails. The benchmark measures both initial problem-solving ability and capacity to edit code based on error feedback, providing an end-to-end evaluation of code generation and editing capabilities across multiple programming languages.
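The two-attempt protocol described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual harness: the `Exercise` type, the `solve` and `run_tests` callables, and the `evaluate` function are all hypothetical names introduced here, and the toy stubs at the end stand in for a real model and a real test runner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Exercise:
    """Hypothetical container for one exercise (name plus instructions)."""
    name: str
    prompt: str


# Hypothetical interfaces, assumed for this sketch only:
# solve(exercise, feedback) returns candidate code; feedback is None on the first attempt.
Solver = Callable[[Exercise, Optional[str]], str]
# run_tests(exercise, code) returns (passed, error_output).
TestRunner = Callable[[Exercise, str], Tuple[bool, str]]


def evaluate(exercises: List[Exercise], solve: Solver, run_tests: TestRunner) -> Dict[str, float]:
    """Score a solver with up to two attempts per exercise.

    The second attempt is made only if the first fails, and it receives
    the test error output from the first attempt as feedback.
    """
    if not exercises:
        return {"pass_rate_1": 0.0, "pass_rate_2": 0.0}

    solved_first = 0
    solved_second = 0
    for exercise in exercises:
        code = solve(exercise, None)
        passed, errors = run_tests(exercise, code)
        if passed:
            solved_first += 1
            continue
        # First attempt failed: retry once with the error feedback.
        code = solve(exercise, errors)
        passed, _ = run_tests(exercise, code)
        if passed:
            solved_second += 1

    total = len(exercises)
    return {
        "pass_rate_1": solved_first / total,                    # solved on the first attempt
        "pass_rate_2": (solved_first + solved_second) / total,  # solved within two attempts
    }


# Toy stand-ins so the sketch runs without a model or a test suite.
def toy_solve(exercise: Exercise, feedback: Optional[str]) -> str:
    if exercise.name == "easy":
        return "good"
    if exercise.name == "fixable" and feedback is not None:
        return "good"
    return "bad"


def toy_run_tests(exercise: Exercise, code: str) -> Tuple[bool, str]:
    return (code == "good", "" if code == "good" else "AssertionError: expected good")


toy_exercises = [Exercise(n, "") for n in ("easy", "fixable", "hard", "harder")]
scores = evaluate(toy_exercises, toy_solve, toy_run_tests)
print(scores)
```

Tracking the first-attempt and two-attempt rates separately reflects the distinction the description draws between initial problem solving and the ability to edit code after seeing test errors. Which of the two rates a given leaderboard reports is not stated here.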
Aider-Polyglot is a text benchmark that evaluates models on general and code tasks. LLM Stats tracks 22 models on this benchmark, scored on a 0–1 scale. The current average is about 0.6, and the leading score is 0.880.
Compare leaders on the "best AI for general" and "best AI for code" leaderboards.
Current leaders
GPT-5 from OpenAI currently leads the Aider-Polyglot leaderboard with a score of 0.880, the highest of the 22 evaluated AI models.