AIME 2026
AIME 2026 Leaderboard
| Rank | Organization | Parameters | Context | Cost | License | |
|---|---|---|---|---|---|---|
| 1 | Zhipu AI | 753B | 1.0M | $0.95 / $3.00 | ||
| 2 | Moonshot AI | 1.0T | 262K | $0.75 / $3.50 | ||
| 3 | Zhipu AI | 754B | 200K | $1.40 / $4.40 | ||
| 3 | Alibaba Cloud / Qwen Team | — | 1.0M | $0.50 / $3.00 | ||
| 5 | Microsoft | 1.0T | — | — | ||
| 6 | ByteDance | — | 256K | $0.50 / $3.00 | ||
| 7 | Alibaba Cloud / Qwen Team | 28B | 262K | $0.60 / $3.60 | ||
| 8 | Alibaba Cloud / Qwen Team | 35B | — | — | ||
| 9 | Microsoft | — | — | — | ||
| 10 | Alibaba Cloud / Qwen Team | 397B | — | — | ||
| 11 | Google | 31B | 262K | $0.13 / $0.38 | ||
| 12 | Google | 25B | 262K | $0.13 / $0.40 | ||
| 12 | ByteDance | — | — | — | ||
| 14 | Google | 12B | — | — | ||
| 15 | Google | 25B | — | — | ||
| 16 | Google | 8B | — | — | ||
| 17 | Google | 5B | — | — | ||
What is AIME 2026?
AIME 2026 consists of all 30 problems from the 2026 American Invitational Mathematics Examination (AIME I and AIME II). The problems test olympiad-level mathematical reasoning, and every answer is an integer from 000 to 999. As an AI benchmark, it evaluates how well large language models solve complex mathematical problems that require multi-step logical deduction and structured symbolic reasoning.
AIME 2026 is a text benchmark that evaluates models on math and reasoning tasks. LLM Stats tracks 17 models on this benchmark, scored on a 0–1 scale. The current average is about 0.8, and the leading score is 0.992.
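The page does not say how LLM Stats derives its 0–1 scores. As a rough illustration only, the sketch below assumes the simplest scheme: each problem is graded by exact match on the integer answer, and the score is the fraction answered correctly. The function names `normalize_answer` and `score`, and the sample answers in the usage note, are hypothetical and not part of any published grader.

```python
import re
from typing import List, Optional


def normalize_answer(raw: str) -> Optional[int]:
    """Parse a final answer string into an integer in [0, 999].

    Returns None when the string is not one to three digits.
    """
    text = raw.strip()
    # AIME answers are integers from 000 to 999, so accept 1-3 digits only.
    if not re.fullmatch(r"\d{1,3}", text):
        return None
    return int(text)


def score(predictions: List[str], answer_key: List[int]) -> float:
    """Return the fraction of problems answered exactly right (0-1 scale).

    Assumption: plain exact-match accuracy over a single attempt per problem.
    """
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    correct = sum(
        normalize_answer(pred) == truth
        for pred, truth in zip(predictions, answer_key)
    )
    return correct / len(answer_key)
```

With a made-up three-problem key, `score(["007", "42", "abc"], [7, 42, 999])` counts the first two answers as correct and rejects the third as malformed. A real leaderboard may average over several sampled attempts per problem, which this sketch does not model.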
Compare leaders on the best AI for math and best AI for reasoning leaderboards.
Current leaders
GLM-5.2 from Zhipu AI currently leads the AIME 2026 leaderboard with a score of 0.992, the highest among the 17 evaluated AI models.
FAQ
Common questions about the AIME 2026 benchmark and leaderboard.