AIME 2026
All 30 problems from the 2026 American Invitational Mathematics Examination (AIME I and AIME II), testing olympiad-level mathematical reasoning with integer answers from 000 to 999. It is used as an AI benchmark to evaluate large language models' ability to solve complex mathematical problems requiring multi-step logical deduction and structured symbolic reasoning.
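Because every AIME answer is a three-digit integer, scoring reduces to exact-match grading of an extracted final answer, which is what makes the exam attractive as an automatically gradable benchmark. The sketch below is a minimal, hypothetical Python harness illustrating that protocol; the `grade_answer` and `score` helpers and the regex-based answer extraction are illustrative assumptions, not the scoring code behind this leaderboard.

```python
import re

def grade_answer(model_output: str, key: int) -> bool:
    """Exact-match grading for one AIME problem.

    AIME answers are integers in [0, 999], so we take the last
    1-3 digit integer in the model's output as its final answer.
    (Hypothetical extraction rule, for illustration only.)
    """
    candidates = re.findall(r"\b\d{1,3}\b", model_output)
    return bool(candidates) and int(candidates[-1]) == key

def score(outputs: list[str], keys: list[int]) -> float:
    """Fraction of problems answered correctly (a full run covers all 30)."""
    assert len(outputs) == len(keys)
    return sum(grade_answer(o, k) for o, k in zip(outputs, keys)) / len(keys)

# Toy example: 2 of 3 correct -> ~0.667.
print(score(["The answer is 042.", "Thus N = 815.", "So the answer is 7."],
            [42, 815, 999]))
```

Exact match on a bounded integer avoids the grading ambiguity of proof-based olympiad benchmarks, though real harnesses differ in how they extract the final answer from free-form model output.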
Progress Over Time
[Interactive timeline showing model performance evolution on AIME 2026; the chart marks the state-of-the-art frontier and distinguishes open from proprietary models.]
AIME 2026 Leaderboard
8 models
| Rank | Model | Organization | Params | Context | Cost | License |
|---|---|---|---|---|---|---|
| 1 | Qwen3.6 Plus | Alibaba Cloud / Qwen Team | — | — | — | — |
| 2 | — | ByteDance | — | — | — | — |
| 3 | — | Alibaba Cloud / Qwen Team | 397B | 262K | $0.60 / $3.60 | — |
| 4 | Gemma 4 31B | Google | 31B | — | — | — |
| 5 | — | Google | 25B | — | — | — |
| 5 | — | ByteDance | — | — | — | — |
| 7 | Gemma 4 E4B | Google | 8B | — | — | — |
| 8 | Gemma 4 E2B | Google | 5B | — | — | — |
FAQ
Common questions about AIME 2026
**What is AIME 2026?**
AIME 2026 comprises all 30 problems from the 2026 American Invitational Mathematics Examination (AIME I and AIME II), each with an integer answer from 000 to 999. It is used as an AI benchmark to evaluate large language models on olympiad-level problems requiring multi-step logical deduction and structured symbolic reasoning.

**How does the AIME 2026 leaderboard work?**
The AIME 2026 leaderboard ranks 8 AI models by their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.953. The average score across all models is 0.783.

**What is the highest AIME 2026 score?**
The highest AIME 2026 score is 0.953, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.

**How many models have been evaluated on AIME 2026?**
8 models have been evaluated on the AIME 2026 benchmark, with 0 verified results and 8 self-reported results.

**What categories does AIME 2026 fall under?**
AIME 2026 is categorized under math and reasoning. The benchmark evaluates text models.