AIME 2026

All 30 problems from the 2026 American Invitational Mathematics Examination (AIME I and AIME II), each with an integer answer from 000 to 999, testing olympiad-level mathematical reasoning. The set is used as an AI benchmark to evaluate large language models' ability to solve complex mathematical problems requiring multi-step logical deduction and structured symbolic reasoning.
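Scoring on AIME-style benchmarks is typically exact-match: the model's final integer is parsed from its response and compared to the official answer. Below is a minimal Python sketch of such a grader; the "Answer:" output convention and the function name are illustrative assumptions, not the harness actually used for this leaderboard.

```python
import re

def grade_aime_answer(model_output: str, ground_truth: int) -> bool:
    """Exact-match grading for one AIME problem (answers are integers 000-999).

    Hypothetical helper: assumes the model is prompted to end its response
    with a line like "Answer: 042"; real evaluation harnesses vary.
    """
    match = re.search(r"Answer:\s*(\d{1,3})\b", model_output)
    if match is None:
        return False  # an unparseable response counts as incorrect
    return int(match.group(1)) == ground_truth

# Example: a correct response to a problem whose official answer is 73
print(grade_aime_answer("... so the remainder is 73.\nAnswer: 073", 73))  # True
```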

Progress Over Time

[Interactive timeline showing model performance evolution on AIME 2026, with a state-of-the-art frontier and open vs. proprietary models distinguished.]

AIME 2026 Leaderboard

8 models

Rank | Organization | Model | Score | Params | Context | Cost (input / output) | License
1 | Alibaba Cloud / Qwen Team | Qwen3.6 Plus | 0.953 | | | |
2 | ByteDance | | | | | |
3 | Alibaba Cloud / Qwen Team | | | 397B | 262K | $0.60 / $3.60 |
4 | Google | | | 31B, 525B | | |
5 | ByteDance | | | | | |
6 | | | | | | |
7 | Google | | | 8B | | |
8 | Google | | | 5B | | |

FAQ

Common questions about AIME 2026

What is AIME 2026?
AIME 2026 comprises all 30 problems from the 2026 American Invitational Mathematics Examination (AIME I and AIME II), each with an integer answer from 000 to 999. It tests olympiad-level mathematical reasoning and is used to evaluate large language models on multi-step logical deduction and structured symbolic reasoning.
How are models ranked on the AIME 2026 leaderboard?
The leaderboard ranks 8 AI models by their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.953; the average score across all models is 0.783 (see the aggregation sketch at the end of this FAQ).
What is the highest AIME 2026 score?
The highest AIME 2026 score is 0.953, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.
How many models have been evaluated on AIME 2026?
Eight models have been evaluated on the AIME 2026 benchmark, with 0 verified results and 8 self-reported results.
What categories does AIME 2026 fall under?
AIME 2026 is categorized under math and reasoning. The benchmark evaluates text models.
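For intuition on how the scores above aggregate, here is a toy sketch assuming each model's score is its fraction of the 30 problems solved. Fractional scores such as 0.953 suggest the real pipeline averages over multiple sampled attempts per problem; all counts below are made-up placeholders, not leaderboard data.

```python
# Toy aggregation: per-model accuracy over the 30 problems, then the
# leaderboard-wide average. All counts are hypothetical placeholders.
N_PROBLEMS = 30

per_model_correct = {
    "model_a": 28,  # hypothetical: 28 of 30 correct -> score 0.933
    "model_b": 21,  # hypothetical: 21 of 30 correct -> score 0.700
}

scores = {name: n / N_PROBLEMS for name, n in per_model_correct.items()}
leaderboard_average = sum(scores.values()) / len(scores)

print({k: round(v, 3) for k, v in scores.items()})  # per-model scores
print(round(leaderboard_average, 3))                # average across models
```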