AIME
AIME Leaderboard
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Microsoft | 4B | — | — | |
| 2 | Xiaomi | 1.0T | 1.0M | $0.43 / $0.87 | |
What is AIME?
The American Invitational Mathematics Examination (AIME) benchmark evaluates the mathematical reasoning capabilities of large language models. It contains 30 challenging problems from the AIME 2024 competition that require multi-step reasoning and advanced mathematical insight. Each problem has an integer answer between 000 and 999.
AIME is a text benchmark that evaluates models on math and reasoning tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is about 0.5, and the leader scores 0.575 (about 0.6 when rounded).
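As a rough illustration of how a 0–1 score could be derived from 30 integer-answer problems, here is a minimal Python sketch. It assumes exact-match scoring on the last one-to-three-digit integer in a model's response. The function names `parse_aime_answer` and `aime_score` are hypothetical, and the page does not describe how LLM Stats or the model providers actually grade responses.

```python
import re
from typing import Optional


def parse_aime_answer(text: str) -> Optional[int]:
    """Return the last standalone 1-3 digit integer in `text`, or None.

    Assumption: the final answer is the last such integer in the response.
    Leading zeros ("042") are accepted, since AIME answers span 000-999.
    """
    matches = re.findall(r"(?<!\d)\d{1,3}(?!\d)", text)
    return int(matches[-1]) if matches else None


def aime_score(predictions: list[str], references: list[int]) -> float:
    """Fraction of problems answered correctly, on a 0-1 scale."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    correct = sum(
        parse_aime_answer(pred) == ref
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


# Toy example with made-up responses (not real AIME problems or model outputs).
preds = ["The answer is 042.", "I get 7", "no idea"]
refs = [42, 7, 100]
print(aime_score(preds, refs))  # 2 of 3 correct
```

With a single attempt per problem, a 30-problem set can only yield scores that are multiples of 1/30, so reported values that fall between those steps would have to come from some other procedure, such as averaging over repeated runs. The page does not say which procedure was used.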
Current leaders
Phi 4 Mini Reasoning from Microsoft currently leads the AIME leaderboard with a score of 0.575, the higher of the 2 evaluated AI models.