AIME

Progress Over Time

[Figure: interactive timeline of model performance on AIME, marking the state-of-the-art frontier and distinguishing open from proprietary models.]

AIME Leaderboard

2 models
Columns: Context, Cost, License
14B
21.0T, 1.0M, $0.43 / $0.87
About this benchmark

What is AIME?

The American Invitational Mathematics Examination (AIME) benchmark evaluates the mathematical reasoning capabilities of large language models. It contains 30 challenging problems from the AIME 2024 competition that require multi-step reasoning and advanced mathematical insight. Each problem has an integer answer from 000 to 999.
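Because every answer is an integer from 000 to 999, a response can be graded by exact match. The following is a minimal sketch of such a grader, not the benchmark's official harness: the function names `parse_aime_answer` and `accuracy` are hypothetical, and taking the last standalone one-to-three-digit number in the response as the final answer is an assumption made for illustration.

```python
import re
from typing import List, Optional


def parse_aime_answer(text: str) -> Optional[int]:
    """Return the last standalone 1-3 digit integer in a response, or None.

    Assumption: the final answer is the last such number in the text.
    Leading zeros are accepted, so "073" parses as 73.
    """
    matches = re.findall(r"(?<!\d)\d{1,3}(?!\d)", text)
    return int(matches[-1]) if matches else None


def accuracy(responses: List[str], answers: List[int]) -> float:
    """Fraction of problems whose parsed answer equals the reference answer."""
    if len(responses) != len(answers):
        raise ValueError("responses and answers must have the same length")
    correct = sum(parse_aime_answer(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)
```

Under this sketch, a model that answers 17 of the 30 problems correctly would score 17/30 on a 0 to 1 scale. Real evaluation setups often average over several sampled responses per problem, which this example does not model.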

AIME is a text benchmark that evaluates models on math and reasoning tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.474, with the leader at 0.575.

Compare leaders on the best AI for math and best AI for reasoning leaderboards.

Current leaders

Phi 4 Mini Reasoning from Microsoft currently leads the AIME leaderboard with a score of 0.575, ahead of the one other evaluated model.

1. Phi 4 Mini Reasoning (Microsoft): 57.5%
2. MiMo-V2.5-Pro (Xiaomi): 37.3%

FAQ

Common questions about the AIME benchmark and leaderboard.

What is the AIME benchmark?

The American Invitational Mathematics Examination (AIME) benchmark evaluates the mathematical reasoning capabilities of large language models. It contains 30 challenging problems from the AIME 2024 competition that require multi-step reasoning and advanced mathematical insight. Each problem has an integer answer from 000 to 999.

What is the AIME leaderboard?

The AIME leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Phi 4 Mini Reasoning by Microsoft leads with a score of 0.575. The average score across all models is 0.474.
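The reported average can be checked against the two listed scores (57.5% and 37.3%, written here on the 0 to 1 scale). This is a small sketch assuming the average is a plain arithmetic mean rounded to three decimals; the variable names are illustrative.

```python
from statistics import mean

# Scores as listed on the leaderboard, on a 0-1 scale.
scores = {
    "Phi 4 Mini Reasoning": 0.575,
    "MiMo-V2.5-Pro": 0.373,
}

# Assumption: the site reports an unweighted mean rounded to 3 decimals.
average = round(mean(scores.values()), 3)

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)

print(f"leader={leader}, average={average}")
```

With these two inputs the mean is (0.575 + 0.373) / 2, which matches the 0.474 figure quoted above.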

What is the highest AIME score?

The highest AIME score is 0.575, achieved by Phi 4 Mini Reasoning from Microsoft.

How many models are evaluated on AIME?

2 models have been evaluated on the AIME benchmark, with 0 verified results and 2 self-reported results.

Where can I find the AIME paper?

The AIME paper is available at https://arxiv.org/html/2503.21380v2. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does AIME cover?

AIME is categorized under math and reasoning. The benchmark evaluates text models.

What is the best open-source model on AIME?

Phi 4 Mini Reasoning by Microsoft is the top-ranked open-source model on AIME, with a score of 0.575 (rank #1).

How recent are the AIME leaderboard results?

The AIME leaderboard was last updated in July 2026 and currently includes 2 evaluated models.