
AIME 2025

All 30 problems from the 2025 American Invitational Mathematics Examination (AIME I and AIME II), testing olympiad-level mathematical reasoning with integer answers from 000 to 999. It is used as an AI benchmark to evaluate large language models' ability to solve complex mathematical problems requiring multi-step logical deduction and structured symbolic reasoning.
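
As a rough sketch of how such an evaluation works in practice: sample one completion per problem, extract a three-digit integer from it, and report exact-match accuracy over the 30 problems. The snippet below is a minimal illustration under those assumptions, not the leaderboard's actual harness; `extract_aime_answer`, `score_aime`, and the `ask_model` callable are hypothetical names.

```python
import re

def extract_aime_answer(completion: str) -> str | None:
    """Heuristically pull the last integer in 0-999 from a model completion."""
    matches = re.findall(r"\b\d{1,3}\b", completion)
    return matches[-1].zfill(3) if matches else None

def score_aime(problems: list[tuple[str, str]], ask_model) -> float:
    """Exact-match accuracy over AIME-style problems.

    problems  -- (statement, answer) pairs, answers given as "000"-"999"
    ask_model -- callable mapping a problem statement to a text completion
    """
    correct = 0
    for statement, answer in problems:
        predicted = extract_aime_answer(ask_model(statement))
        correct += predicted == answer.zfill(3)
    return correct / len(problems)
```

Real harnesses usually instruct the model to emit its final answer in a fixed format (for example "Answer: NNN"), which makes extraction far less heuristic than the regex above.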

Paper

The benchmark paper is available at https://arxiv.org/abs/2503.21380.

Progress Over Time

[Interactive timeline showing model performance evolution on AIME 2025, with a state-of-the-art frontier and models marked as open or proprietary]

AIME 2025 Leaderboard

105 models • 0 verified
| Rank | Organization | Score | Params | Context | Cost, $/1M tokens (in / out) |
|---|---|---|---|---|---|
| 1 | OpenAI | 1.000 | | 400K | $1.75 / $14.00 |
| 1 | | 1.000 | | | |
| 1 | | 1.000 | 1.0T | | |
| 1 | | 1.000 | | | |
| 1 | | 1.000 | | 400K | $21.00 / $168.00 |
| 6 | | 0.998 | | 200K | $5.00 / $25.00 |
| 7 | | 0.997 | | 1.0M | $0.50 / $3.00 |
| 8 | | 0.996 | 560B | 128K | $0.30 / $1.20 |
| 8 | | 0.996 | | 400K | $1.25 / $10.00 |
| 10 | | 0.992 | 32B | 262K | $0.06 / $0.24 |
| 11 | | 0.987 | 21B | | |
| 12 | | 0.984 | | 400K | $1.25 / $10.00 |
| 13 | ByteDance | 0.983 | | | |
| 14 | | 0.973 | 196B | 66K | $0.10 / $0.40 |
| 15 | Sarvam AI | 0.967 | 30B | | |
| 15 | Sarvam AI | 0.967 | 105B | | |
| 15 | | 0.967 | | 400K | $1.25 / $10.00 |
| 18 | Moonshot AI | 0.961 | 1.0T | 262K | $0.60 / $2.50 |
| 19 | | 0.960 | 685B | | |
| 20 | Zhipu AI | 0.957 | 358B | 205K | $0.60 / $2.20 |
| 21 | OpenAI | 0.946 | | 400K | $1.25 / $10.00 |
| 21 | | 0.946 | | 400K | $1.25 / $10.00 |
| 23 | | 0.941 | 309B | 256K | $0.10 / $0.30 |
| 24 | | 0.940 | | 400K | $1.25 / $10.00 |
| 24 | | 0.940 | | 400K | $1.25 / $10.00 |
| 24 | OpenAI | 0.940 | | 400K | $1.25 / $10.00 |
| 27 | Zhipu AI | 0.939 | 357B | 131K | $0.55 / $2.19 |
| 28 | | 0.933 | | 128K | $3.00 / $15.00 |
| 29 | | 0.931 | 685B | | |
| 30 | ByteDance | 0.930 | | | |
| 31 | LG AI Research | 0.928 | 236B | 33K | $0.60 / $1.00 |
| 32 | OpenAI | 0.927 | | 200K | $1.10 / $4.40 |
| 33 | | 0.925 | 117B | 131K | $0.10 / $0.50 |
| 34 | Alibaba Cloud / Qwen Team | 0.923 | 235B | 262K | $0.30 / $3.00 |
| 35 | | 0.920 | | 2.0M | $0.20 / $0.50 |
| 36 | | 0.917 | | | |
| 37 | | 0.916 | 30B | 128K | $0.07 / $0.40 |
| 38 | | 0.911 | | 400K | $0.25 / $2.00 |
| 38 | Inception | 0.911 | | 128K | $0.25 / $0.75 |
| 40 | | 0.908 | | 128K | $0.30 / $0.50 |
| 41 | | 0.906 | 560B | 128K | $0.30 / $1.20 |
| 42 | | 0.902 | 120B | 262K | $0.10 / $0.50 |
| 43 | Alibaba Cloud / Qwen Team | 0.897 | 236B | 262K | $0.45 / $3.49 |
| 44 | | 0.893 | 685B | | |
| 45 | | 0.889 | | 400K | $1.25 / $10.00 |
| 46 | | 0.880 | | 1.0M | $1.25 / $10.00 |
| 47 | Alibaba Cloud / Qwen Team | 0.878 | 80B | 66K | $0.15 / $1.50 |
| 48 | | 0.877 | 10B | | |
| 49 | | 0.875 | 671B | 131K | $0.50 / $2.15 |
| 50 | | 0.870 | | | |
Showing 1-50 of 105 models.
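
The Cost column appears to list provider prices in USD per million input and output tokens. Under that assumption, the price of one full pass over the 30 problems can be roughed out as below; the per-problem token counts and the helper `eval_cost_usd` are illustrative, not measured values.

```python
def eval_cost_usd(n_problems: int, prompt_toks: int, completion_toks: int,
                  in_price: float, out_price: float) -> float:
    """Estimated cost of one benchmark pass; prices are USD per 1M tokens."""
    return (n_problems * prompt_toks * in_price
            + n_problems * completion_toks * out_price) / 1_000_000

# Example: 30 problems at the rank-1 entry's prices ($1.75 in, $14.00 out),
# assuming ~500 prompt tokens and ~8,000 reasoning tokens per problem.
print(f"${eval_cost_usd(30, 500, 8_000, 1.75, 14.00):.2f}")  # ≈ $3.39
```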

FAQ

Common questions about AIME 2025

What is AIME 2025?
All 30 problems from the 2025 American Invitational Mathematics Examination (AIME I and AIME II), testing olympiad-level mathematical reasoning with integer answers from 000 to 999. It is used as an AI benchmark to evaluate large language models' ability to solve complex mathematical problems requiring multi-step logical deduction and structured symbolic reasoning.

Where can I find the AIME 2025 paper?
The AIME 2025 paper is available at https://arxiv.org/abs/2503.21380. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.

Which model leads the AIME 2025 leaderboard?
The AIME 2025 leaderboard ranks 105 AI models by their performance on this benchmark. Currently, GPT-5.2 by OpenAI leads with a score of 1.000. The average score across all models is 0.783.

What is the highest AIME 2025 score?
The highest AIME 2025 score is 1.000, achieved by GPT-5.2 from OpenAI.

How many models have been evaluated on AIME 2025?
105 models have been evaluated on the AIME 2025 benchmark, with 0 verified results and 105 self-reported results.

What categories does AIME 2025 fall under?
AIME 2025 is categorized under math and reasoning. The benchmark evaluates text models.
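
For readers reproducing the FAQ's aggregate figures, the sketch below shows how the top score and the average would be derived from (model, score) pairs. `leaderboard_stats` is a hypothetical helper, and the two-entry input is illustrative; the real leaderboard aggregates 105 entries.

```python
from statistics import mean

def leaderboard_stats(entries: list[tuple[str, float]]) -> dict:
    """Best model and mean score from (model_name, score) pairs in [0, 1]."""
    best_name, best_score = max(entries, key=lambda e: e[1])
    return {"best": best_name,
            "top_score": best_score,
            "average": round(mean(score for _, score in entries), 3)}

# Illustrative two-entry leaderboard (scores taken from the visible table).
print(leaderboard_stats([("GPT-5.2", 1.000), ("unnamed-model", 0.870)]))
# {'best': 'GPT-5.2', 'top_score': 1.0, 'average': 0.935}
```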