AIME 2024
The American Invitational Mathematics Examination 2024 consists of 30 challenging mathematical reasoning problems drawn from the 2024 AIME I and AIME II competitions. Each problem requires an integer answer between 0 and 999 and tests advanced mathematical reasoning across algebra, geometry, combinatorics, and number theory. The set is used as a benchmark for evaluating the mathematical reasoning capabilities of large language models at Olympiad-level difficulty.
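Because every answer is a single integer from 0 to 999, grading reduces to exact-match accuracy over the 30 problems. The snippet below is a minimal sketch of how such a score could be computed; the file name, JSON fields, and answer-extraction heuristic are illustrative assumptions, not the evaluation code used for this leaderboard.

```python
# Minimal AIME-style grading sketch. The input file name and the JSON fields
# "model_completion" and "gold_answer" are hypothetical stand-ins.
import json
import re


def extract_answer(completion: str) -> int | None:
    """Crude heuristic: take the last standalone 1-3 digit integer in the output."""
    matches = re.findall(r"\b\d{1,3}\b", completion)
    return int(matches[-1]) if matches else None


def score_run(problems: list[dict]) -> float:
    """Exact-match accuracy over the problem set (AIME answers are 0-999)."""
    correct = sum(
        1
        for p in problems
        if (pred := extract_answer(p["model_completion"])) is not None
        and pred == int(p["gold_answer"])
    )
    return correct / len(problems)


if __name__ == "__main__":
    # Hypothetical input: one record per problem with the model's raw output.
    with open("aime_2024_completions.json") as f:
        problems = json.load(f)
    print(f"AIME 2024 accuracy: {score_run(problems):.3f}")
```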
Progress Over Time
Interactive timeline showing model performance evolution on AIME 2024, with open and proprietary models plotted against the state-of-the-art frontier.
AIME 2024 Leaderboard
52 models • 0 verified
| Rank | Organization | Parameters | Context | Cost (input / output, per 1M tokens) |
|---|---|---|---|---|
| 1 | xAI | — | 128K | $0.30 / $0.50 |
| 2 | OpenAI | — | 200K | $1.10 / $4.40 |
| 3 | xAI | — | 128K | $3.00 / $15.00 |
| 3 | Meituan | 560B | 128K | $0.30 / $1.20 |
| 5 | Google | — | 1.0M | $1.25 / $10.00 |
| 6 | OpenAI | — | 200K | $2.00 / $8.00 |
| 7 | DeepSeek | 671B | 131K | $0.50 / $2.15 |
| 8 | Zhipu AI | 355B | 131K | $0.40 / $1.60 |
| 9 | Mistral AI | 14B | 262K | $0.20 / $0.20 |
| 10 | Zhipu AI | 106B | — | — |
| 11 | Google | — | 1.0M | $0.30 / $2.50 |
| 12 | OpenAI | — | 200K | $1.10 / $4.40 |
| 13 | DeepSeek | 671B | — | — |
| 13 | DeepSeek | 71B | 128K | $0.10 / $0.40 |
| 15 | Mistral AI | 8B | 262K | $0.15 / $0.15 |
| 15 | MiniMax | 456B | 1.0M | $0.55 / $2.20 |
| 15 | OpenAI | — | — | — |
| 18 | Alibaba Cloud / Qwen Team | 235B | 128K | $0.10 / $0.10 |
| 19 | MiniMax | 456B | — | — |
| 19 | DeepSeek | 8B | — | — |
| 19 | DeepSeek | 33B | 128K | $0.12 / $0.18 |
| 22 | Alibaba Cloud / Qwen Team | 33B | 128K | $0.10 / $0.30 |
| 23 | Microsoft | 14B | — | — |
| 24 | — | 8B | 128K | $0.50 / $0.50 |
| 24 | — | 8B | — | — |
| 26 | Alibaba Cloud / Qwen Team | 31B | 128K | $0.10 / $0.30 |
| 27 | DeepSeek | 8B | — | — |
| 27 | DeepSeek | 15B | — | — |
| 27 | Anthropic | — | 200K | $3.00 / $15.00 |
| 30 | Alibaba Cloud / Qwen Team | 33B | — | — |
| 31 | Moonshot AI | — | — | — |
| 31 | Mistral AI | 3B | 131K | $0.10 / $0.10 |
| 33 | Microsoft | 14B | — | — |
| 34 | OpenAI | — | 200K | $15.00 / $60.00 |
| 35 | Mistral AI | 24B | — | — |
| 36 | — | — | — | — |
| 37 | Meituan | 69B | 256K | $0.10 / $0.40 |
| 38 | Moonshot AI | 1.0T | 262K | $0.60 / $2.50 |
| 39 | Mistral AI | 24B | — | — |
| 40 | Moonshot AI | 1.0T | 200K | $0.50 / $0.50 |
| 40 | Moonshot AI | 1.0T | — | — |
| 42 | DeepSeek | 671B | 164K | $0.27 / $1.00 |
| 43 | DeepSeek | 671B | 164K | $0.28 / $1.14 |
| 44 | DeepSeek | 2B | — | — |
| 45 | Alibaba Cloud / Qwen Team | 33B | 33K | $0.15 / $0.60 |
| 46 | OpenAI | — | 1.0M | $0.40 / $1.60 |
| 47 | OpenAI | — | 1.0M | $2.00 / $8.00 |
| 48 | OpenAI | — | 128K | $15.00 / $60.00 |
| 49 | DeepSeek | 671B | 131K | $0.27 / $1.10 |
| 50 | OpenAI | — | 128K | $75.00 / $150.00 |
Showing models 1–50 of 52.
FAQ
Common questions about AIME 2024
The AIME 2024 paper is available at https://arxiv.org/html/2503.21380v2. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The AIME 2024 leaderboard ranks 52 AI models based on their performance on this benchmark. Currently, Grok-3 Mini by xAI leads with a score of 0.958. The average score across all models is 0.746.
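The summary figures quoted above are simple aggregates of the per-model scores. As a hedged illustration only, the sketch below uses placeholder entries rather than the leaderboard's real data; only the top entry matches a value reported on this page.

```python
# Hypothetical (model, score) pairs for illustration.
leaderboard = [
    ("Grok-3 Mini", 0.958),  # reported top score
    ("Model B", 0.912),      # placeholder
    ("Model C", 0.845),      # placeholder
]

# Leader = entry with the highest score; average = mean over all entries.
top_model, top_score = max(leaderboard, key=lambda entry: entry[1])
mean_score = sum(score for _, score in leaderboard) / len(leaderboard)

print(f"Leader: {top_model} ({top_score:.3f})")
print(f"Average score: {mean_score:.3f}")
```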
The highest AIME 2024 score is 0.958, achieved by Grok-3 Mini from xAI.
52 models have been evaluated on the AIME 2024 benchmark, with 0 verified results and 52 self-reported results.
AIME 2024 is categorized under math and reasoning. The benchmark evaluates text models.