MATH-500
MATH-500 is a subset of the MATH dataset containing 500 challenging competition mathematics problems drawn from AMC 10, AMC 12, AIME, and other mathematics competitions. Each problem includes a full step-by-step solution, and the set spans multiple difficulty levels across seven mathematical subjects: Prealgebra, Algebra, Number Theory, Counting and Probability, Geometry, Intermediate Algebra, and Precalculus.
Progress Over Time
Interactive timeline of state-of-the-art model performance on MATH-500, with open and proprietary models shown separately.
MATH-500 Leaderboard
32 models • 0 verified (all results self-reported)
| Rank | Organization | Params | Context | Cost (input / output) |
|---|---|---|---|---|
| 1 | Meituan | 560B | 128K | $0.30 / $1.20 |
| 2 | Sarvam AI | 105B | — | — |
| 3 | Zhipu AI | 355B | 131K | $0.40 / $1.60 |
| 4 | Zhipu AI | 106B | — | — |
| 5 | NVIDIA | 9B | — | — |
| 6 | Moonshot AI | 1.0T | — | — |
| 6 | Moonshot AI | 1.0T | 200K | $0.50 / $0.50 |
| 8 | Sarvam AI | 30B | — | — |
| 8 | — | 253B | — | — |
| 10 | MiniMax | 456B | 1.0M | $0.55 / $2.20 |
| 10 | Meituan | 69B | 256K | $0.10 / $0.40 |
| 12 | — | 50B | — | — |
| 13 | Meituan | 560B | 128K | $0.30 / $1.20 |
| 14 | Anthropic | — | 200K | $3.00 / $15.00 |
| 14 | Moonshot AI | — | — | — |
| 16 | MiniMax | 456B | — | — |
| 17 | DeepSeek | 671B | — | — |
| 18 | — | 8B | — | — |
| 19 | Microsoft | 4B | — | — |
| 20 | DeepSeek | 71B | 128K | $0.10 / $0.40 |
| 21 | DeepSeek | 33B | 128K | $0.12 / $0.18 |
| 22 | DeepSeek | 671B | 164K | $0.28 / $1.14 |
| 23 | DeepSeek | 15B | — | — |
| 24 | DeepSeek | 8B | — | — |
| 25 | Alibaba Cloud / Qwen Team | 33B | — | — |
| 25 | Alibaba Cloud / Qwen Team | 33B | 33K | $0.15 / $0.60 |
| 27 | DeepSeek | 671B | 131K | $0.27 / $1.10 |
| 28 | OpenAI | — | 128K | $3.00 / $12.00 |
| 29 | DeepSeek | 8B | — | — |
| 30 | DeepSeek | 2B | — | — |
| 31 | — | 8B | 128K | $0.50 / $0.50 |
| 31 | — | 8B | — | — |
FAQ
Common questions about MATH-500
**What is MATH-500?**
MATH-500 is a subset of the MATH dataset containing 500 challenging competition mathematics problems drawn from AMC 10, AMC 12, AIME, and other mathematics competitions. Each problem includes a full step-by-step solution, and the set spans multiple difficulty levels across seven mathematical subjects: Prealgebra, Algebra, Number Theory, Counting and Probability, Geometry, Intermediate Algebra, and Precalculus.

**Where can I read the MATH-500 paper?**
MATH-500 derives from the MATH dataset, whose paper is available at https://arxiv.org/abs/2103.03874. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.

**How does the MATH-500 leaderboard rank models?**
The leaderboard ranks 32 AI models by their performance on the benchmark. LongCat-Flash-Thinking by Meituan currently leads with a score of 0.992; the average score across all models is 0.932.

**What is the highest MATH-500 score?**
The highest MATH-500 score is 0.992, achieved by LongCat-Flash-Thinking from Meituan.

**How many models have been evaluated on MATH-500?**
32 models have been evaluated on the MATH-500 benchmark, all with self-reported (unverified) results.

**What does MATH-500 cover?**
MATH-500 is categorized under math and reasoning, and the benchmark evaluates text models.
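MATH-500 scoring is typically automated by extracting a model's final answer and comparing it against the reference. The sketch below is an illustrative exact-match grader, not the official evaluation harness: it pulls the contents of the last `\boxed{...}` in a solution string (handling nested braces) and compares after trimming whitespace. Real harnesses usually add symbolic-equivalence checks on top of this.

```python
import re  # unused here, but commonly paired with answer normalization


def extract_boxed(solution: str) -> "str | None":
    """Return the contents of the last \\boxed{...} in a solution string,
    handling nested braces (e.g. \\boxed{\\frac{1}{2}})."""
    start = solution.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth = 1
    chars = []
    while i < len(solution) and depth > 0:
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        chars.append(ch)
        i += 1
    return "".join(chars)


def is_correct(model_output: str, reference: str) -> bool:
    """Exact-match grading after whitespace normalization only."""
    pred = extract_boxed(model_output)
    return pred is not None and pred.strip() == reference.strip()


print(is_correct(r"... so the answer is \boxed{7}.", "7"))   # True
print(is_correct(r"... giving \boxed{\frac{1}{2}}", "1/2"))  # False: no symbolic equivalence
```

Exact string matching is the weakness of this sketch: `\frac{1}{2}` and `1/2` denote the same number but fail the comparison, which is why production graders normalize LaTeX or fall back to a computer-algebra check.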