MathArena Apex

Progress Over Time

[Figure: timeline of model performance on MathArena Apex. Legend: state-of-the-art frontier, open models, proprietary models.]

MathArena Apex Leaderboard

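6 models

Columns: rank, model (organization), parameters, context window, cost per 1M tokens (input / output)

1. DeepSeek-V4-Pro-Max (DeepSeek): 1.6T parameters, 1.0M context, $1.60 / $3.20
2. DeepSeek-V4-Flash-Max (DeepSeek): 284B parameters, 1.0M context, $0.10 / $0.20
3. Qwen3.7 Max (Alibaba Cloud / Qwen Team): 1.0M context, $1.25 / $3.75
4. ByteDance (model name not shown)
5. ByteDance (model name not shown)
6. (not shown)
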
About this benchmark

What is MathArena Apex?

MathArena Apex is a challenging math contest benchmark featuring the most difficult mathematical problems designed to test advanced reasoning and problem-solving abilities of AI models. It focuses on olympiad-level mathematics and complex multi-step mathematical reasoning.

MathArena Apex is a text benchmark evaluating models on math and reasoning tasks. LLM Stats tracks 6 models on this benchmark, scored on a 0–1 scale. The current average score is 0.517, and the leader scores 0.902.


Current leaders

DeepSeek-V4-Pro-Max from DeepSeek currently leads the MathArena Apex leaderboard of 6 evaluated AI models, with a score of 0.902.

1. DeepSeek-V4-Pro-Max (DeepSeek): 90.2%
2. DeepSeek-V4-Flash-Max (DeepSeek): 85.7%
3. Qwen3.7 Max (Alibaba Cloud / Qwen Team): 44.5%

FAQ

Common questions about the MathArena Apex benchmark and leaderboard.

What is the MathArena Apex benchmark?

MathArena Apex is a challenging math contest benchmark featuring the most difficult mathematical problems designed to test advanced reasoning and problem-solving abilities of AI models. It focuses on olympiad-level mathematics and complex multi-step mathematical reasoning.

What is the MathArena Apex leaderboard?

The MathArena Apex leaderboard ranks 6 AI models based on their performance on this benchmark. Currently, DeepSeek-V4-Pro-Max by DeepSeek leads with a score of 0.902. The average score across all models is 0.517.

What is the highest MathArena Apex score?

The highest MathArena Apex score is 0.902, achieved by DeepSeek-V4-Pro-Max from DeepSeek.

How many models are evaluated on MathArena Apex?

Six models have been evaluated on the MathArena Apex benchmark. All 6 results are self-reported; none are verified.

What categories does MathArena Apex cover?

MathArena Apex is categorized under math and reasoning. The benchmark evaluates text models.

What is the best open-source model on MathArena Apex?

DeepSeek-V4-Pro-Max by DeepSeek is the top-ranked open-source model on MathArena Apex, with a score of 0.902 (rank #1).

Which model offers the best value on MathArena Apex?

Among models scoring within 10% of the leader, DeepSeek-V4-Flash-Max from DeepSeek is the cheapest, at $0.10 per million input tokens with a score of 0.857.
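
The selection rule in this answer can be sketched in a few lines of Python. This is only an illustration, not the site's actual method: it assumes "within 10%" means a score of at least 90% of the leader's score, and it uses only the three models whose scores and input prices appear on this page. The function name `best_value` and the field names are hypothetical.

```python
# Scores and input prices (USD per 1M input tokens) for the three models
# whose figures appear on this page; the other three entries are omitted.
models = [
    {"name": "DeepSeek-V4-Pro-Max", "score": 0.902, "input_price": 1.60},
    {"name": "DeepSeek-V4-Flash-Max", "score": 0.857, "input_price": 0.10},
    {"name": "Qwen3.7 Max", "score": 0.445, "input_price": 1.25},
]


def best_value(models, tolerance=0.10):
    """Return the cheapest model scoring within `tolerance` of the leader.

    Assumption: "within 10%" is relative to the leader's score,
    i.e. score >= (1 - tolerance) * leader_score.
    """
    leader_score = max(m["score"] for m in models)
    threshold = (1 - tolerance) * leader_score
    # Keep only models close enough to the leader, then pick the lowest price.
    candidates = [m for m in models if m["score"] >= threshold]
    return min(candidates, key=lambda m: m["input_price"])


print(best_value(models)["name"])  # DeepSeek-V4-Flash-Max
```

Under this reading the threshold is about 0.812, so only the two DeepSeek models qualify and the cheaper one is selected. Reading "within 10%" as 10 percentage points (a threshold of 0.802) would give the same result for these three models.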

How recent are the MathArena Apex leaderboard results?

The MathArena Apex leaderboard was last updated in June 2026 and currently includes 6 evaluated models.