ArXivMath
ArXivMath Leaderboard
| Rank | Organization | | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Anthropic | — | 1.0M | $3.00 / $15.00 | |
What is ArXivMath?
ArXivMath is a final-answer benchmark of research-level mathematics maintained by MathArena. Problems are extracted monthly from recent arXiv paper abstracts, then filtered through automated and manual checks to ensure they are self-contained, non-trivial, and verifiable. Because problems are drawn from active research, the benchmark is more realistic and more closely connected to mathematical research than contest or olympiad benchmarks.
ArXivMath is a text benchmark evaluating models on math and reasoning tasks. LLM Stats currently tracks 1 model on this benchmark, scored on a 0–1 scale. The current average is 0.7, which with a single entry is also the leader's score.
Compare leaders on the best AI for math and best AI for reasoning leaderboards.
Current leaders
Claude Sonnet 5 from Anthropic currently leads the ArXivMath leaderboard with a score of 0.722; it is the only model evaluated so far.
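The summary figures quoted on this page (1 model, an average of 0.7, a leading score of 0.722) agree with each other once rounding to one decimal place is taken into account. The following is a minimal sketch of that arithmetic, not the site's actual implementation; the `scores` dictionary layout and the `summarize` helper are hypothetical names introduced only for illustration.

```python
from statistics import mean

# Score as listed on this page; the dict layout is an assumption for illustration.
scores = {"Claude Sonnet 5": 0.722}


def summarize(scores):
    """Return count, rounded average, and leader for a {model: score} mapping."""
    leader = max(scores, key=scores.get)  # model with the highest score
    return {
        "count": len(scores),
        "average": round(mean(list(scores.values())), 1),  # one decimal place
        "leader": leader,
        "leader_score": scores[leader],
    }


summary = summarize(scores)
print(summary)
```

With only one entry, the average and the leader's score are the same number, so the displayed 0.7 is simply 0.722 rounded to one decimal place.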