
FrontierMath

A benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians, covering most major branches of modern mathematics from number theory and real analysis to algebraic geometry and category theory.

Paper: https://arxiv.org/abs/2411.04872

Progress Over Time

[Interactive timeline: model performance evolution on FrontierMath, tracing the state-of-the-art frontier across open and proprietary models]

FrontierMath Leaderboard

11 models • 0 verified
Rank  Model    Org     Score  Context  Input Cost  Output Cost
1     GPT-5.4  OpenAI  0.476  1.0M     $2.50       $15.00
2     n/a      OpenAI  0.403  400K     $1.75       $14.00
3     n/a      n/a     0.267  400K     $1.25       $10.00
3     n/a      OpenAI  0.267  400K     $1.25       $10.00
3     n/a      n/a     0.267  400K     $1.25       $10.00
6     n/a      OpenAI  0.263  400K     $1.25       $10.00
7     n/a      n/a     0.221  400K     $0.25       $2.00
8     n/a      OpenAI  0.158  200K     $2.00       $8.00
9     n/a      n/a     0.096  400K     $0.05       $0.40
10    n/a      OpenAI  0.092  200K     $1.10       $4.40
11    n/a      OpenAI  0.055  200K     $15.00      $60.00
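
To make the Cost columns concrete, here is a minimal sketch of how a run's price would be estimated from them, assuming (the table itself does not state units) that the two figures are USD per million input and output tokens; the token counts below are purely illustrative:

```python
# Minimal cost estimate from the leaderboard's pricing columns.
# Assumption (not stated in the table): prices are USD per 1M tokens,
# quoted separately for input and output.

def run_cost(input_tokens: int, output_tokens: int,
             price_in: float, price_out: float) -> float:
    """Estimated USD cost of one evaluation run."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Illustrative numbers only: 1M input + 2M output tokens on the
# rank-1 entry ($2.50 in, $15.00 out) -> 2.50 + 30.00 = $32.50.
print(f"${run_cost(1_000_000, 2_000_000, 2.50, 15.00):.2f}")
```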

FAQ

Common questions about FrontierMath

What is FrontierMath?
A benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians, covering most major branches of modern mathematics, from number theory and real analysis to algebraic geometry and category theory.

Where can I find the FrontierMath paper?
The FrontierMath paper is available at https://arxiv.org/abs/2411.04872. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.

How are models ranked on FrontierMath?
The FrontierMath leaderboard ranks 11 AI models by their scores on the benchmark. Currently, GPT-5.4 by OpenAI leads with a score of 0.476. The average score across all models is 0.233; a quick recomputation appears after this FAQ.

What is the highest FrontierMath score?
The highest FrontierMath score is 0.476, achieved by GPT-5.4 from OpenAI.

How many models have been evaluated?
11 models have been evaluated on the FrontierMath benchmark, with 0 verified results and 11 self-reported results.

What categories does FrontierMath cover?
FrontierMath is categorized under math and reasoning. The benchmark evaluates text models.
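
As a quick check on the numbers quoted in this FAQ, a short Python snippet that recomputes the maximum and mean of the eleven leaderboard scores listed above:

```python
# Recompute headline stats from the leaderboard scores above.
scores = [0.476, 0.403, 0.267, 0.267, 0.267, 0.263,
          0.221, 0.158, 0.096, 0.092, 0.055]

print(f"models: {len(scores)}")                    # 11
print(f"best:   {max(scores):.3f}")                # 0.476
print(f"mean:   {sum(scores) / len(scores):.3f}")  # 0.233
```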