SAT Math
SAT Math is a benchmark from AGIEval containing standardized mathematics questions from the College Board SAT examination. It is designed to evaluate the mathematical reasoning capabilities of foundation models using human-centric assessment methods.
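AGIEval's SAT Math questions are multiple-choice, so a natural way to score a model is exact-match accuracy over predicted option letters. The helper below is an illustrative sketch under that assumption, not AGIEval's official scoring harness.

```python
def sat_math_accuracy(predictions, gold):
    """Exact-match accuracy over multiple-choice answers (e.g. 'A'-'D').

    `predictions` and `gold` are parallel lists of option letters.
    Illustrative sketch only; the benchmark's own harness may differ.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must be the same length")
    correct = sum(p.strip().upper() == g.strip().upper()
                  for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For example, `sat_math_accuracy(["A", "b", "C"], ["A", "B", "D"])` returns 2/3, since case and surrounding whitespace are normalized before comparison.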
Progress Over Time
Interactive timeline showing model performance evolution on SAT Math (state-of-the-art frontier; open vs. proprietary models).
SAT Math Leaderboard
1 model
| Rank | Model | Organization | Score | Context | Cost | License |
|---|---|---|---|---|---|---|
| 1 | GPT-4 | OpenAI | 0.890 | 33K | $30.00 / $60.00 | — |
FAQ
Common questions about SAT Math
The AGIEval paper, which introduces the SAT Math benchmark, is available at https://arxiv.org/abs/2304.06364. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The SAT Math leaderboard currently ranks a single AI model: GPT-4 by OpenAI leads with a score of 0.890. With only one model listed, the average score across all models is also 0.890.
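The ranking and average above can be reproduced from the leaderboard entries directly; with a single model, the average necessarily equals the top score. A minimal sketch, using the one score listed on this page:

```python
# Leaderboard entries taken from this page (model name -> score).
scores = {"GPT-4": 0.890}

# Leader = entry with the highest score; average = mean over all entries.
leader, top_score = max(scores.items(), key=lambda kv: kv[1])
average = sum(scores.values()) / len(scores)
# With one entry, top_score and average are both 0.890.
```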
The highest SAT Math score is 0.890, achieved by GPT-4 from OpenAI.
One model has been evaluated on the SAT Math benchmark, with 0 verified results and 1 self-reported result.
SAT Math is categorized under math and reasoning. The benchmark evaluates text models.