
GPQA Chemistry

Chemistry subset of GPQA, containing challenging multiple-choice questions written by domain experts in chemistry. These Google-proof questions require graduate-level knowledge and reasoning.



GPQA Chemistry Leaderboard

1 model • 0 verified
Rank  Model  Organization  Score  Context  Cost             License
1     o1     OpenAI        0.647  200K     $15.00 / $60.00  (not listed)

FAQ

Common questions about GPQA Chemistry

The GPQA Chemistry paper is available at https://arxiv.org/abs/2311.12022. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The GPQA Chemistry leaderboard ranks 1 AI model by its performance on this benchmark. Currently, o1 by OpenAI leads with a score of 0.647. With a single entry, the average score across all models is also 0.647.
The highest GPQA Chemistry score is 0.647, achieved by o1 from OpenAI.
1 model has been evaluated on the GPQA Chemistry benchmark, with 0 verified results and 1 self-reported result.
GPQA Chemistry is categorized under chemistry and reasoning. The benchmark evaluates text models.
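
The summary figures quoted in this FAQ (leader, top score, average, and the verified versus self-reported counts) can be recomputed from a list of leaderboard entries. The following is a minimal sketch in Python, not the site's actual implementation: the `Entry` record and the `summarize` helper are hypothetical names, and the page does not publish a data schema. The only entry used is the one reported above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Entry:
    # Hypothetical record shape; assumed for illustration only.
    model: str
    organization: str
    score: float     # fraction of questions answered correctly, 0.0 to 1.0
    verified: bool   # False means the result is self-reported


def summarize(entries):
    """Return the leader, average score, and verified/self-reported counts."""
    if not entries:
        raise ValueError("at least one entry is required")
    leader = max(entries, key=lambda e: e.score)
    verified = sum(1 for e in entries if e.verified)
    return {
        "leader": leader.model,
        "top_score": leader.score,
        "average": mean(e.score for e in entries),
        "verified": verified,
        "self_reported": len(entries) - verified,
    }


# The single entry reported on this page.
entries = [Entry("o1", "OpenAI", 0.647, verified=False)]
summary = summarize(entries)
print(summary)
```

With only one entry, the average necessarily equals the top score, which is why both figures on this page are 0.647.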