GeneBench
GeneBench is an evaluation focused on multi-stage scientific data analysis in genetics and quantitative biology. Tasks require reasoning about ambiguous or noisy data with minimal supervisory guidance, addressing realistic obstacles such as hidden confounders or QC failures, and correctly implementing and interpreting modern statistical methods.
Progress Over Time
[Interactive timeline of model performance on GeneBench over time, showing the state-of-the-art frontier and distinguishing open from proprietary models]
GeneBench Leaderboard
2 models
| Rank | Model | Organization | Score | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|
| 1 | GPT-5.5 Pro | OpenAI | 0.332 | 1.0M | $30.00 / $180.00 | — |
| 2 | GPT-5.5 | OpenAI | — | 1.0M | $5.00 / $30.00 | — |
FAQ
Common questions about GeneBench
What is GeneBench?
GeneBench is an evaluation focused on multi-stage scientific data analysis in genetics and quantitative biology. Tasks require reasoning about ambiguous or noisy data with minimal supervisory guidance, addressing realistic obstacles such as hidden confounders or QC failures, and correctly implementing and interpreting modern statistical methods.
Where can I read the GeneBench paper?
The GeneBench paper is available at https://cdn.openai.com/pdf/6dc7175d-d9e7-4b8d-96b8-48fe5798cd5b/oai_genebench_benchmark.pdf. It details the benchmark methodology, dataset creation, and evaluation criteria.
How do models rank on the GeneBench leaderboard?
The GeneBench leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, GPT-5.5 Pro by OpenAI leads with a score of 0.332. The average score across all models is 0.291.
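With only two entries, the stated mean pins down the runner-up's score by simple arithmetic. A quick sanity check, assuming 0.291 is the unweighted mean of the two scores (the page does not state the second score directly):

```python
# Derive the implied second score from the stated leader score and mean.
# Inputs are taken from the page; the result is arithmetic, not an official figure.
n_models = 2
leader_score = 0.332
mean_score = 0.291

# mean = (leader + second) / n  =>  second = n * mean - leader
second_score = n_models * mean_score - leader_score
print(round(second_score, 3))  # 0.25
```

If the published average is rounded, the true second score could differ slightly from this implied value.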
What is the highest GeneBench score?
The highest GeneBench score is 0.332, achieved by GPT-5.5 Pro from OpenAI.
How many models have been evaluated on GeneBench?
2 models have been evaluated on the GeneBench benchmark, with 0 verified results and 2 self-reported results.
What categories does GeneBench cover?
GeneBench is categorized under agents, reasoning, and science. The benchmark evaluates text models.