GeneBench Leaderboard

Progress Over Time

[Interactive timeline: model performance on GeneBench over time, showing the state-of-the-art frontier and distinguishing open from proprietary models.]

GeneBench Leaderboard

2 models

Rank	Model	Organization	Context	Cost
1	GPT-5.5 Pro	OpenAI	1.0M	$30.00 / $180.00
2	GPT-5.5	OpenAI	1.0M	$5.00 / $30.00

FAQ

Common questions about GeneBench

What is the GeneBench benchmark?

GeneBench is an evaluation focused on multi-stage scientific data analysis in genetics and quantitative biology. Tasks require reasoning about ambiguous or noisy data with minimal supervisory guidance, addressing realistic obstacles such as hidden confounders or QC failures, and correctly implementing and interpreting modern statistical methods.

Where can I find the GeneBench paper?

The GeneBench paper is available at https://cdn.openai.com/pdf/6dc7175d-d9e7-4b8d-96b8-48fe5798cd5b/oai_genebench_benchmark.pdf. It describes the benchmark methodology, dataset creation, and evaluation criteria.

What is the GeneBench leaderboard?

The GeneBench leaderboard ranks two AI models by their performance on this benchmark. GPT-5.5 Pro by OpenAI currently leads with a score of 0.332. The average score across all models is 0.291.

What is the highest GeneBench score?

The highest GeneBench score is 0.332, achieved by GPT-5.5 Pro from OpenAI.

How many models are evaluated on GeneBench?

Two models have been evaluated on GeneBench. Both results are self-reported; none has been verified.

What categories does GeneBench cover?

GeneBench is categorized under agents, reasoning, and science. The benchmark evaluates text models.
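The leaderboard figures quoted in this FAQ (a top score of 0.332 and an average of 0.291 across two models) are enough to back out the second model's score. The short Python sketch below does that arithmetic. The variable names are illustrative, and the result is an inference from rounded published figures, not a score reported on this page.

```python
# Figures quoted in the FAQ on this page.
top_score = 0.332    # GPT-5.5 Pro
mean_score = 0.291   # average across all evaluated models
n_models = 2

# mean * count gives the sum of all scores; subtracting the known score
# leaves the other model's score, up to rounding in the published inputs.
implied_second_score = round(mean_score * n_models - top_score, 3)

print(f"Implied score of the second model: {implied_second_score:.3f}")
```

Because both inputs are rounded to three decimals, the implied value should be read as approximate.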