BixBench Leaderboard

Progress Over Time

Figure: timeline of model performance on BixBench, marking the state-of-the-art frontier and distinguishing open from proprietary models.

BixBench Leaderboard

1 model

Rank	Model	Organization	Score	Context	Cost	License
1	GPT-5.5	OpenAI	0.805	1.0M	$5.00 / $30.00	—
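The FAQ reports four summary figures: the leader, the top score, the average score, and the split between verified and self-reported results. All four can be derived directly from the table rows. The following is a minimal sketch that shows this derivation. It assumes a hypothetical `Entry` record with a `verified` flag and a hypothetical `summarize` helper. Neither comes from BixBench or the leaderboard site. Ranking by descending score is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Entry:
    """One leaderboard row (hypothetical schema, for illustration only)."""
    model: str
    organization: str
    score: float
    verified: bool  # False means the result is self-reported


def summarize(entries: list[Entry]) -> dict:
    """Derive leaderboard summary figures (hypothetical helper)."""
    # Assumption: models are ranked by descending score.
    ranked = sorted(entries, key=lambda e: e.score, reverse=True)
    top = ranked[0]
    verified = sum(1 for e in entries if e.verified)
    return {
        "models": len(entries),
        "leader": f"{top.model} ({top.organization})",
        "top_score": top.score,
        "average_score": mean(e.score for e in entries),
        "verified": verified,
        "self_reported": len(entries) - verified,
    }


# The single row currently listed; its result is self-reported.
entries = [Entry("GPT-5.5", "OpenAI", 0.805, verified=False)]
print(summarize(entries))
```

The leaderboard currently has only one entry, so the average necessarily equals the top score. That is why the FAQ reports 0.805 for both.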

FAQ

Common questions about BixBench

What is the BixBench benchmark?

BixBench is a benchmark for real-world bioinformatics and computational biology data analysis. It evaluates AI models on multi-step scientific workflows that require code execution, statistical reasoning, and biological domain knowledge to interpret experimental data.

Where can I find the BixBench paper?

The BixBench paper is available at https://arxiv.org/abs/2503.00096. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.

What is the BixBench leaderboard?

The BixBench leaderboard ranks AI models by their performance on this benchmark. It currently lists one model: GPT-5.5 by OpenAI, with a score of 0.805. Because only one model has been evaluated, the average score across all models is also 0.805.

What is the highest BixBench score?

The highest BixBench score is 0.805, achieved by GPT-5.5 from OpenAI.

How many models are evaluated on BixBench?

One model has been evaluated on BixBench, with 0 verified results and 1 self-reported result.

What categories does BixBench cover?

BixBench is categorized under agents, reasoning, and science. It evaluates text models.