BixBench
BixBench is a benchmark for real-world bioinformatics and computational biology data analysis. It evaluates AI models on multi-step scientific workflows that require code execution, statistical reasoning, and biological domain knowledge to interpret experimental data.
Progress Over Time
Interactive timeline showing model performance evolution on BixBench
BixBench Leaderboard
1 model
| Rank | Model | Score | Context | Cost (input / output per 1M tokens) | License |
|---|---|---|---|---|---|
| 1 | GPT-5.5 (OpenAI) | 0.805 | 1.0M | $5.00 / $30.00 | — |
FAQ
Common questions about BixBench
The BixBench paper is available at https://arxiv.org/abs/2503.00096. It details the benchmark's methodology, dataset creation, and evaluation criteria.
The BixBench leaderboard ranks 1 AI model based on its performance on this benchmark. Currently, GPT-5.5 by OpenAI leads with a score of 0.805; with only one model evaluated, the average score is also 0.805.
The highest BixBench score is 0.805, achieved by GPT-5.5 from OpenAI.
1 model has been evaluated on the BixBench benchmark, with 0 verified and 1 self-reported result.
BixBench is categorized under agents, reasoning, and science. The benchmark evaluates text models.