BioMysteryBench

Name: BioMysteryBench Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline: model performance on BioMysteryBench over time, showing the state-of-the-art frontier and distinguishing open from proprietary models.]

BioMysteryBench Leaderboard

2 models

Rank	Model	Organization		Context	Cost	License
1	Claude Opus 5 (New)	Anthropic	—	1.0M	$5.00 / $25.00	
2	Claude Fable 5	Anthropic	—	1.0M	$10.00 / $50.00	

About this benchmark

What is BioMysteryBench?

BioMysteryBench evaluates a model's ability to reason through challenging molecular biology problems, reporting performance on a hard subset and on the subset of problems solved by human experts.

BioMysteryBench is a text benchmark evaluating models on reasoning, science, and biology tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.681 (about 0.7), with the leader at 0.901.

Compare leaders on the best AI for reasoning, best AI for science and best AI for biology leaderboards.

Current leaders

Claude Opus 5 from Anthropic currently leads the 2 models on the BioMysteryBench leaderboard with a score of 0.901.

Claude Opus 5 (Anthropic): 90.1%

Claude Fable 5 (Anthropic): 46.1%

FAQ

Common questions about the BioMysteryBench benchmark and leaderboard.

What is the BioMysteryBench benchmark?

BioMysteryBench evaluates a model's ability to reason through challenging molecular biology problems, reporting performance on a hard subset and on the subset of problems solved by human experts.

What is the BioMysteryBench leaderboard?

The BioMysteryBench leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Claude Opus 5 by Anthropic leads with a score of 0.901. The average score across all models is 0.681.
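The leader and average quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the average is a plain unweighted mean of the two listed scores; the `scores` dictionary is a hypothetical structure for illustration, not something LLM Stats publishes.

```python
from statistics import mean

# Scores as listed on the leaderboard (0-1 scale).
# The dictionary layout is an assumption made for this example.
scores = {
    "Claude Opus 5": 0.901,
    "Claude Fable 5": 0.461,
}

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)

# Assumed: the reported average is an unweighted mean, rounded to 3 places.
average = round(mean(scores.values()), 3)

print(f"Leader: {leader} ({scores[leader]})")
print(f"Average: {average}")
```

Under that assumption the mean of 0.901 and 0.461 matches the reported 0.681, which rounds to the 0.7 shown in the benchmark summary.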

What is the highest BioMysteryBench score?

The highest BioMysteryBench score is 0.901, achieved by Claude Opus 5 from Anthropic.

How many models are evaluated on BioMysteryBench?

Two models have been evaluated on the BioMysteryBench benchmark. Both results are self-reported; none has been verified.

What categories does BioMysteryBench cover?

BioMysteryBench is categorized under reasoning, science, and biology. The benchmark evaluates text models.

Which model offers the best value on BioMysteryBench?

Among models scoring within 10% of the leader, Claude Opus 5 from Anthropic is the cheapest, at $5.00 per million input tokens with a score of 0.901. It is currently the only model in that range, since Claude Fable 5 scores 0.461.
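The selection rule in this answer can be sketched as a filter followed by a minimum. The exact rule LLM Stats applies is not documented here, so this example assumes "within 10%" means a score of at least 90% of the leader's score, and that the first figure in each Cost cell is the input price per million tokens (the page confirms this only for Claude Opus 5). The `Entry` class and `best_value` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    score: float        # 0-1 scale
    input_price: float  # USD per million input tokens (assumed column meaning)


def best_value(entries, tolerance=0.10):
    """Return the cheapest entry whose score is within `tolerance` of the top score.

    Assumption: "within 10%" is relative, i.e. score >= top * (1 - tolerance).
    """
    top = max(e.score for e in entries)
    eligible = [e for e in entries if e.score >= top * (1 - tolerance)]
    return min(eligible, key=lambda e: e.input_price)


entries = [
    Entry("Claude Opus 5", 0.901, 5.00),
    Entry("Claude Fable 5", 0.461, 10.00),
]

winner = best_value(entries)
print(f"{winner.name}: ${winner.input_price:.2f} per million input tokens")
```

With only two models listed, the filter leaves a single candidate, so the result would be the same under an absolute 0.10 margin as well.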

How recent are the BioMysteryBench leaderboard results?

The BioMysteryBench leaderboard was last updated in July 2026 and currently includes 2 evaluated models.