Blueprint-Bench 2

Progress Over Time

[Interactive timeline: model performance on Blueprint-Bench 2 over time, with the state-of-the-art frontier marked and models split into open and proprietary.]

Blueprint-Bench 2 Leaderboard

2 models
Rank  Context  Cost              License
1     1.0M     $10.00 / $50.00
2     1.0M     $1.50 / $9.00

About this benchmark

What is Blueprint-Bench 2?

Blueprint-Bench 2 is an agentic spatial reasoning benchmark that evaluates a model's ability to understand, plan, and reason over architectural blueprints and other structured spatial documents. Scores are normalized to a 0–1 scale.

Blueprint-Bench 2 is a multimodal benchmark covering multimodal, reasoning, and agent tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.361, with the leader at 0.386.

Compare leaders on the best AI for multimodal, best AI for reasoning and best AI for agents leaderboards.

Current leaders

Claude Fable 5 from Anthropic currently leads the Blueprint-Bench 2 leaderboard with a score of 0.386, ahead of the one other evaluated model.

1. Claude Fable 5 (Anthropic): 38.6%
2. Gemini 3.5 Flash (Google): 33.6%

FAQ

Common questions about the Blueprint-Bench 2 benchmark and leaderboard.

What is the Blueprint-Bench 2 benchmark?

Blueprint-Bench 2 is an agentic spatial reasoning benchmark that evaluates a model's ability to understand, plan, and reason over architectural blueprints and other structured spatial documents. Scores are normalized to a 0–1 scale.

What is the Blueprint-Bench 2 leaderboard?

The Blueprint-Bench 2 leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Claude Fable 5 by Anthropic leads with a score of 0.386. The average score across all models is 0.361.
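
As a quick check on those figures, the leader and the average can be recomputed from the two listed scores. This is a minimal sketch: the scores are taken from the percentages in the "Current leaders" list, and the variable names are illustrative only.

```python
# Scores on the 0-1 scale, converted from the percentages listed on this page.
scores = {
    "Claude Fable 5": 0.386,
    "Gemini 3.5 Flash": 0.336,
}

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)

# The average is the plain arithmetic mean over all listed models.
average = sum(scores.values()) / len(scores)

print(leader, round(average, 3))
```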

What is the highest Blueprint-Bench 2 score?

The highest Blueprint-Bench 2 score is 0.386, achieved by Claude Fable 5 from Anthropic.

How many models are evaluated on Blueprint-Bench 2?

Two models have been evaluated on the Blueprint-Bench 2 benchmark. Both results are self-reported; none is verified.

What categories does Blueprint-Bench 2 cover?

Blueprint-Bench 2 is categorized under multimodal, reasoning, and agents. The benchmark evaluates multimodal models.

Which model offers the best value on Blueprint-Bench 2?

Among models scoring within 10% of the leader, Claude Fable 5 from Anthropic is the cheapest, at $10.00 per million input tokens with a score of 0.386.
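
One way to read that rule is sketched below. It assumes "within 10% of the leader" means a score of at least 90% of the top score, which is not confirmed by the page. It also assumes the rank-2 row of the leaderboard table ($1.50 input price) belongs to Gemini 3.5 Flash. The function name `best_value` and the `tolerance` parameter are hypothetical.

```python
# Scores and input prices (USD per million input tokens) as read from this page.
# Assumption: the $1.50 price is Gemini 3.5 Flash's, matched by rank.
models = [
    {"name": "Claude Fable 5", "score": 0.386, "input_price": 10.00},
    {"name": "Gemini 3.5 Flash", "score": 0.336, "input_price": 1.50},
]


def best_value(models, tolerance=0.10):
    """Return the cheapest model whose score is within `tolerance`
    (relative) of the top score. Assumed reading of the rule."""
    top = max(m["score"] for m in models)
    cutoff = top * (1 - tolerance)
    eligible = [m for m in models if m["score"] >= cutoff]
    return min(eligible, key=lambda m: m["input_price"])


print(best_value(models)["name"])
```

Under this reading the cutoff is about 0.347, so Gemini 3.5 Flash (0.336) falls just outside the band and Claude Fable 5 is the only eligible model. If the 10% were read as percentage points instead, the cheaper model would qualify, so the result depends on which interpretation the site uses.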

How recent are the Blueprint-Bench 2 leaderboard results?

The Blueprint-Bench 2 leaderboard was last updated in July 2026 and currently includes 2 evaluated models.