Blueprint-Bench 2
Progress Over Time
Interactive timeline showing model performance evolution on Blueprint-Bench 2
Blueprint-Bench 2 Leaderboard
| Rank | Organization | | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Anthropic | — | 1.0M | $10.00 / $50.00 | |
| 2 | Google | — | 1.0M | $1.50 / $9.00 | |
What is Blueprint-Bench 2?
Blueprint-Bench 2 is an agentic spatial reasoning benchmark that evaluates a model's ability to understand, plan, and reason over architectural blueprints and other structured spatial documents. Results are reported as normalized scores.
The benchmark covers multimodal, reasoning, and agentic tasks. LLM Stats tracks 2 models on it, scored on a 0–1 scale. The current average is 0.4, and the leader also scores about 0.4 (0.386 unrounded).
Compare leaders on the "best AI for multimodal", "best AI for reasoning", and "best AI for agents" leaderboards.
Current leaders
Claude Fable 5 from Anthropic currently leads the Blueprint-Bench 2 leaderboard with a score of 0.386, the higher of the 2 evaluated models.