ChartMuseum


ChartMuseum Leaderboard

1 model

Rank  Model            Organization  Context  Cost (input / output)  License
1     Claude Sonnet 5  Anthropic     1.0M     $3.00 / $15.00         not listed
About this benchmark

What is ChartMuseum?

ChartMuseum is a chart question-answering benchmark of 1,162 expert-annotated questions over real-world chart images drawn from 184 sources, including academic figures, infographics, and unconventional chart designs. It specifically targets questions that require visual reasoning, such as comparing unlabeled visual elements, tracking trajectories, and judging spatial relationships.

ChartMuseum is categorized as a multimodal, reasoning, and vision benchmark. LLM Stats tracks 1 model on it, scored on a 0–1 scale. Because only one model has been evaluated, the current average and the leading score are both 0.867.


Current leaders

Claude Sonnet 5 from Anthropic currently leads the ChartMuseum leaderboard with a score of 0.867. It is the only model evaluated so far.

1. Claude Sonnet 5 (Anthropic): 86.7%

FAQ

Common questions about the ChartMuseum benchmark and leaderboard.

What is the ChartMuseum leaderboard?

The ChartMuseum leaderboard ranks AI models by their performance on this benchmark. It currently has 1 entry: Claude Sonnet 5 by Anthropic, which leads with a score of 0.867. With a single model listed, the average score is also 0.867.

What is the highest ChartMuseum score?

The highest ChartMuseum score is 0.867, achieved by Claude Sonnet 5 from Anthropic.

How many models are evaluated on ChartMuseum?

One model has been evaluated on the ChartMuseum benchmark. Its result is self-reported; there are no verified results yet.

What categories does ChartMuseum cover?

ChartMuseum is categorized under multimodal, reasoning, and vision. The benchmark evaluates multimodal models.

Which model offers the best value on ChartMuseum?

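Among models scoring within 10% of the leader, Claude Sonnet 5 from Anthropic is the cheapest, at $3.00 per million input tokens with a score of 0.867. Since it is the only model on the leaderboard, it is both the leader and the best-value pick by default.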
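
The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the leaderboard's actual implementation: the `Entry` type and the `best_value` function are hypothetical names, and "within 10% of the leader" is assumed to mean a relative threshold (score at least 90% of the top score), which the page does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row (hypothetical structure for illustration)."""
    model: str
    score: float        # benchmark score on a 0-1 scale
    input_price: float  # USD per million input tokens


def best_value(entries, tolerance=0.10):
    """Return the cheapest entry whose score is within `tolerance` of the leader.

    Assumption: "within 10%" is relative, i.e. score >= leader * (1 - tolerance).
    Ties on price are broken in favor of the higher score.
    """
    if not entries:
        raise ValueError("no entries to compare")
    leader_score = max(e.score for e in entries)
    threshold = leader_score * (1 - tolerance)
    candidates = [e for e in entries if e.score >= threshold]
    return min(candidates, key=lambda e: (e.input_price, -e.score))


# The single row currently on the leaderboard.
entries = [Entry("Claude Sonnet 5", 0.867, 3.00)]
print(best_value(entries).model)
```

With one entry the function simply returns that entry. The rule only becomes informative once several models fall inside the threshold at different prices.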

How recent are the ChartMuseum leaderboard results?

The ChartMuseum leaderboard was last updated in June 2026 and currently includes 1 evaluated model.