MathVerse


MathVerse Leaderboard

2 models
About this benchmark

What is MathVerse?

MathVerse evaluates multimodal mathematical reasoning, testing whether models genuinely interpret visual math diagrams rather than relying on the accompanying text alone.

MathVerse is a multimodal benchmark categorized under math, multimodal, reasoning, and vision. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.895, with the leader at 0.897.


Current leaders

Of the 2 evaluated AI models, Seed 2.1 Pro from ByteDance currently leads the MathVerse leaderboard with a score of 0.897.

1. Seed 2.1 Pro (ByteDance): 89.7%
2. Seed 2.1 Turbo (ByteDance): 89.2%

FAQ

Common questions about the MathVerse benchmark and leaderboard.

What is the MathVerse benchmark?

MathVerse evaluates multimodal mathematical reasoning, testing whether models genuinely interpret visual math diagrams rather than relying on the accompanying text alone.

What is the MathVerse leaderboard?

The MathVerse leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.897. The average score across all models is 0.895.
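The leader and average quoted above can be reproduced from the two listed scores. The following is a minimal sketch, not the site's actual computation: the function name `leaderboard_summary` is hypothetical, and the half-up rounding to three decimals is an assumption chosen because it is consistent with the reported 0.895.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores as listed on the leaderboard (0-1 scale).
# Built from strings so Decimal arithmetic is exact.
scores = {
    "Seed 2.1 Pro": Decimal("0.897"),
    "Seed 2.1 Turbo": Decimal("0.892"),
}


def leaderboard_summary(scores):
    """Return (leader, top score, exact mean, mean rounded to 3 decimals)."""
    leader = max(scores, key=scores.get)
    mean = sum(scores.values()) / len(scores)
    # Assumption: the page rounds half up to three decimal places.
    rounded = mean.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
    return leader, scores[leader], mean, rounded


leader, top, mean, rounded = leaderboard_summary(scores)
print(leader, top, mean, rounded)
```

The exact mean of the two scores is 0.8945, so the displayed average depends on the rounding rule: half-up rounding gives 0.895, while one-decimal rounding would show both the average and the leader as 0.9.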

What is the highest MathVerse score?

The highest MathVerse score is 0.897, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on MathVerse?

Two models have been evaluated on the MathVerse benchmark. Both results are self-reported; neither has been verified.

What categories does MathVerse cover?

MathVerse is categorized under math, multimodal, reasoning, and vision. The benchmark evaluates multimodal models.

How recent are the MathVerse leaderboard results?

The MathVerse leaderboard was last updated in June 2026 and currently includes 2 evaluated models.