MathVision

[Figure: Progress Over Time. Timeline of model performance on MathVision. Legend: state-of-the-art frontier, open models, proprietary models.]

MathVision Leaderboard

31 models
Rank | Organization | Parameters | Context | Cost (USD per 1M tokens, input / output) | License
1 | ByteDance | - | - | - | -
2 | Moonshot AI | 1.0T | 262K | $0.75 / $3.50 | -
3 | - | - | - | - | -
4 | Alibaba Cloud / Qwen Team | - | 1.0M | $0.32 / $1.28 | -
5 | Alibaba Cloud / Qwen Team | - | 1.0M | $0.50 / $3.00 | -
6 | Alibaba Cloud / Qwen Team | 122B | - | - | -
7 | Alibaba Cloud / Qwen Team | 27B | 262K | $0.30 / $2.40 | -
8 | - | 31B | 262K | $0.13 / $0.38 | -
9 | Moonshot AI | 1.0T | - | - | -
10 | Alibaba Cloud / Qwen Team | 35B | - | - | -
11 | - | 25B | 262K | $0.13 / $0.40 | -
12 | - | 12B | - | - | -
13 | Alibaba Cloud / Qwen Team | 236B | - | - | -
14 | - | 10B | - | - | -
15 | - | 25B | - | - | -
16 | Alibaba Cloud / Qwen Team | 33B | - | - | -
17 | Alibaba Cloud / Qwen Team | 236B | - | - | -
18 | Alibaba Cloud / Qwen Team | 31B | - | - | -
19 | Alibaba Cloud / Qwen Team | 33B | - | - | -
20 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 | -
21 | Alibaba Cloud / Qwen Team | 31B | - | - | -
22 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 | -
23 | - | 8B | - | - | -
24 | Alibaba Cloud / Qwen Team | 9B | - | - | -
25 | - | 5B | - | - | -
26 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 | -
27 | Alibaba Cloud / Qwen Team | 34B | - | - | -
28 | Alibaba Cloud / Qwen Team | 72B | - | - | -
29 | Alibaba Cloud / Qwen Team | 73B | - | - | -
30 | Alibaba Cloud / Qwen Team | 8B | - | - | -
31 | Alibaba Cloud / Qwen Team | 7B | - | - | -
About this benchmark

What is MathVision?

MATH-Vision is a dataset designed to measure multimodal mathematical reasoning capabilities. It focuses on evaluating how well models can solve mathematical problems that require both visual understanding and mathematical reasoning, bridging the gap between visual and mathematical domains.

MathVision is categorized as a math, multimodal, and vision benchmark. LLM Stats tracks 31 models on it, scored on a 0–1 scale. The current average score is 0.675, and the leader scores 0.945.

Current leaders

Seed 2.1 Pro from ByteDance currently leads the MathVision leaderboard with a score of 0.945, the highest among the 31 evaluated models.

1. Seed 2.1 Pro (ByteDance): 94.5%
2. Kimi K2.6 (Moonshot AI): 93.2%
3. Seed 2.1 Turbo (ByteDance): 92.7%

Source paper

Title: Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
Authors: Ke Wang, Junting Pan, Weikang Shi, Zimu Lu, and 2 others
Abstract

Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs. Through extensive experimentation, we unveil a notable performance gap between current LMMs and human performance on MATH-V, underscoring the imperative for further advancements in LMMs. Moreover, our detailed categorization allows for a thorough error analysis of LMMs, offering valuable insights to guide future research and development. The project is available at https://mathvision-cuhk.github.io
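
The categorization by discipline and difficulty level described in the abstract lends itself to a simple accuracy breakdown. The following is a minimal sketch, not the paper's evaluation code: it assumes each graded answer is a record with hypothetical `subject`, `level`, and `correct` fields, and the sample records are made up purely for illustration. The real dataset schema and grading procedure may differ.

```python
from collections import defaultdict


def accuracy_by(results, key):
    """Group graded results by `key` and return {group: fraction correct}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        totals[r[key]] += 1
        hits[r[key]] += int(r["correct"])
    return {group: hits[group] / totals[group] for group in totals}


# Made-up records for illustration only; these are not real MATH-V results,
# and the field names are assumptions rather than the dataset's schema.
results = [
    {"subject": "algebra", "level": 1, "correct": True},
    {"subject": "algebra", "level": 3, "correct": False},
    {"subject": "solid geometry", "level": 3, "correct": True},
    {"subject": "solid geometry", "level": 5, "correct": False},
]

print(accuracy_by(results, "subject"))
print(accuracy_by(results, "level"))
```

Grouping by `subject` or `level` with the same helper keeps the per-discipline and per-difficulty views consistent with each other.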

FAQ

Common questions about the MathVision benchmark and leaderboard.

What is the MathVision leaderboard?

The MathVision leaderboard ranks 31 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.945. The average score across all models is 0.675.

What is the highest MathVision score?

The highest MathVision score is 0.945, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on MathVision?

31 models have been evaluated on the MathVision benchmark, with 0 verified results and 31 self-reported results.

Where can I find the MathVision paper?

The MathVision paper is available at https://arxiv.org/abs/2402.14804. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does MathVision cover?

MathVision is categorized under math, multimodal, and vision. The benchmark evaluates multimodal models.

What is the best open-source model on MathVision?

Kimi K2.6 by Moonshot AI is the top-ranked open-source model on MathVision, with a score of 0.932 (rank #2).

Which model offers the best value on MathVision?

Among models scoring within 10% of the leader, Gemma 4 31B from Google is the cheapest, at $0.13 per million input tokens with a score of 0.856.
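
The selection rule in this answer can be sketched in Python. This is an illustrative sketch, not the site's actual implementation: `Entry` and `best_value` are hypothetical names, "within 10% of the leader" is assumed to mean a score of at least 90% of the leader's score, and only models whose score appears on this page are included (Seed 2.1 Pro has no listed price here, so its price is left as `None`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    model: str
    score: float                  # MathVision score on the 0-1 scale
    input_price: Optional[float]  # USD per million input tokens; None if not listed


def best_value(entries: list[Entry], tolerance: float = 0.10) -> Optional[Entry]:
    """Return the cheapest priced entry scoring within `tolerance` of the leader."""
    if not entries:
        return None
    leader_score = max(e.score for e in entries)
    cutoff = leader_score * (1 - tolerance)  # assumed relative threshold
    candidates = [
        e for e in entries
        if e.score >= cutoff and e.input_price is not None
    ]
    return min(candidates, key=lambda e: e.input_price, default=None)


entries = [
    Entry("Seed 2.1 Pro", 0.945, None),
    Entry("Kimi K2.6", 0.932, 0.75),
    Entry("Gemma 4 31B", 0.856, 0.13),
]

winner = best_value(entries)
print(winner.model if winner else "no priced model within tolerance")
```

With the leader at 0.945, the assumed cutoff is about 0.85, so Gemma 4 31B (0.856) qualifies and is selected as the cheapest priced candidate.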

How recent are the MathVision leaderboard results?

The MathVision leaderboard was last updated in July 2026 and currently includes 31 evaluated models.