MathVision

MATH-Vision is a dataset designed to measure multimodal mathematical reasoning capabilities. It focuses on evaluating how well models can solve mathematical problems that require both visual understanding and mathematical reasoning, bridging the gap between visual and mathematical domains.

Progress Over Time

[Interactive timeline: model performance on MathVision over time, showing the state-of-the-art frontier with open and proprietary models marked separately.]

MathVision Leaderboard

25 models
Rank  Organization               Parameters  Context  Cost
1     Alibaba Cloud / Qwen Team  -           -        -
2     Alibaba Cloud / Qwen Team  122B        262K     $0.40 / $3.20
3     Alibaba Cloud / Qwen Team  27B         -        -
4     Google                     31B         -        -
5     Moonshot AI                1.0T        262K     $0.60 / $2.50
6     Alibaba Cloud / Qwen Team  35B         262K     $0.25 / $2.00
7     -                          25B         -        -
8     Alibaba Cloud / Qwen Team  236B        262K     $0.45 / $3.49
9     -                          10B         -        -
10    Alibaba Cloud / Qwen Team  33B         -        -
11    Alibaba Cloud / Qwen Team  236B        262K     $0.30 / $1.49
12    Alibaba Cloud / Qwen Team  31B         262K     $0.20 / $1.00
13    Alibaba Cloud / Qwen Team  33B         -        -
14    Alibaba Cloud / Qwen Team  9B          262K     $0.18 / $2.09
15    Alibaba Cloud / Qwen Team  31B         262K     $0.20 / $0.70
16    Alibaba Cloud / Qwen Team  4B          262K     $0.10 / $1.00
17    Google                     8B          -        -
18    Alibaba Cloud / Qwen Team  9B          262K     $0.08 / $0.50
19    Google                     5B          -        -
20    Alibaba Cloud / Qwen Team  4B          262K     $0.10 / $0.60
21    Alibaba Cloud / Qwen Team  34B         -        -
22    Alibaba Cloud / Qwen Team  72B         -        -
23    Alibaba Cloud / Qwen Team  73B         -        -
24    Alibaba Cloud / Qwen Team  8B          -        -
25    Alibaba Cloud / Qwen Team  7B          -        -
FAQ

Common questions about MathVision

The MathVision paper is available at https://arxiv.org/abs/2402.14804. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The MathVision leaderboard ranks 25 AI models by their performance on this benchmark. Qwen3.6 Plus by Alibaba Cloud / Qwen Team currently leads with the highest score, 0.880. The average score across all models is 0.628.
All 25 results on the MathVision benchmark are self-reported; none has been verified.
MathVision is categorized under math, multimodal, and vision. The benchmark evaluates multimodal models.