MathVision
MATH-Vision is a dataset designed to measure multimodal mathematical reasoning. It evaluates how well models solve math problems that require both visual understanding and mathematical reasoning: its 3,040 problems with visual contexts are drawn from real math competitions and span 16 mathematical disciplines and 5 difficulty levels.
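Leaderboard scores are given as fractions between 0 and 1 (for example, 0.880), which presumably correspond to accuracy over the benchmark's problems. Below is a minimal sketch of accuracy scoring. It assumes each prediction is compared against a single reference answer after a simple normalization step. The `Problem` record, the `normalize` rules and the numeric tolerance are all hypothetical illustrations. This is not the official MATH-Vision evaluation script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    # Hypothetical record: one benchmark item and its reference answer.
    question_id: str
    answer: str


def normalize(ans: str) -> str:
    """Assumed normalization: drop whitespace, lowercase, unwrap '(B)' and trailing '.'."""
    s = "".join(ans.split()).lower()
    s = s.rstrip(".")
    if len(s) == 3 and s[0] == "(" and s[2] == ")":
        s = s[1]  # "(b)" -> "b"
    return s


def is_correct(pred: str, ref: str, tol: float = 1e-6) -> bool:
    """Compare numerically when both sides parse as numbers, else by exact string match."""
    p, r = normalize(pred), normalize(ref)
    try:
        return abs(float(p) - float(r)) <= tol
    except ValueError:
        return p == r


def accuracy(preds: dict[str, str], problems: list[Problem]) -> float:
    """Fraction of problems answered correctly; a missing prediction counts as wrong."""
    if not problems:
        raise ValueError("no problems to score")
    correct = sum(is_correct(preds.get(p.question_id, ""), p.answer) for p in problems)
    return correct / len(problems)


if __name__ == "__main__":
    problems = [Problem("q1", "B"), Problem("q2", "3.5"), Problem("q3", "12")]
    preds = {"q1": "(b)", "q2": "3.50", "q3": "13"}
    print(f"{accuracy(preds, problems):.3f}")  # 2 of 3 correct -> 0.667
```

Real evaluations typically need more robust answer extraction, such as pulling the final answer out of a long chain-of-thought response. This sketch leaves that step out.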
Progress Over Time: interactive timeline of model performance on MathVision, showing the state-of-the-art frontier for open and proprietary models.
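On a timeline like this, the state-of-the-art frontier is the running maximum: at each release date, it is the best score any model had reached up to that date. Below is a minimal sketch of that computation. It assumes a hypothetical list of `(released, model, score)` entries. The dates, model names and scores are placeholders for illustration and are not MathVision results.

```python
from datetime import date
from typing import NamedTuple


class Entry(NamedTuple):
    released: date
    model: str
    score: float


def sota_frontier(entries: list[Entry]) -> list[Entry]:
    """Return the entries that set a new best score, in release-date order."""
    frontier: list[Entry] = []
    best = float("-inf")
    for e in sorted(entries, key=lambda x: x.released):
        if e.score > best:  # strictly better than everything released earlier
            frontier.append(e)
            best = e.score
    return frontier


if __name__ == "__main__":
    # Illustrative placeholder data only (not real leaderboard entries).
    demo = [
        Entry(date(2024, 1, 10), "model-a", 0.30),
        Entry(date(2024, 6, 1), "model-b", 0.25),
        Entry(date(2024, 3, 5), "model-c", 0.42),
        Entry(date(2025, 2, 20), "model-d", 0.61),
    ]
    for e in sota_frontier(demo):
        print(e.released, e.model, e.score)
```

The chart shows separate open and proprietary series. To reproduce them, you would filter the entries by license before calling `sota_frontier`.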
MathVision Leaderboard
25 models
| # | Model | Organization | Parameters | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|
| 1 | Qwen3.6 Plus | Alibaba Cloud / Qwen Team | — | — | — | — |
| 2 | — | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 | — |
| 3 | — | Alibaba Cloud / Qwen Team | 27B | — | — | — |
| 4 | Gemma 4 31B | Google | 31B | — | — | — |
| 5 | — | Moonshot AI | 1.0T | 262K | $0.60 / $2.50 | — |
| 6 | — | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 | — |
| 7 | — | Google | 25B | — | — | — |
| 8 | — | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 / $3.49 | — |
| 9 | — | StepFun | 10B | — | — | — |
| 10 | — | Alibaba Cloud / Qwen Team | 33B | — | — | — |
| 11 | — | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 | — |
| 12 | — | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $1.00 | — |
| 13 | — | Alibaba Cloud / Qwen Team | 33B | — | — | — |
| 14 | — | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 | — |
| 15 | — | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $0.70 | — |
| 16 | — | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 | — |
| 17 | Gemma 4 E4B | Google | 8B | — | — | — |
| 18 | — | Alibaba Cloud / Qwen Team | 9B | 262K | $0.08 / $0.50 | — |
| 19 | Gemma 4 E2B | Google | 5B | — | — | — |
| 20 | — | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 | — |
| 21 | — | Alibaba Cloud / Qwen Team | 34B | — | — | — |
| 22 | — | Alibaba Cloud / Qwen Team | 72B | — | — | — |
| 23 | — | Alibaba Cloud / Qwen Team | 73B | — | — | — |
| 24 | — | Alibaba Cloud / Qwen Team | 8B | — | — | — |
| 25 | — | Alibaba Cloud / Qwen Team | 7B | — | — | — |
FAQ
Common questions about MathVision
The MathVision paper is available at https://arxiv.org/abs/2402.14804. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The MathVision leaderboard ranks 25 AI models based on their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.880. The average score across all models is 0.628.
25 models have been evaluated on the MathVision benchmark. All 25 results are self-reported; none are verified.
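The summary figures above are simple aggregates over the leaderboard rows: the leader, the top score, the mean score, and the counts of verified versus self-reported results. Below is a minimal sketch of those aggregates. It assumes each row carries a model name, a score and a hypothetical `verified` flag. The rows in the example are placeholders, not the actual 25 entries.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Row:
    model: str
    score: float
    verified: bool  # hypothetical flag: True if the result is marked as verified


def summarize(rows: list[Row]) -> dict:
    """Compute leader, top score, mean score and verified/self-reported counts."""
    if not rows:
        raise ValueError("leaderboard is empty")
    ranked = sorted(rows, key=lambda r: r.score, reverse=True)  # rank 1 = highest score
    n_verified = sum(r.verified for r in rows)  # bools sum as 0/1
    return {
        "leader": ranked[0].model,
        "top_score": ranked[0].score,
        "mean_score": mean(r.score for r in rows),
        "verified": n_verified,
        "self_reported": len(rows) - n_verified,
    }


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    rows = [Row("model-a", 0.80, False), Row("model-b", 0.60, False), Row("model-c", 0.40, False)]
    print(summarize(rows))
```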
MathVision falls under the math, multimodal, and vision categories and is used to evaluate multimodal models.