MathVision

Name: MathVision Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline: model performance on MathVision over time, showing the state-of-the-art frontier for open and proprietary models.]

MathVision Leaderboard

31 models

Rank	Model	Parameters	Context	Cost (input / output per 1M tokens)
1	Seed 2.1 Pro ByteDance	—	—	—
2	Kimi K2.6 Moonshot AI	1.0T	262K	$0.75 / $3.50
3	Seed 2.1 Turbo ByteDance	—	—	—
4	Qwen3.7-Plus Alibaba Cloud / Qwen Team	—	1.0M	$0.32 / $1.28
5	Qwen3.6 Plus Alibaba Cloud / Qwen Team	—	1.0M	$0.50 / $3.00
6	Qwen3.5-122B-A10B Alibaba Cloud / Qwen Team	122B	—	—
7	Qwen3.5-27B Alibaba Cloud / Qwen Team	27B	262K	$0.30 / $2.40
8	Gemma 4 31B Google	31B	262K	$0.13 / $0.38
9	Kimi K2.5 Moonshot AI	1.0T	—	—
10	Qwen3.5-35B-A3B Alibaba Cloud / Qwen Team	35B	—	—
11	Gemma 4 26B-A4B Google	25B	262K	$0.13 / $0.40
12	Gemma 4 12B Google	12B	—	—
13	Qwen3 VL 235B A22B Thinking Alibaba Cloud / Qwen Team	236B	—	—
14	Step3-VL-10B StepFun	10B	—	—
15	DiffusionGemma 26B-A4B Google	25B	—	—
16	Qwen3 VL 32B Thinking Alibaba Cloud / Qwen Team	33B	—	—
17	Qwen3 VL 235B A22B Instruct Alibaba Cloud / Qwen Team	236B	—	—
18	Qwen3 VL 30B A3B Thinking Alibaba Cloud / Qwen Team	31B	—	—
19	Qwen3 VL 32B Instruct Alibaba Cloud / Qwen Team	33B	—	—
20	Qwen3 VL 8B Thinking Alibaba Cloud / Qwen Team	9B	262K	$0.18 / $2.09
21	Qwen3 VL 30B A3B Instruct Alibaba Cloud / Qwen Team	31B	—	—
22	Qwen3 VL 4B Thinking Alibaba Cloud / Qwen Team	4B	262K	$0.10 / $1.00
23	Gemma 4 E4B Google	8B	—	—
24	Qwen3 VL 8B Instruct Alibaba Cloud / Qwen Team	9B	—	—
25	Gemma 4 E2B Google	5B	—	—
26	Qwen3 VL 4B Instruct Alibaba Cloud / Qwen Team	4B	262K	$0.10 / $0.60
27	Qwen2.5 VL 32B Instruct Alibaba Cloud / Qwen Team	34B	—	—
28	Qwen2.5 VL 72B Instruct Alibaba Cloud / Qwen Team	72B	—	—
29	QvQ-72B-Preview Alibaba Cloud / Qwen Team	73B	—	—
30	Qwen2.5 VL 7B Instruct Alibaba Cloud / Qwen Team	8B	—	—
31	Qwen2.5-Omni-7B Alibaba Cloud / Qwen Team	7B	—	—
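
The rows above can be read programmatically. The following is a minimal sketch, assuming each row is tab-separated with five fields (rank, model and organization, parameter count, context window, cost) and that the cost pair is input / output price in USD per million tokens. The `Row` class and `parse_row` function are hypothetical names introduced here for illustration; they are not part of any LLM Stats API.

```python
from dataclasses import dataclass
from typing import Optional

MISSING = "—"  # placeholder the table uses for unreported values


@dataclass
class Row:
    rank: int
    model: str                    # model name and organization, as listed
    parameters: Optional[str]     # e.g. "1.0T", or None if unreported
    context: Optional[str]        # e.g. "262K", or None if unreported
    input_cost: Optional[float]   # assumed: USD per million input tokens
    output_cost: Optional[float]  # assumed: USD per million output tokens


def parse_row(line: str) -> Row:
    """Parse one tab-separated leaderboard row into a Row."""
    rank, model, params, context, cost = line.split("\t")

    def optional(value: str) -> Optional[str]:
        return None if value == MISSING else value

    if cost == MISSING:
        input_cost = output_cost = None
    else:
        # Cost looks like "$0.75 / $3.50": split on "/" and drop the "$".
        input_cost, output_cost = (
            float(part.strip().lstrip("$")) for part in cost.split("/")
        )

    return Row(int(rank), model, optional(params), optional(context),
               input_cost, output_cost)


row = parse_row("2\tKimi K2.6 Moonshot AI\t1.0T\t262K\t$0.75 / $3.50")
print(row.model, row.input_cost, row.output_cost)
```

Unreported fields become `None` rather than zero, so later comparisons (for example on price) can skip models without published pricing instead of treating them as free.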

About this benchmark

What is MathVision?

MATH-Vision is a dataset designed to measure multimodal mathematical reasoning capabilities. It focuses on evaluating how well models can solve mathematical problems that require both visual understanding and mathematical reasoning, bridging the gap between visual and mathematical domains.

MathVision is a multimodal benchmark in the math, multimodal, and vision categories. LLM Stats tracks 31 models on it, scored on a 0–1 scale. The current average score is 0.675, and the leader scores 0.945.

Current leaders

Seed 2.1 Pro from ByteDance currently leads the MathVision leaderboard with a score of 0.945 across 31 evaluated AI models.

Seed 2.1 Pro (ByteDance): 94.5%

Kimi K2.6 (Moonshot AI): 93.2%

Seed 2.1 Turbo (ByteDance): 92.7%
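
The page reports the same scores both on a 0–1 scale (0.945) and as percentages (94.5%). As a small illustration of how the two relate, the sketch below formats the three leading scores as percentages. The 0.927 value is derived from the 92.7% shown above, and the `as_percent` helper is a hypothetical name used only for this example.

```python
# Top three scores on the 0-1 scale, taken from the list above.
top_three = [
    ("Seed 2.1 Pro", "ByteDance", 0.945),
    ("Kimi K2.6", "Moonshot AI", 0.932),
    ("Seed 2.1 Turbo", "ByteDance", 0.927),
]


def as_percent(score: float) -> str:
    """Format a 0-1 score as a percentage with one decimal place."""
    return f"{score:.1%}"


for name, org, score in top_three:
    print(f"{name} ({org}): {as_percent(score)}")
```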

Source paper

Title: Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
Authors: Ke Wang, Junting Pan, Weikang Shi, Zimu Lu, and 2 others
Published: February 22, 2024
arXiv: 2402.14804

Abstract

Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs. Through extensive experimentation, we unveil a notable performance gap between current LMMs and human performance on MATH-V, underscoring the imperative for further advancements in LMMs. Moreover, our detailed categorization allows for a thorough error analysis of LMMs, offering valuable insights to guide future research and development. The project is available at https://mathvision-cuhk.github.io

FAQ

Common questions about the MathVision benchmark and leaderboard.

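What is the MathVision benchmark?

MathVision (MATH-Vision, or MATH-V) is a benchmark for multimodal mathematical reasoning. It consists of 3,040 problems with visual contexts sourced from real math competitions, spanning 16 mathematical disciplines and 5 difficulty levels.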

What is the MathVision leaderboard?

The MathVision leaderboard ranks 31 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.945. The average score across all models is 0.675.

What is the highest MathVision score?

The highest MathVision score is 0.945, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on MathVision?

The MathVision leaderboard includes 31 evaluated models. All 31 results are self-reported; none has been independently verified.

Where can I find the MathVision paper?

The MathVision paper is available at https://arxiv.org/abs/2402.14804. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does MathVision cover?

MathVision is categorized under math, multimodal, and vision. The benchmark evaluates multimodal models.

What is the best open-source model on MathVision?

Kimi K2.6 by Moonshot AI is the top-ranked open-source model on MathVision, with a score of 0.932 (rank #2).

Which model offers the best value on MathVision?

Among models scoring within 10% of the leader, Gemma 4 31B from Google is the cheapest, at $0.13 per million input tokens with a score of 0.856.
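
One way to reproduce this kind of selection is sketched below. It assumes "within 10% of the leader" means a score of at least 90% of the top score, and that models without published pricing are excluded; neither rule is confirmed by the page. Only the four models whose scores appear in the page text are included, so this is an illustration of the rule, not a recomputation over all 31 models. `Entry` and `best_value` are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    model: str
    score: float                 # 0-1 scale
    input_cost: Optional[float]  # USD per million input tokens, if published


# Scores and input prices as stated on this page (subset of the leaderboard).
entries = [
    Entry("Seed 2.1 Pro", 0.945, None),
    Entry("Kimi K2.6", 0.932, 0.75),
    Entry("Seed 2.1 Turbo", 0.927, None),
    Entry("Gemma 4 31B", 0.856, 0.13),
]


def best_value(entries: Sequence[Entry], tolerance: float = 0.10) -> Optional[Entry]:
    """Return the cheapest priced entry scoring within `tolerance` of the leader."""
    if not entries:
        return None
    leader_score = max(e.score for e in entries)
    threshold = leader_score * (1 - tolerance)  # assumed reading of "within 10%"
    candidates = [
        e for e in entries
        if e.score >= threshold and e.input_cost is not None
    ]
    return min(candidates, key=lambda e: e.input_cost, default=None)


winner = best_value(entries)
print(winner.model if winner else "no priced model within tolerance")
```

With a leader score of 0.945 the threshold works out to about 0.85, so Gemma 4 31B (0.856) qualifies and is the cheapest priced candidate in this subset.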

How recent are the MathVision leaderboard results?

The MathVision leaderboard was last updated in July 2026 and currently includes 31 evaluated models.