
MathVista-Mini

MathVista-Mini is the smaller testmini subset (1,000 examples) of the MathVista benchmark, which evaluates mathematical reasoning in visual contexts. Its examples are derived from math-related multimodal datasets and combine diverse mathematical and visual tasks. Together they test whether foundation models can solve problems that require both visual understanding and mathematical reasoning.

Paper: https://arxiv.org/abs/2310.02255

Progress Over Time

[Figure: timeline of model performance on MathVista-Mini, distinguishing open and proprietary models and tracing the state-of-the-art frontier.]

MathVista-Mini Leaderboard

23 models

| Rank | Organization | Parameters | Context | Cost (input / output) |
|------|--------------|------------|---------|-----------------------|
| 1 | Moonshot AI (Kimi K2.5) | 1.0T | 262K | $0.60 / $3.00 |
| 2 | Alibaba Cloud / Qwen Team | 27B | 262K | $0.30 / $2.40 |
| 3 | Alibaba Cloud / Qwen Team | 28B | 262K | $0.60 / $3.60 |
| 3 | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 |
| 5 | Alibaba Cloud / Qwen Team | 35B | n/a | n/a |
| 6 | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 |
| 7 | Alibaba Cloud / Qwen Team | 33B | n/a | n/a |
| 8 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 / $3.49 |
| 9 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 |
| 10 | Alibaba Cloud / Qwen Team | 33B | n/a | n/a |
| 11 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $1.00 |
| 12 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 |
| 13 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $0.70 |
| 14 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 |
| 15 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.08 / $0.50 |
| 16 | Alibaba Cloud / Qwen Team | 72B | n/a | n/a |
| 17 | Alibaba Cloud / Qwen Team | 34B | n/a | n/a |
| 18 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 |
| 19 | Alibaba Cloud / Qwen Team | 73B | n/a | n/a |
| 20 | Alibaba Cloud / Qwen Team | 8B | n/a | n/a |
| 21 | n/a | 27B | 131K | $0.10 / $0.20 |
| 22 | n/a | 12B | 131K | $0.05 / $0.10 |
| 23 | n/a | 4B | 131K | $0.02 / $0.04 |
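The listing places two models at rank 3 and then continues at rank 5, with no rank 4. That pattern is consistent with standard competition ("1224") ranking, in which tied entries share a rank and the next entry skips ahead by the number of ties. Whether this leaderboard applies exactly that rule is an assumption; the minimal sketch below, using made-up scores, only illustrates the convention.

```python
from typing import List, Sequence


def competition_ranks(scores: Sequence[float]) -> List[int]:
    """Standard competition ("1224") ranking, higher score is better.

    Tied scores share the same rank; the next distinct score receives
    a rank equal to its 1-based position in the sorted order, so ranks
    are skipped after a tie.
    """
    # Indices sorted by descending score (sorted() is stable, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order):
        prev = order[position - 1] if position > 0 else None
        if prev is not None and scores[idx] == scores[prev]:
            # Tie with the previous entry: reuse its rank.
            ranks[idx] = ranks[prev]
        else:
            # New score: rank is the 1-based position in sorted order.
            ranks[idx] = position + 1
    return ranks
```

For the hypothetical scores `[0.90, 0.85, 0.84, 0.84, 0.83]`, `competition_ranks` returns `[1, 2, 3, 3, 5]`, reproducing the "3, 3, 5" pattern above. Ties are detected by exact equality here; a real leaderboard might round scores (for example to three decimals) before comparing.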

FAQ

Common questions about MathVista-Mini

MathVista-Mini is described in the MathVista paper, available at https://arxiv.org/abs/2310.02255. The paper covers the benchmark methodology, dataset construction, and evaluation criteria.
The MathVista-Mini leaderboard ranks 23 AI models by their performance on this benchmark. Kimi K2.5 by Moonshot AI currently leads with the highest score, 0.901, and the average score across all models is 0.786.
Of the 23 results, 0 are verified and all 23 are self-reported.
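Scores here are fractions between 0 and 1. MathVista reports accuracy, so a score of 0.901 presumably means 90.1% of items were answered correctly. The sketch below shows one way such an accuracy could be computed, assuming final answers have already been extracted from model outputs as strings. The `Item` fields (`answer_type`, `answer`, `precision`), the normalization rules, and the two-decimal default are hypothetical simplifications, not the official MathVista evaluation code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Item:
    """One benchmark item (hypothetical schema, not MathVista's official format)."""
    answer_type: str                 # "integer", "float", or "text" (e.g. choice text)
    answer: str                      # gold answer as a string
    precision: Optional[int] = None  # decimal places compared for float answers


def normalize(value: str, item: Item) -> Optional[str]:
    """Map an answer string to a canonical form; None means it could not be parsed."""
    value = value.strip()
    if item.answer_type == "text":
        # Multiple-choice / text answers: case-insensitive comparison.
        return value.lower()
    try:
        number = float(value)
    except ValueError:
        return None  # an unparseable numeric prediction counts as wrong
    if item.answer_type == "integer":
        # "7.0" and "7" both become "7"; "7.9" stays distinct from "7".
        return str(int(number)) if number.is_integer() else str(number)
    # Float answers: compare after rounding to the item's precision
    # (two decimal places if none is given, an arbitrary default).
    digits = item.precision if item.precision is not None else 2
    return f"{number:.{digits}f}"


def accuracy(items: Sequence[Item], predictions: Sequence[str]) -> float:
    """Fraction of items whose normalized prediction equals the normalized gold answer."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = 0
    for item, pred in zip(items, predictions):
        pred_norm = normalize(pred, item)
        if pred_norm is not None and pred_norm == normalize(item.answer, item):
            correct += 1
    return correct / len(items)
```

For example, with `items = [Item("integer", "7"), Item("float", "3.14", 2), Item("text", "Triangle")]`, the predictions `["7.0", "3.1416", "triangle"]` all match and `accuracy` returns `1.0`. Normalizing both sides the same way keeps formatting differences such as "7.0" versus "7" from being scored as errors.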
MathVista-Mini is categorized under math, multimodal, and vision. The benchmark evaluates multimodal models.