MathVista-Mini
MathVista-Mini is a smaller subset of the MathVista benchmark, which evaluates mathematical reasoning in visual contexts. Its examples are drawn from multimodal datasets involving mathematics and span diverse mathematical and visual tasks. Together they assess whether foundation models can solve problems that require both visual understanding and mathematical reasoning.
MathVista-Mini Leaderboard
23 models
| Rank | Model / Organization | Parameters | Context | Cost | License | |
|---|---|---|---|---|---|---|
| 1 | Kimi K2.5 (Moonshot AI) | 1.0T | 262K | $0.60 / $3.00 | ||
| 2 | Alibaba Cloud / Qwen Team | 27B | 262K | $0.30 / $2.40 | ||
| 3 | Qwen3.6-27B (Alibaba Cloud / Qwen Team) | 28B | 262K | $0.60 / $3.60 | ||
| 3 | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 | ||
| 5 | Alibaba Cloud / Qwen Team | 35B | — | — | ||
| 6 | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 | ||
| 7 | Alibaba Cloud / Qwen Team | 33B | — | — | ||
| 8 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 / $3.49 | ||
| 9 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 | ||
| 10 | Alibaba Cloud / Qwen Team | 33B | — | — | ||
| 11 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $1.00 | ||
| 12 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 | ||
| 13 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $0.70 | ||
| 14 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 | ||
| 15 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.08 / $0.50 | ||
| 16 | Alibaba Cloud / Qwen Team | 72B | — | — | ||
| 17 | Alibaba Cloud / Qwen Team | 34B | — | — | ||
| 18 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 | ||
| 19 | Alibaba Cloud / Qwen Team | 73B | — | — | ||
| 20 | Alibaba Cloud / Qwen Team | 8B | — | — | ||
| 21 | Google | 27B | 131K | $0.10 / $0.20 | ||
| 22 | Google | 12B | 131K | $0.05 / $0.10 | ||
| 23 | Google | 4B | 131K | $0.02 / $0.04 | ||
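
The Cost column pairs two prices, and a short script can turn such a cell into a rough spend estimate. The sketch below is only illustrative: it assumes the two figures are USD prices per one million input and output tokens respectively (the page does not state the unit), and the helper names `parse_cost` and `estimate_cost` are hypothetical.

```python
from decimal import Decimal


def parse_cost(cell):
    """Parse a Cost cell such as '$0.60 / $3.00' into (input_price, output_price).

    Returns None for the dash placeholder used when no price is listed.
    """
    cell = cell.strip()
    if cell in {"—", "-", ""}:
        return None
    input_part, output_part = (part.strip().lstrip("$") for part in cell.split("/"))
    return Decimal(input_part), Decimal(output_part)


def estimate_cost(cell, input_tokens, output_tokens):
    """Estimate the USD cost of a workload.

    ASSUMPTION: prices are quoted per one million tokens (input / output).
    """
    prices = parse_cost(cell)
    if prices is None:
        return None
    input_price, output_price = prices
    million = Decimal(1_000_000)
    return input_price * input_tokens / million + output_price * output_tokens / million


# Rank 1 row: 2M input tokens and 500K output tokens.
print(estimate_cost("$0.60 / $3.00", 2_000_000, 500_000))
```

Under that pricing assumption, the rank 1 row works out to $1.20 for input plus $1.50 for output. `Decimal` is used instead of `float` so that currency arithmetic stays exact.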
FAQ
Common questions about MathVista-Mini
The MathVista paper, which describes the benchmark methodology, dataset creation, and evaluation criteria, is available at https://arxiv.org/abs/2310.02255.
The MathVista-Mini leaderboard ranks 23 AI models based on their performance on this benchmark. Currently, Kimi K2.5 by Moonshot AI leads with a score of 0.901. The average score across all models is 0.786.
The highest MathVista-Mini score is 0.901, achieved by Kimi K2.5 from Moonshot AI.
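
The leaderboard lists two entries at rank 3 followed by rank 5, which is consistent with standard competition ranking: tied scores share a rank and the next rank is skipped. The sketch below shows how ranks, the leader, and the average could be derived from a score table under that assumed convention. Only the Kimi K2.5 score of 0.901 comes from this page; the other names and scores are hypothetical placeholders, so the printed average is not the leaderboard's 0.786.

```python
from statistics import mean


def competition_ranks(scores):
    """Map each model to its rank; tied scores share a rank and the next rank is skipped."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score = None
    previous_rank = 0
    for position, (model, score) in enumerate(ordered, start=1):
        # A tie reuses the previous rank; otherwise the rank is the 1-based position.
        rank = previous_rank if score == previous_score else position
        ranks[model] = rank
        previous_score, previous_rank = score, rank
    return ranks


# Only the Kimi K2.5 score (0.901) is taken from the leaderboard text;
# every other entry is a hypothetical placeholder.
scores = {
    "Kimi K2.5": 0.901,
    "model-b": 0.880,
    "model-c": 0.870,
    "model-d": 0.870,
    "model-e": 0.850,
}

ranks = competition_ranks(scores)
leader = max(scores, key=scores.get)
average = mean(scores.values())

print(ranks)
print(leader, round(average, 3))
```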
23 models have been evaluated on the MathVista-Mini benchmark. All 23 results are self-reported; none is verified.
MathVista-Mini is categorized under math, multimodal, and vision. The benchmark evaluates multimodal models.