What is the MMStar leaderboard?

The MMStar leaderboard ranks 22 AI models based on their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.833. The average score across all models is 0.725.

What is the highest MMStar score?

The highest MMStar score is 0.833, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.

How many models are evaluated on MMStar?

All 22 models evaluated on the MMStar benchmark have self-reported results; none of the results have been verified.

Where can I find the MMStar paper?

The MMStar paper is available at https://arxiv.org/abs/2403.20330. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does MMStar cover?

MMStar is categorized under reasoning, general, multimodal, and vision. The benchmark evaluates multimodal models.

What is the best open-source model on MMStar?

Qwen3.5-122B-A10B by Alibaba Cloud / Qwen Team is the top-ranked open-source model on MMStar, with a score of 0.829 (rank #2).

Which model offers the best value on MMStar?

Among models scoring within 10% of the leader, Qwen3 VL 8B Thinking from Alibaba Cloud / Qwen Team is the cheapest, at $0.18 per million input tokens with a score of 0.753.
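The selection rule described above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual method. It assumes "within 10% of the leader" means a score of at least 90% of the top score, and that models without a listed price are skipped. The function name `cheapest_near_leader` and the row layout are hypothetical. Only the scores and the single price stated on this page are used; the other prices are left as `None` because the page does not give them.

```python
def cheapest_near_leader(rows, fraction=0.10):
    """Return the lowest-priced row whose score is within `fraction` of the top score.

    Assumption: "within 10%" is relative, i.e. score >= leader_score * (1 - fraction).
    Rows with an unknown price (None) are excluded from the comparison.
    """
    leader_score = max(r["score"] for r in rows)
    cutoff = leader_score * (1.0 - fraction)
    eligible = [
        r for r in rows
        if r["score"] >= cutoff and r["price"] is not None
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["price"])


# Scores and the one price quoted on this page; price is USD per million input tokens.
rows = [
    {"name": "Qwen3.6 Plus", "score": 0.833, "price": None},        # price not stated here
    {"name": "Qwen3.5-122B-A10B", "score": 0.829, "price": None},   # price not stated here
    {"name": "Qwen3 VL 8B Thinking", "score": 0.753, "price": 0.18},
]

best = cheapest_near_leader(rows)
print(best["name"], best["score"], best["price"])
```

With a leader score of 0.833, the relative cutoff is about 0.750, so a score of 0.753 qualifies. It would also qualify if "within 10%" were read as an absolute gap of 0.10, since that cutoff is 0.733.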

How recent are the MMStar leaderboard results?

The MMStar leaderboard was last updated in June 2026 and currently includes 22 evaluated models.


MMStar

MMStar is a vision-indispensable multimodal benchmark comprising 1,500 challenge samples selected by humans to evaluate 6 core capabilities and 18 detailed axes. It addresses two problems in existing multimodal evaluations: samples that can be answered without the visual content, and unintentional data leakage.

Qwen3.6 Plus from Alibaba Cloud / Qwen Team currently leads the MMStar leaderboard with a score of 0.833 across 22 evaluated AI models.
