MMMU

MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, across 30 subjects and 183 subfields.
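The question set is distributed publicly. Below is a minimal sketch of pulling a few examples with Hugging Face Datasets, assuming the dataset is published on the Hub as MMMU/MMMU with one config per subject; the config and field names are assumptions to verify against the dataset card.

```python
# Sketch: browsing MMMU questions via Hugging Face Datasets.
# Assumes the dataset is hosted on the Hub as "MMMU/MMMU" with one
# config per subject (e.g. "Accounting"); config and field names
# should be checked against the dataset card.
from datasets import load_dataset

ds = load_dataset("MMMU/MMMU", "Accounting", split="validation")
sample = ds[0]
print(sample["question"])  # question text, may reference an attached image
print(sample["options"])   # multiple-choice options
print(sample["answer"])    # gold answer key, e.g. "B"
```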

Paper: https://arxiv.org/abs/2311.16502

Progress Over Time

[Interactive timeline showing model performance evolution on MMMU, tracing the state-of-the-art frontier and distinguishing open from proprietary models.]

MMMU Leaderboard

[Leaderboard table: 60 models ranked by MMMU score, with columns for rank, organization, parameter count, context window, cost (input / output), and license. Organizations represented include Alibaba Cloud / Qwen Team, OpenAI, Moonshot AI, Mistral AI, and Amazon; parameter counts range from 6B to 400B and context windows from 128K to 10M tokens. Showing 1–50 of 60, page 1 of 2.]
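As a reading aid for the cost column, here is a small sketch of estimating what a single request costs, assuming the two figures (e.g. "$1.25 / $10.00") are input and output prices per million tokens, a common leaderboard convention that the table itself does not confirm.

```python
# Hypothetical helper: estimate a request's cost from per-1M-token
# input/output prices. The per-1M-token interpretation of the cost
# column is an assumption, not confirmed by the table.
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m

# A 10K-token prompt with a 2K-token reply at $1.25 / $10.00:
print(f"${request_cost(10_000, 2_000, 1.25, 10.00):.4f}")  # $0.0325
```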

FAQ

Common questions about MMMU

What is MMMU?
MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, across 30 subjects and 183 subfields.

Where can I find the MMMU paper?
The MMMU paper is available at https://arxiv.org/abs/2311.16502. It details the benchmark methodology, dataset creation, and evaluation criteria.

Which model leads the MMMU leaderboard?
The MMMU leaderboard ranks 60 AI models by their performance on this benchmark. Currently, Qwen3.6 Plus by the Alibaba Cloud / Qwen Team leads with a score of 0.860; the average score across all models is 0.667.

What is the highest MMMU score?
The highest MMMU score is 0.860, achieved by Qwen3.6 Plus from the Alibaba Cloud / Qwen Team.

How many models have been evaluated on MMMU?
60 models have been evaluated on the MMMU benchmark, with 0 verified results and 58 self-reported results.

What categories does MMMU fall under?
MMMU is categorized under general, healthcare, multimodal, reasoning, and vision; it evaluates multimodal models.
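Leaderboard scores are accuracies in [0, 1]: MMMU is largely multiple-choice, so a model's score is essentially the fraction of questions it answers correctly. A minimal scoring sketch follows; the prediction and gold lists are hypothetical.

```python
# Minimal sketch of MMMU-style scoring: micro-averaged accuracy over
# multiple-choice predictions. "preds" and "golds" are hypothetical
# lists of answer letters, e.g. ["A", "C", ...].
def mmmu_accuracy(preds: list[str], golds: list[str]) -> float:
    assert len(preds) == len(golds)
    correct = sum(p == g for p, g in zip(preds, golds))
    return correct / len(golds)

print(mmmu_accuracy(["A", "B", "C"], ["A", "B", "D"]))  # 0.666...
```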