MMMU

MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, across 30 subjects and 183 subfields.
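
For readers who want to inspect the questions themselves, the benchmark is distributed as per-subject splits. The sketch below loads one subject with the Hugging Face datasets library; the hub id "MMMU/MMMU", the config name "Accounting", and the field names are assumptions based on the public dataset card, so verify them there before relying on this.

```python
# Minimal sketch of loading one MMMU subject. Assumes the dataset is published
# on the Hugging Face Hub under the id "MMMU/MMMU" with one config per subject
# (e.g. "Accounting"); check the dataset card for the exact identifiers.
from datasets import load_dataset

ds = load_dataset("MMMU/MMMU", "Accounting", split="validation")
sample = ds[0]
print(sample["question"])  # question text (may reference attached images)
print(sample["options"])   # multiple-choice options, where applicable
```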

Paper: https://arxiv.org/abs/2311.16502

Progress Over Time

[Interactive timeline showing model performance evolution on MMMU: the state-of-the-art frontier over time, with open and proprietary models distinguished.]

MMMU Leaderboard

59 models • 0 verified
| Rank | Organization | Score | Params | Context | Input $/1M | Output $/1M |
|------|--------------|-------|--------|---------|------------|-------------|
| 1 | | 0.854 | | 400K | $1.25 | $10.00 |
| 1 | | 0.854 | | 400K | $1.25 | $10.00 |
| 1 | OpenAI | 0.854 | | 400K | $1.25 | $10.00 |
| 4 | OpenAI | 0.842 | | 400K | $1.25 | $10.00 |
| 5 | Alibaba Cloud / Qwen Team | 0.839 | 122B | 262K | $0.40 | $3.20 |
| 6 | OpenAI | 0.829 | | 200K | $2.00 | $8.00 |
| 7 | Alibaba Cloud / Qwen Team | 0.823 | 27B | | | |
| 8 | | 0.820 | | 1.0M | $1.25 | $10.00 |
| 9 | OpenAI | 0.816 | | 200K | $1.10 | $4.40 |
| 10 | Alibaba Cloud / Qwen Team | 0.814 | 35B | 262K | $0.25 | $2.00 |
| 11 | | 0.797 | | 1.0M | $0.30 | $2.50 |
| 12 | | 0.796 | | 1.0M | $1.25 | $10.00 |
| 13 | | 0.781 | 10B | | | |
| 14 | | 0.780 | | 128K | $3.00 | $15.00 |
| 15 | OpenAI | 0.776 | | 200K | $15.00 | $60.00 |
| 16 | | 0.754 | | | | |
| 17 | OpenAI | 0.752 | | 128K | $75.00 | $150.00 |
| 18 | | 0.750 | | 200K | $3.00 | $15.00 |
| 19 | OpenAI | 0.748 | | 1.0M | $2.00 | $8.00 |
| 20 | | 0.744 | | 200K | $3.00 | $15.00 |
| 21 | | 0.734 | 400B | 1.0M | $0.17 | $0.85 |
| 22 | | 0.729 | | 1.0M | $0.10 | $0.40 |
| 23 | | 0.727 | | 1.0M | $0.40 | $1.60 |
| 24 | OpenAI | 0.722 | | 128K | $2.50 | $10.00 |
| 25 | | 0.707 | | 1.0M | $0.10 | $0.40 |
| 26 | Alibaba Cloud / Qwen Team | 0.703 | 73B | | | |
| 27 | Alibaba Cloud / Qwen Team | 0.702 | 72B | | | |
| 28 | Moonshot AI | 0.700 | | | | |
| 28 | Alibaba Cloud / Qwen Team | 0.700 | 34B | | | |
| 30 | | 0.694 | 109B | 10.0M | $0.08 | $0.30 |
| 31 | | 0.683 | | 200K | $3.00 | $15.00 |
| 32 | | 0.680 | | 1.0M | $0.07 | $0.30 |
| 33 | | 0.661 | | 128K | $2.00 | $10.00 |
| 34 | | 0.659 | | 2.1M | $2.50 | $10.00 |
| 35 | Mistral AI | 0.640 | 124B | 128K | $2.00 | $6.00 |
| 36 | | 0.632 | | | | |
| 37 | | 0.625 | 24B | | | |
| 38 | | 0.623 | | 1.0M | $0.15 | $0.60 |
| 39 | Amazon | 0.617 | | 300K | $0.80 | $3.20 |
| 40 | | 0.603 | 90B | 128K | $0.35 | $0.40 |
| 41 | | 0.594 | | 128K | $0.15 | $0.60 |
| 42 | | 0.593 | 24B | | | |
| 42 | | 0.593 | 24B | 128K | $0.10 | $0.30 |
| 44 | Alibaba Cloud / Qwen Team | 0.592 | 7B | | | |
| 45 | Alibaba Cloud / Qwen Team | 0.586 | 8B | | | |
| 46 | Amazon | 0.562 | | 300K | $0.06 | $0.24 |
| 47 | | 0.554 | | 1.0M | $0.10 | $0.40 |
| 48 | | 0.551 | 6B | 128K | $0.05 | $0.10 |
| 49 | | 0.537 | 8B | 1.0M | $0.07 | $0.30 |
| 50 | | 0.536 | | | | |

Showing 1-50 of 59
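
Reading the Cost columns as USD per million input and output tokens, the table also gives a rough sense of what an evaluation run costs. The sketch below illustrates the calculation; the per-question token counts are illustrative assumptions, not measured values.

```python
# Rough cost estimate for an MMMU evaluation run, assuming the leaderboard's
# Cost columns are USD per million input / output tokens. The per-question
# token counts below are illustrative assumptions, not measured values.
def run_cost_usd(n_questions: int, in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """in_price / out_price are $ per 1M tokens."""
    return (n_questions * in_tokens / 1e6) * in_price \
         + (n_questions * out_tokens / 1e6) * out_price

# Example: the full 11.5K-question pool at ~1,000 input and ~300 output
# tokens per question, priced like the rank-1 entry ($1.25 in, $10.00 out).
print(f"${run_cost_usd(11_500, 1_000, 300, 1.25, 10.00):,.2f}")  # ≈ $48.88
```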

FAQ

Common questions about MMMU

What is MMMU?
MMMU (Massive Multi-discipline Multimodal Understanding) is a benchmark designed to evaluate multimodal models on college-level subject knowledge and deliberate reasoning. It contains 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, across 30 subjects and 183 subfields.

Where can I find the MMMU paper?
The MMMU paper is available at https://arxiv.org/abs/2311.16502. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.

Which model performs best on MMMU?
The MMMU leaderboard ranks 59 AI models by their performance on this benchmark. Currently, GPT-5.1 Instant by OpenAI leads with a score of 0.854. The average score across all models is 0.663.

What is the highest MMMU score?
The highest MMMU score is 0.854, achieved by GPT-5.1 Instant from OpenAI.
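
MMMU scores are accuracies, i.e. the fraction of questions answered correctly, so a score of 0.854 corresponds to 85.4% correct. A minimal scorer sketch follows; the letter-keyed multiple-choice answer format is an illustrative assumption.

```python
# Minimal scoring sketch: an MMMU score is the fraction of questions answered
# correctly. The letter-keyed answer format here is an illustrative assumption.
def accuracy(predictions: list[str], gold: list[str]) -> float:
    assert len(predictions) == len(gold)
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

print(accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))  # 0.75
```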

How many models have been evaluated on MMMU?
59 models have been evaluated on the MMMU benchmark, with 0 verified results and 57 self-reported results.

Which categories does MMMU belong to?
MMMU is categorized under general, healthcare, multimodal, and reasoning. The benchmark evaluates multimodal models.