
VideoMMMU

Video-MMMU evaluates Large Multimodal Models' ability to acquire knowledge from expert-level professional videos across six disciplines through three cognitive stages: perception, comprehension, and adaptation. It contains 300 videos and 900 human-annotated questions spanning Art, Business, Science, Medicine, Humanities, and Engineering.
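A rough sketch of the dataset's stated composition; the one-question-per-stage-per-video split is inferred from the counts above (300 videos, 900 questions, three stages), not quoted from the benchmark itself:

```python
# Composition of Video-MMMU as stated on this page.
# The per-video / per-stage breakdown below is arithmetic inference,
# not an official specification.

NUM_VIDEOS = 300
NUM_QUESTIONS = 900
STAGES = ("perception", "comprehension", "adaptation")
DISCIPLINES = ("Art", "Business", "Science", "Medicine",
               "Humanities", "Engineering")

questions_per_video = NUM_QUESTIONS // NUM_VIDEOS   # 3, one per stage
questions_per_stage = NUM_QUESTIONS // len(STAGES)  # 300
```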

Paper

The VideoMMMU paper is available at https://arxiv.org/abs/2501.13826.

Progress Over Time

[Interactive timeline showing model performance evolution on VideoMMMU]

VideoMMMU Leaderboard

22 models

| Rank | Organization | Parameters | Context | Cost (input / output) |
|------|--------------|------------|---------|-----------------------|
| 1 | Google | — | — | — |
| 2 | — | — | 1.0M | $0.50 / $3.00 |
| 3 | Moonshot AI | 1.0T | 262K | $0.60 / $2.50 |
| 4 | OpenAI | — | 400K | $1.75 / $14.00 |
| 5 | — | — | 1.0M | $0.25 / $1.50 |
| 6 | OpenAI | — | 400K | $1.25 / $10.00 |
| 7 | Alibaba Cloud / Qwen Team | — | — | — |
| 8 | — | — | 1.0M | $1.25 / $10.00 |
| 9 | OpenAI | — | 200K | $2.00 / $8.00 |
| 10 | Alibaba Cloud / Qwen Team | 27B | — | — |
| 11 | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 |
| 12 | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 |
| 13 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 / $3.49 |
| 14 | Alibaba Cloud / Qwen Team | 33B | — | — |
| 15 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $1.00 |
| 16 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 |
| 17 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 |
| 18 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 |
| 19 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $0.70 |
| 20 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.08 / $0.50 |
| 21 | OpenAI | 128K | — | $2.50 / $10.00 |
| 22 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 |
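The Cost column lists paired input and output prices. A minimal sketch of how a single request's cost follows from such a pair, assuming prices are quoted in USD per 1M tokens (the usual convention for listings like this; verify against the provider):

```python
# Estimate a request's API cost from an (input, output) price pair.
# Assumption: prices are USD per 1M tokens, as is conventional for
# leaderboard cost columns like the one above.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD for one request at per-1M-token prices."""
    return (input_tokens * input_price
            + output_tokens * output_price) / 1_000_000

# e.g. 50K input + 2K output tokens at $0.50 / $3.00:
cost = request_cost(50_000, 2_000, 0.50, 3.00)  # 0.025 + 0.006 = 0.031
```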

FAQ

Common questions about VideoMMMU

What is VideoMMMU?
Video-MMMU evaluates Large Multimodal Models' ability to acquire knowledge from expert-level professional videos across six disciplines through three cognitive stages: perception, comprehension, and adaptation. It contains 300 videos and 900 human-annotated questions spanning Art, Business, Science, Medicine, Humanities, and Engineering.

Where can I find the VideoMMMU paper?
The VideoMMMU paper is available at https://arxiv.org/abs/2501.13826. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.

How do models rank on the VideoMMMU leaderboard?
The VideoMMMU leaderboard ranks 22 AI models by their performance on this benchmark. Currently, Gemini 3 Pro by Google leads with a score of 0.876. The average score across all models is 0.779.

What is the highest VideoMMMU score?
The highest VideoMMMU score is 0.876, achieved by Gemini 3 Pro from Google.

How many models have been evaluated on VideoMMMU?
22 models have been evaluated on the VideoMMMU benchmark, with 0 verified results and 22 self-reported results.

What categories does VideoMMMU belong to?
VideoMMMU is categorized under vision, healthcare, multimodal, and reasoning. The benchmark evaluates multimodal models.
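The leaderboard scores quoted above (e.g. 0.876) read as fractions of questions answered correctly. A minimal aggregation sketch under that assumption, with equal-sized stages and made-up per-stage counts (the official scoring is defined in the paper):

```python
# Hypothetical sketch of score aggregation for a VideoMMMU-style
# benchmark: mean accuracy across the three cognitive stages.
# The per-stage correct counts below are illustrative, not real results.

def stage_accuracy(num_correct: int, num_questions: int = 300) -> float:
    """Fraction of questions answered correctly within one stage."""
    return num_correct / num_questions

def overall_score(correct_by_stage: dict) -> float:
    """Mean accuracy across stages; with equal stage sizes this
    equals plain question-level accuracy over all 900 questions."""
    accs = [stage_accuracy(c) for c in correct_by_stage.values()]
    return sum(accs) / len(accs)

score = overall_score(
    {"perception": 270, "comprehension": 280, "adaptation": 238}
)  # 788/900 ≈ 0.876
```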