VideoMMMU
Video-MMMU evaluates Large Multimodal Models' ability to acquire knowledge from expert-level professional videos across six disciplines through three cognitive stages: perception, comprehension, and adaptation. It contains 300 videos and 900 human-annotated questions spanning Art, Business, Science, Medicine, Humanities, and Engineering.
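As a quick sanity check on the numbers above, the 900 questions decompose as 300 videos × 3 cognitive stages (one question per stage per video). A minimal sketch of that structure — the helper names and the unweighted-mean aggregation are illustrative assumptions, not the official API or metric:

```python
# Illustrative sketch of Video-MMMU's structure: 300 videos, each paired with
# one question per cognitive stage. Names below are not the official schema.
STAGES = ["perception", "comprehension", "adaptation"]
DISCIPLINES = ["Art", "Business", "Science", "Medicine", "Humanities", "Engineering"]

NUM_VIDEOS = 300

def total_questions(num_videos: int = NUM_VIDEOS) -> int:
    """Each video contributes one question per cognitive stage."""
    return num_videos * len(STAGES)

def overall_score(stage_accuracies: dict[str, float]) -> float:
    """Simple unweighted mean over the three stage accuracies.

    This is one plausible aggregation; the official metric may differ.
    """
    return sum(stage_accuracies[s] for s in STAGES) / len(STAGES)

print(total_questions())  # 900
```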
Progress Over Time
Interactive timeline showing model performance evolution on VideoMMMU (state-of-the-art frontier; open vs. proprietary models).
VideoMMMU Leaderboard
22 models
| Rank | Model | Organization | Params | Context | Cost (in / out) |
|---|---|---|---|---|---|
| 1 | Gemini 3 Pro | Google | — | — | — |
| 2 | — | Google | — | 1.0M | $0.50 / $3.00 |
| 3 | — | Moonshot AI | 1.0T | 262K | $0.60 / $2.50 |
| 4 | — | OpenAI | — | 400K | $1.75 / $14.00 |
| 5 | — | Google | — | 1.0M | $0.25 / $1.50 |
| 6 | — | OpenAI | — | 400K | $1.25 / $10.00 |
| 7 | Qwen3.6 Plus | Alibaba Cloud / Qwen Team | — | — | — |
| 8 | — | — | — | 1.0M | $1.25 / $10.00 |
| 9 | — | OpenAI | — | 200K | $2.00 / $8.00 |
| 10 | — | Alibaba Cloud / Qwen Team | 27B | — | — |
| 11 | — | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 |
| 12 | — | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 |
| 13 | — | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 / $3.49 |
| 14 | — | Alibaba Cloud / Qwen Team | 33B | — | — |
| 15 | — | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $1.00 |
| 16 | — | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 |
| 17 | — | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 |
| 18 | — | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 |
| 19 | — | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $0.70 |
| 20 | — | Alibaba Cloud / Qwen Team | 9B | 262K | $0.08 / $0.50 |
| 21 | — | OpenAI | — | 128K | $2.50 / $10.00 |
| 22 | — | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 |
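Assuming the Cost column lists input/output prices per 1M tokens (a common API-pricing convention; the table does not state units), the cost of an evaluation run can be estimated as follows. The token counts in the example are hypothetical placeholders, not measurements from this benchmark:

```python
# Rough cost estimate from per-1M-token input/output rates.
# Assumption: the leaderboard's "$X / $Y" cost cells mean
# $X per 1M input tokens and $Y per 1M output tokens.
def run_cost(input_tokens: int, output_tokens: int,
             usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Return total USD cost for one run at the given per-1M-token rates."""
    return (input_tokens / 1_000_000) * usd_per_m_input \
         + (output_tokens / 1_000_000) * usd_per_m_output

# Hypothetical run: 50M input tokens (video frames + prompts) and
# 1M output tokens at the $0.50 / $3.00 rate from the table:
print(round(run_cost(50_000_000, 1_000_000, 0.50, 3.00), 2))  # 28.0
```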
FAQ
Common questions about VideoMMMU
**What is VideoMMMU?**
Video-MMMU evaluates Large Multimodal Models' ability to acquire knowledge from expert-level professional videos across six disciplines through three cognitive stages: perception, comprehension, and adaptation. It contains 300 videos and 900 human-annotated questions spanning Art, Business, Science, Medicine, Humanities, and Engineering.

**Where can I find the VideoMMMU paper?**
The VideoMMMU paper is available at https://arxiv.org/abs/2501.13826. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.

**Which model leads the VideoMMMU leaderboard?**
The VideoMMMU leaderboard ranks 22 AI models by their performance on this benchmark. Currently, Gemini 3 Pro by Google leads with a score of 0.876; the average score across all models is 0.779.

**What is the highest VideoMMMU score?**
The highest VideoMMMU score is 0.876, achieved by Gemini 3 Pro from Google.

**How many models have been evaluated on VideoMMMU?**
22 models have been evaluated on the VideoMMMU benchmark, with 0 verified results and 22 self-reported results.

**What categories does VideoMMMU belong to?**
VideoMMMU is categorized under vision, healthcare, multimodal, and reasoning.