Video-MME (long, no subtitles)
Video-MME is the first-ever comprehensive evaluation benchmark for Multi-modal Large Language Models (MLLMs) in video analysis. This variant focuses on long videos (30–60 minutes) without subtitle inputs, testing long-range contextual understanding across 6 primary visual domains and 30 subfields, including knowledge, film & television, sports competition, life record, and multilingual content.
Progress Over Time
Interactive timeline showing model performance evolution on Video-MME (long, no subtitles). The chart marks the state-of-the-art frontier and distinguishes open from proprietary models.
Video-MME (long, no subtitles) Leaderboard
1 model
| Rank | Model | Organization | Score | Context | Cost (in / out) | License |
|---|---|---|---|---|---|---|
| 1 | GPT-4.1 | OpenAI | 0.720 | 1.0M | $2.00 / $8.00 | — |
FAQ
Common questions about Video-MME (long, no subtitles)
The Video-MME (long, no subtitles) paper is available at https://arxiv.org/abs/2405.21075. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The Video-MME (long, no subtitles) leaderboard currently ranks 1 AI model. GPT-4.1 by OpenAI leads with a score of 0.720; with only a single entry, the average score across all models is likewise 0.720.
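The leader and average statistics above follow directly from the list of results. A minimal sketch of that computation, assuming a hypothetical list-of-dicts representation of the leaderboard (only GPT-4.1's 0.720 score comes from this page):

```python
# Hypothetical leaderboard data; field names are illustrative.
results = [
    {"model": "GPT-4.1", "org": "OpenAI", "score": 0.720},
]

# Top-ranked model is the one with the highest score.
leader = max(results, key=lambda r: r["score"])

# Mean score across all evaluated models.
average = sum(r["score"] for r in results) / len(results)

print(leader["model"], leader["score"])  # → GPT-4.1 0.72
print(round(average, 3))                 # → 0.72
```

With a single entry, the average necessarily equals the leader's score, which is why both figures on this page read 0.720.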
The highest Video-MME (long, no subtitles) score is 0.720, achieved by GPT-4.1 from OpenAI.
1 model has been evaluated on the Video-MME (long, no subtitles) benchmark, with 0 verified results and 1 self-reported result.
Video-MME (long, no subtitles) is categorized under multimodal, video, and vision. The benchmark evaluates multimodal models with multilingual support.