
Video-MME (long, no subtitles)

Video-MME is the first comprehensive evaluation benchmark for Multi-modal Large Language Models (MLLMs) in video analysis. This variant focuses on long videos (30–60 minutes) without subtitle input, testing long-range contextual understanding across 6 primary visual domains, including knowledge, film & television, sports competition, life record, and multilingual content, subdivided into 30 subfields.

Paper: https://arxiv.org/abs/2405.21075

Progress Over Time

[Timeline chart: model performance evolution on Video-MME (long, no subtitles), showing the state-of-the-art frontier and distinguishing open from proprietary models.]
Video-MME (long, no subtitles) Leaderboard

1 model evaluated

Rank  Model    Organization  Score  Context  Cost (input / output)  License
1     GPT-4.1  OpenAI        0.720  1.0M     $2.00 / $8.00          Proprietary
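The summary statistics quoted in the FAQ below (top score, average score, verified count) can be recomputed directly from the leaderboard entries. A minimal sketch, assuming a simple list-of-dicts representation of the single self-reported result shown above; the field names are illustrative, not an official schema:

```python
# Leaderboard entries as shown on this page: one self-reported result.
entries = [
    {"model": "GPT-4.1", "org": "OpenAI", "score": 0.720, "verified": False},
]

# Top-ranked entry by score, mean score, and count of verified results.
top = max(entries, key=lambda e: e["score"])
average = sum(e["score"] for e in entries) / len(entries)
verified = sum(e["verified"] for e in entries)

print(f"Top: {top['model']} ({top['score']:.3f})")    # Top: GPT-4.1 (0.720)
print(f"Average score: {average:.3f}")                # Average score: 0.720
print(f"Verified results: {verified} of {len(entries)}")
```

With a single entry the average trivially equals the top score, which is why the FAQ reports 0.720 for both.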

FAQ

Common questions about Video-MME (long, no subtitles)

What is Video-MME (long, no subtitles)?

Video-MME is the first comprehensive evaluation benchmark for Multi-modal Large Language Models (MLLMs) in video analysis. This variant focuses on long videos (30–60 minutes) without subtitle input, testing long-range contextual understanding across 6 primary visual domains, including knowledge, film & television, sports competition, life record, and multilingual content, subdivided into 30 subfields.

Where can I find the Video-MME paper?

The Video-MME paper is available at https://arxiv.org/abs/2405.21075. It details the benchmark methodology, dataset creation, and evaluation criteria.

How are models ranked on the leaderboard?

The Video-MME (long, no subtitles) leaderboard ranks 1 AI model by its score on this benchmark. Currently, GPT-4.1 by OpenAI leads with a score of 0.720; the average score across all listed models is 0.720.

What is the highest score?

The highest Video-MME (long, no subtitles) score is 0.720, achieved by GPT-4.1 from OpenAI.

How many models have been evaluated?

1 model has been evaluated on this benchmark, with 0 verified results and 1 self-reported result.

What categories does this benchmark belong to?

Video-MME (long, no subtitles) is categorized under multimodal, video, and vision. The benchmark evaluates multimodal models with multilingual support.