OVOBench

Name: OVOBench Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

OVOBench Leaderboard

2 models

Rank	Model	Organization	Score	Context	Cost	License
1	Seed 2.1 Pro (new)	ByteDance	0.807	—	—	—
2	Seed 2.1 Turbo (new)	ByteDance	0.792	—	—	—

About this benchmark

What is OVOBench?

OVOBench (OVO-Bench, short for Online-VideO-Benchmark) evaluates streaming video understanding: it tests a model's ability to perceive and respond to video content in real time as it unfolds.

OVOBench is a multimodal benchmark covering video and vision tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.800, with the leader at 0.807.

Current leaders

Seed 2.1 Pro from ByteDance currently leads the OVOBench leaderboard with a score of 0.807 across 2 evaluated AI models.

Seed 2.1 Pro (ByteDance): 80.7%

Seed 2.1 Turbo (ByteDance): 79.2%
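
The leader and the average quoted on this page can be checked against the two listed scores. The following is a minimal sketch, assuming the average is a plain arithmetic mean of the listed scores (the page does not state how it is computed); the variable names are illustrative only.

```python
# Scores as listed on the leaderboard (0-1 scale).
scores = {"Seed 2.1 Pro": 0.807, "Seed 2.1 Turbo": 0.792}

# Leader: the model with the highest score.
leader = max(scores, key=scores.get)

# Assumption: the page's "average" is an unweighted arithmetic mean.
mean = sum(scores.values()) / len(scores)

print(f"leader: {leader} ({scores[leader]:.1%})")
print(f"mean: {mean:.4f} (displayed as {mean:.1f} at one decimal place)")
```

Under that assumption the mean is about 0.7995, which is consistent with the page's rounded figure of 0.800.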

Source paper

Title: OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Authors: Yifei Li, Junbo Niu, Ziyang Miao, Chunjiang Ge, and 11 others
Published: January 9, 2025
arXiv: 2501.05510

Abstract

Temporal awareness, the ability to reason dynamically based on the timestamp when a question is raised, is the key distinction between offline and online video LLMs. Unlike offline models, which rely on complete videos for static, post hoc analysis, online models process video streams incrementally and dynamically adapt their responses based on the timestamp at which the question is posed. Despite its significance, temporal awareness has not been adequately evaluated in existing benchmarks. To fill this gap, we present OVO-Bench (Online-VideO-Benchmark), a novel video benchmark that emphasizes the importance of timestamps for advanced online video understanding capability benchmarking. OVO-Bench evaluates the ability of video LLMs to reason and respond to events occurring at specific timestamps under three distinct scenarios: (1) Backward tracing: trace back to past events to answer the question. (2) Real-time understanding: understand and respond to events as they unfold at the current timestamp. (3) Forward active responding: delay the response until sufficient future information becomes available to answer the question accurately. OVO-Bench comprises 12 tasks, featuring 644 unique videos and approximately 2,800 human-curated, fine-grained meta-annotations with precise timestamps. We combine automated generation pipelines with human curation. With these high-quality samples, we further developed an evaluation pipeline to systematically query video LLMs along the video timeline. Evaluations of nine Video-LLMs reveal that, despite advancements on traditional benchmarks, current models struggle with online video understanding, showing a significant gap compared to human agents. We hope OVO-Bench will drive progress in video LLMs and inspire future research in online video reasoning. Our benchmark and code can be accessed at https://github.com/JoeLeelyf/OVO-Bench.
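
The three scenarios in the abstract differ in where the needed evidence lies relative to the moment the question is asked. The sketch below is one simplified way to picture that distinction and is not the benchmark's actual data format or evaluation code: the `Query` fields, the `classify` function, and the rule of comparing a single evidence window against the query timestamp are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    BACKWARD_TRACING = "backward tracing"
    REAL_TIME = "real-time understanding"
    FORWARD_ACTIVE = "forward active responding"


@dataclass(frozen=True)
class Query:
    """Hypothetical annotation: all timestamps are seconds into the stream."""
    asked_at: float        # when the question is raised
    evidence_start: float  # when the relevant event begins (assumed field)
    evidence_end: float    # when the relevant event ends (assumed field)


def classify(q: Query) -> Scenario:
    """Place a query in one of the three scenarios by evidence position."""
    if q.evidence_end < q.asked_at:
        # The event is already over: the model must trace back.
        return Scenario.BACKWARD_TRACING
    if q.evidence_start > q.asked_at:
        # The event has not started: the model should delay its answer.
        return Scenario.FORWARD_ACTIVE
    # The event is unfolding at the moment the question is asked.
    return Scenario.REAL_TIME


print(classify(Query(asked_at=30.0, evidence_start=5.0, evidence_end=10.0)).value)
print(classify(Query(asked_at=30.0, evidence_start=28.0, evidence_end=33.0)).value)
print(classify(Query(asked_at=30.0, evidence_start=45.0, evidence_end=50.0)).value)
```

In this toy model an offline system would see the whole video regardless of `asked_at`, whereas an online system only has frames up to `asked_at`, which is why the third case requires waiting.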

FAQ

Common questions about the OVOBench benchmark and leaderboard.

What is the OVOBench benchmark?

OVOBench (OVO-Bench, short for Online-VideO-Benchmark) evaluates streaming video understanding: it tests a model's ability to perceive and respond to video content in real time as it unfolds.

What is the OVOBench leaderboard?

The OVOBench leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.807. The average score across all models is 0.800.

What is the highest OVOBench score?

The highest OVOBench score is 0.807, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on OVOBench?

Two models have been evaluated on the OVOBench benchmark. Both results are self-reported; neither has been verified.

Where can I find the OVOBench paper?

The OVOBench paper is available at https://arxiv.org/abs/2501.05510. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does OVOBench cover?

OVOBench is categorized under multimodal, video, and vision. The benchmark evaluates multimodal models.

How recent are the OVOBench leaderboard results?

The OVOBench leaderboard was last updated in June 2026 and currently includes 2 evaluated models.