DUDE

Name: DUDE Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline: model performance on DUDE over time. Series: state-of-the-art frontier, open models, proprietary models.]

DUDE Leaderboard

2 models

Rank	Model	Organization	Context	Cost	License
1	Seed 2.1 Turbo (new)	ByteDance	n/a	n/a	n/a
2	Seed 2.1 Pro (new)	ByteDance	n/a	n/a	n/a

About this benchmark

What is DUDE?

DUDE (Document Understanding Dataset and Evaluation) tests multi-page, multi-domain document understanding and reasoning.

DUDE is a multimodal benchmark categorized under multimodal, long-context, and vision tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.829, with the leader at 0.831.

Current leaders

Seed 2.1 Turbo from ByteDance currently leads the DUDE leaderboard with a score of 0.831, the highest of the 2 evaluated AI models.

Seed 2.1 Turbo (ByteDance): 83.1%

Seed 2.1 Pro (ByteDance): 82.8%

FAQ

Common questions about the DUDE benchmark and leaderboard.

What is the DUDE benchmark?

DUDE (Document Understanding Dataset and Evaluation) tests multi-page, multi-domain document understanding and reasoning.

What is the DUDE leaderboard?

The DUDE leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Seed 2.1 Turbo by ByteDance leads with a score of 0.831. The average score across all models is 0.829.
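
The leader and average quoted above can be reproduced from the two published scores. The sketch below is illustrative only: the function names `summarize` and `as_percent` are hypothetical, and the inputs are the three-decimal scores shown on this page (0.831 and 0.828, the latter taken from the 82.8% listing). Their mean is about 0.8295, so the reported 0.829 presumably reflects truncation or unrounded source scores; that is an assumption, not something the page states.

```python
from statistics import mean

# Scores as listed on the leaderboard (0-1 scale, self-reported).
scores = {
    "Seed 2.1 Turbo": 0.831,
    "Seed 2.1 Pro": 0.828,
}


def summarize(scores):
    """Return (leader, leader_score, average) for a name -> score mapping."""
    leader = max(scores, key=scores.get)
    return leader, scores[leader], mean(scores.values())


def as_percent(score):
    """Format a 0-1 score as a percentage with one decimal place."""
    return f"{score:.1%}"


leader, top, average = summarize(scores)
print(leader, as_percent(top), f"average ~ {average:.4f}")
```

The same mapping can be extended with further models as they are added; the 0-1 scores and the percentage figures shown under "Current leaders" are two renderings of the same values.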

What is the highest DUDE score?

The highest DUDE score is 0.831, achieved by Seed 2.1 Turbo from ByteDance.

How many models are evaluated on DUDE?

2 models have been evaluated on the DUDE benchmark, with 0 verified results and 2 self-reported results.

What categories does DUDE cover?

DUDE is categorized under multimodal, long context, and vision. The benchmark evaluates multimodal models.

How recent are the DUDE leaderboard results?

The DUDE leaderboard was last updated in June 2026 and currently includes 2 evaluated models.