DUDE
Progress Over Time
[Interactive timeline: model performance evolution on DUDE, showing the state-of-the-art frontier for open and proprietary models.]
DUDE Leaderboard
2 models
| Rank | Model | Organization | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Seed 2.1 Turbo | ByteDance | — | — | — |
| 2 | Seed 2.1 Pro | ByteDance | — | — | — |
What is DUDE?
DUDE (Document Understanding Dataset and Evaluation) tests multi-page, multi-domain document understanding and reasoning.
DUDE is a multimodal benchmark that evaluates models on long-context and vision tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is about 0.8, and the leader scores 0.831.
Compare leaders on the related leaderboards: best AI for multimodal, best AI for long context, and best AI for vision.
Current leaders
Seed 2.1 Turbo from ByteDance currently leads the DUDE leaderboard of 2 evaluated AI models with a score of 0.831.
FAQ
Common questions about the DUDE benchmark and leaderboard.