OCRBench_V2
OCRBench v2: Enhanced large-scale bilingual benchmark for evaluating Large Multimodal Models on visual text localization and reasoning with 10,000 human-verified question-answering pairs across 8 core OCR capabilities
Progress Over Time
Interactive timeline showing model performance evolution on OCRBench_V2, with the state-of-the-art frontier and open vs. proprietary models distinguished.
OCRBench_V2 Leaderboard
1 model
| Rank | Model | Organization | Params | Score | Context | Cost | License |
|---|---|---|---|---|---|---|---|
| 1 | Qwen2.5-Omni-7B | Alibaba Cloud / Qwen Team | 7B | 0.578 | — | — | — |
FAQ
Common questions about OCRBench_V2
The OCRBench_V2 paper is available at https://arxiv.org/abs/2501.00321. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The OCRBench_V2 leaderboard currently ranks 1 AI model based on its performance on this benchmark. Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team leads with a score of 0.578; with only one entry, the average score across all models is likewise 0.578.
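As a minimal sketch, the summary numbers quoted above (leading model and average score) can be recomputed directly from the leaderboard entries. The `entries` data structure below is a hypothetical representation for illustration; only its values come from this page.

```python
# Hypothetical leaderboard entries; the structure is an assumption,
# the values are taken from the OCRBench_V2 page.
entries = [
    {"model": "Qwen2.5-Omni-7B",
     "organization": "Alibaba Cloud / Qwen Team",
     "score": 0.578,
     "verified": False},  # self-reported result
]

# Leading model is the entry with the highest score.
top = max(entries, key=lambda e: e["score"])
# Average score across all evaluated models.
average = sum(e["score"] for e in entries) / len(entries)

print(f"Leader: {top['model']} ({top['score']})")
print(f"Average score across {len(entries)} model(s): {average:.3f}")
```

With a single entry the leader's score and the average coincide; the same computation generalizes as more models are added.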
The highest OCRBench_V2 score is 0.578, achieved by Qwen2.5-Omni-7B from Alibaba Cloud / Qwen Team.
1 model has been evaluated on the OCRBench_V2 benchmark, with 0 verified results and 1 self-reported result.
OCRBench_V2 is categorized under image-to-text and vision. The benchmark evaluates multimodal models with multilingual (bilingual English/Chinese) support.