OCRBench
OCRBench is a comprehensive evaluation benchmark for assessing the Optical Character Recognition (OCR) capabilities of large multimodal models across text recognition, scene text VQA, and document understanding tasks.
OCRBench Leaderboard
21 models
| Rank | Organization | Parameters | Context | Cost | License | |
|---|---|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 | ||
| 2 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $0.70 | ||
| 3 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.08 / $0.50 | ||
| 4 | Alibaba Cloud / Qwen Team | 33B | — | — | ||
| 5 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 | ||
| 6 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 / $3.49 | ||
| 7 | Alibaba Cloud / Qwen Team | 33B | — | — | ||
| 8 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $1.00 | ||
| 9 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 | ||
| 10 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 | ||
| 11 | Moonshot AI | 1.0T | 262K | $0.60 / $2.50 | ||
| 12 | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 | ||
| 13 | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 | ||
| 14 | Alibaba Cloud / Qwen Team | 27B | — | — | ||
| 15 | Alibaba Cloud / Qwen Team | 72B | — | — | ||
| 16 | Alibaba Cloud / Qwen Team | 73B | — | — | ||
| 17 | Alibaba Cloud / Qwen Team | 8B | — | — | ||
| 18 | Microsoft | 6B | 128K | $0.05 / $0.10 | ||
| 19 | DeepSeek | 16B | — | — | ||
| 20 | DeepSeek | 27B | 129K | — | ||
| 21 | DeepSeek | 3B | — | — | ||
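
For readers who want to work with these rows programmatically, the following is a minimal sketch of parsing them in Python. The field names (`rank`, `organization`, `parameters`, `context`, `cost`) are labels inferred from the cell contents, not names given by the page. Reading the two dollar figures in the cost cell as an input price and an output price is an assumption; the page does not state the unit or the order.

```python
import re

# Three rows copied verbatim from the leaderboard table.
ROWS = """\
| 1 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 | ||
| 4 | Alibaba Cloud / Qwen Team | 33B | — | — | ||
| 18 | Microsoft | 6B | 128K | $0.05 / $0.10 | ||
"""

MISSING = "—"  # the table's marker for an unavailable value


def parse_row(line):
    """Split one table row into a dict; missing values become None."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rank, organization, parameters, context, cost = cells[:5]

    # Assumption: the cost cell holds two prices, read here as (input, output).
    prices = re.findall(r"\$(\d+(?:\.\d+)?)", cost)

    return {
        "rank": int(rank),
        "organization": organization,
        "parameters": parameters,
        "context": None if context == MISSING else context,
        "cost": tuple(float(p) for p in prices) if len(prices) == 2 else None,
    }


rows = [parse_row(line) for line in ROWS.splitlines()]
for row in rows:
    print(row)
```

Rows that show a dash for context or cost come back with `None` in those fields, so they can be filtered out before any price comparison.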
FAQ
Common questions about OCRBench
The OCRBench paper is available at https://arxiv.org/abs/2305.07895. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The OCRBench leaderboard ranks 21 AI models based on their performance on this benchmark. Currently, Qwen3 VL 235B A22B Instruct by Alibaba Cloud / Qwen Team leads with a score of 920.000. The average score across all models is 414.313.
The highest OCRBench score is 920.000, achieved by Qwen3 VL 235B A22B Instruct from Alibaba Cloud / Qwen Team.
21 models have been evaluated on OCRBench. All 21 results are self-reported; none have been verified.
OCRBench is categorized under image-to-text and vision, and it evaluates multimodal models.