OCRBench

OCRBench is a comprehensive evaluation benchmark that assesses the Optical Character Recognition (OCR) capabilities of Large Multimodal Models across text recognition, scene text VQA, and document understanding tasks.

Progress Over Time

[Interactive timeline: model performance on OCRBench over time, showing the state-of-the-art frontier for open and proprietary models.]

OCRBench Leaderboard

21 models
| Rank | Organization | Parameters | Context | Cost |
|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 / $1.49 |
| 2 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $0.70 |
| 3 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.08 / $0.50 |
| 4 | Alibaba Cloud / Qwen Team | 33B | | |
| 5 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 |
| 6 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 / $3.49 |
| 7 | Alibaba Cloud / Qwen Team | 33B | | |
| 8 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 / $1.00 |
| 9 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 |
| 10 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 |
| 11 | Moonshot AI | 1.0T | 262K | $0.60 / $2.50 |
| 12 | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 / $3.20 |
| 13 | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 / $2.00 |
| 14 | Alibaba Cloud / Qwen Team | 27B | | |
| 15 | Alibaba Cloud / Qwen Team | 72B | | |
| 16 | Alibaba Cloud / Qwen Team | 73B | | |
| 17 | Alibaba Cloud / Qwen Team | 8B | | |
| 18 | | 6B | 128K | $0.05 / $0.10 |
| 19 | | 16B | | |
| 20 | DeepSeek | 27B | 129K | |
| 21 | | 3B | | |
FAQ

Common questions about OCRBench

The OCRBench paper is available at https://arxiv.org/abs/2305.07895. It describes the benchmark methodology, dataset creation, and evaluation criteria.
The OCRBench leaderboard ranks 21 AI models by their performance on this benchmark. Qwen3 VL 235B A22B Instruct by Alibaba Cloud / Qwen Team currently leads with a score of 920. The average score across all models is 414.313.
The highest OCRBench score is 920, achieved by Qwen3 VL 235B A22B Instruct from Alibaba Cloud / Qwen Team.
Of the 21 models evaluated on OCRBench, none has a verified result; all 21 scores are self-reported.
OCRBench is categorized under image-to-text and vision. The benchmark evaluates multimodal models.
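The summary figures quoted above (leader, top score, average, verified versus self-reported counts) are simple aggregates over the leaderboard rows. The following is a minimal sketch of how such figures could be derived, not the site's actual code. The `Entry` type and `summarize` function are hypothetical names, and only the first row's model name and score come from this page; the other two rows are invented placeholders.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Entry:
    model: str
    score: float    # OCRBench score as listed on a leaderboard
    verified: bool  # False means the score is self-reported


def summarize(entries: list[Entry]) -> dict:
    """Return the leader, top score, mean score, and verification counts."""
    leader = max(entries, key=lambda e: e.score)
    verified = sum(e.verified for e in entries)
    return {
        "models": len(entries),
        "leader": leader.model,
        "top_score": leader.score,
        "mean_score": round(mean(e.score for e in entries), 3),
        "verified": verified,
        "self_reported": len(entries) - verified,
    }


# Placeholder rows: only the first name and score come from the page;
# the other two are invented for illustration.
entries = [
    Entry("Qwen3 VL 235B A22B Instruct", 920.0, verified=False),
    Entry("example-model-b", 850.0, verified=False),
    Entry("example-model-c", 600.0, verified=False),
]

summary = summarize(entries)
print(summary)
```

With the full set of 21 rows in place of the placeholders, the same function would yield the counts and average reported in the answers above.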