CC-OCR

CC-OCR is a comprehensive OCR benchmark for evaluating the literacy of Large Multimodal Models (LMMs). It comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. It contains 39 subsets with 7,058 fully annotated images, 41% of which are sourced from real applications. It tests capabilities including text grounding, multi-orientation text recognition, and detection of hallucination and repetition across diverse visual challenges.


[Figure: Progress Over Time. Interactive timeline of model performance on CC-OCR, showing the state-of-the-art frontier with open and proprietary models marked separately.]

CC-OCR Leaderboard

15 models • 0 verified
Rank  Organization               Parameters  Context  Cost
1     Alibaba Cloud / Qwen Team  236B        262K     $0.30 / $1.49
2     Alibaba Cloud / Qwen Team  122B        262K     $0.40 / $3.20
3     Alibaba Cloud / Qwen Team  236B        262K     $0.45 / $3.49
4     Alibaba Cloud / Qwen Team  27B         -        -
5     Alibaba Cloud / Qwen Team  31B         262K     $0.20 / $0.70
5     Alibaba Cloud / Qwen Team  35B         262K     $0.25 / $2.00
7     Alibaba Cloud / Qwen Team  33B         -        -
8     Alibaba Cloud / Qwen Team  9B          262K     $0.08 / $0.50
9     Alibaba Cloud / Qwen Team  72B         -        -
10    Alibaba Cloud / Qwen Team  8B          -        -
10    Alibaba Cloud / Qwen Team  31B         262K     $0.20 / $1.00
12    Alibaba Cloud / Qwen Team  34B         -        -
13    Alibaba Cloud / Qwen Team  9B          262K     $0.18 / $2.09
14    Alibaba Cloud / Qwen Team  4B          262K     $0.10 / $0.60
15    Alibaba Cloud / Qwen Team  4B          262K     $0.10 / $1.00
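
The rank column repeats some positions and then skips the next one (5, 5, 7 and 10, 10, 12), which is consistent with competition-style ranking, where tied entries share a rank. The page does not say how ranks are assigned, so the following is only a minimal sketch under that assumption. The function name `competition_ranks` and the example scores are hypothetical and are not taken from the leaderboard.

```python
def competition_ranks(scores):
    """Return 1-based ranks for scores (higher is better).

    Assumed scheme: tied scores share a rank, and the rank after a tie
    skips ahead by the number of tied entries (1, 2, 2, 4, ...).
    """
    # Indices of the scores, ordered from best to worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        previous = order[position - 2] if position > 1 else None
        if previous is not None and scores[idx] == scores[previous]:
            # Same score as the entry just above: reuse its rank.
            ranks[idx] = ranks[previous]
        else:
            ranks[idx] = position
    return ranks


# Hypothetical scores for illustration only.
example_scores = [0.82, 0.81, 0.80, 0.79, 0.78, 0.78, 0.77]
print(competition_ranks(example_scores))
```

With these made-up scores, the two entries at 0.78 share one rank and the next entry skips a position, mirroring the pattern in the rank column.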

FAQ

Common questions about CC-OCR

The CC-OCR paper is available at https://arxiv.org/abs/2412.02210. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The CC-OCR leaderboard ranks 15 AI models based on their performance on this benchmark. Currently, Qwen3 VL 235B A22B Instruct by Alibaba Cloud / Qwen Team leads with a score of 0.822. The average score across all models is 0.791.
The highest CC-OCR score is 0.822, achieved by Qwen3 VL 235B A22B Instruct from Alibaba Cloud / Qwen Team.
In total, 15 models have been evaluated on the CC-OCR benchmark. All 15 results are self-reported; none have been verified.
CC-OCR is categorized under multimodal, structured output, text-to-image, and vision. The benchmark evaluates multimodal models with multilingual support.