CC-OCR
A comprehensive OCR benchmark for evaluating the literacy of Large Multimodal Models (LMMs). It comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. It contains 39 subsets with 7,058 fully annotated images, 41% of which are sourced from real applications. It tests capabilities including text grounding, multi-orientation text recognition, and detecting hallucination and repetition across diverse visual challenges.
CC-OCR Leaderboard
15 models • 0 verified
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.30 $1.49 | |
| 2 | Alibaba Cloud / Qwen Team | 122B | 262K | $0.40 $3.20 | |
| 3 | Alibaba Cloud / Qwen Team | 236B | 262K | $0.45 $3.49 | |
| 4 | Alibaba Cloud / Qwen Team | 27B | — | — | |
| 5 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 $0.70 | |
| 5 | Alibaba Cloud / Qwen Team | 35B | 262K | $0.25 $2.00 | |
| 7 | Alibaba Cloud / Qwen Team | 33B | — | — | |
| 8 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.08 $0.50 | |
| 9 | Alibaba Cloud / Qwen Team | 72B | — | — | |
| 10 | Alibaba Cloud / Qwen Team | 8B | — | — | |
| 10 | Alibaba Cloud / Qwen Team | 31B | 262K | $0.20 $1.00 | |
| 12 | Alibaba Cloud / Qwen Team | 34B | — | — | |
| 13 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 $2.09 | |
| 14 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 $0.60 | |
| 15 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 $1.00 | |
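The rank column repeats 5 and 10 and then skips to 7 and 12, which matches standard competition ranking: tied entries share a rank and the following rank skips ahead. A minimal sketch of that scheme follows, assuming ties are decided by equal scores. The function name `competition_rank` and the example scores are hypothetical and are not the real CC-OCR results.

```python
def competition_rank(scores):
    """Return 1-based ranks; tied scores share a rank and the next rank skips ahead."""
    # Indices of the entries, ordered from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        previous = order[position - 2] if position > 1 else None
        if previous is not None and scores[idx] == scores[previous]:
            # Same score as the entry just above: reuse its rank.
            ranks[idx] = ranks[previous]
        else:
            # Otherwise the rank is the position in the sorted order.
            ranks[idx] = position
    return ranks


# Hypothetical scores for illustration only.
example_scores = [0.90, 0.85, 0.85, 0.80]
print(competition_rank(example_scores))  # [1, 2, 2, 4]
```

With two entries tied at rank 2, no entry receives rank 3, just as the table has no rank 6 or 11.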
FAQ
Common questions about CC-OCR
The CC-OCR paper is available at https://arxiv.org/abs/2412.02210. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The CC-OCR dataset is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery.
The CC-OCR leaderboard ranks 15 AI models based on their performance on this benchmark. Currently, Qwen3 VL 235B A22B Instruct by Alibaba Cloud / Qwen Team leads with a score of 0.822. The average score across all models is 0.791.
The highest CC-OCR score is 0.822, achieved by Qwen3 VL 235B A22B Instruct from Alibaba Cloud / Qwen Team.
Fifteen models have been evaluated on the CC-OCR benchmark. All 15 results are self-reported; none has been verified.
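The summary figures quoted above (leader, top score, average, and verified versus self-reported counts) can all be derived from a list of per-model entries. The sketch below shows one way to do that. The record layout, the `summarize` helper, and the three placeholder entries are assumptions for illustration; they are not the leaderboard's actual data or code.

```python
from statistics import mean


def summarize(entries):
    """Summarize leaderboard entries given as dicts with name, score, and verified keys."""
    leader = max(entries, key=lambda e: e["score"])
    verified = sum(1 for e in entries if e["verified"])
    return {
        "models": len(entries),
        "leader": leader["name"],
        "top_score": leader["score"],
        "average": round(mean(e["score"] for e in entries), 3),
        "verified": verified,
        "self_reported": len(entries) - verified,
    }


# Placeholder entries, not real CC-OCR results.
entries = [
    {"name": "model-a", "score": 0.80, "verified": False},
    {"name": "model-b", "score": 0.70, "verified": False},
    {"name": "model-c", "score": 0.75, "verified": False},
]
summary = summarize(entries)
print(summary)
```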
CC-OCR is categorized under multimodal, structured output, text-to-image, and vision. The benchmark evaluates multimodal models with multilingual support.