CC-OCR
CC-OCR Leaderboard
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | — | 1.0M | $0.50 / $3.00 | |
| 2 | Alibaba Cloud / Qwen Team | 236B | — | — | |
| 3 | Alibaba Cloud / Qwen Team | 35B | — | — | |
| 4 | Alibaba Cloud / Qwen Team | 122B | — | — | |
| 5 | Alibaba Cloud / Qwen Team | 236B | — | — | |
| 6 | Alibaba Cloud / Qwen Team | 28B | 262K | $0.60 / $3.60 | |
| 7 | Alibaba Cloud / Qwen Team | 27B | 262K | $0.30 / $2.40 | |
| 8 | Alibaba Cloud / Qwen Team | 35B | — | — | |
| 8 | Alibaba Cloud / Qwen Team | 31B | — | — | |
| 10 | Alibaba Cloud / Qwen Team | 33B | — | — | |
| 11 | Alibaba Cloud / Qwen Team | 9B | — | — | |
| 12 | Alibaba Cloud / Qwen Team | 72B | — | — | |
| 13 | Alibaba Cloud / Qwen Team | 31B | — | — | |
| 13 | Alibaba Cloud / Qwen Team | 8B | — | — | |
| 15 | Alibaba Cloud / Qwen Team | 34B | — | — | |
| 16 | Alibaba Cloud / Qwen Team | 9B | 262K | $0.18 / $2.09 | |
| 17 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $0.60 | |
| 18 | Alibaba Cloud / Qwen Team | 4B | 262K | $0.10 / $1.00 | |
What is CC-OCR?
CC-OCR is a comprehensive OCR benchmark for evaluating the literacy of Large Multimodal Models (LMMs). It comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. The benchmark contains 39 subsets with 7,058 fully annotated images, 41% of which are sourced from real applications. It tests capabilities including text grounding and multi-orientation text recognition, and it surfaces hallucination and repetition errors across diverse visual challenges.
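As a rough sanity check on the dataset figures quoted above, the following minimal sketch converts the reported percentages into approximate image counts. The constant and function names are illustrative and not part of any CC-OCR tooling, and because the 41% share is a rounded figure, the derived count is only an estimate.

```python
# Figures quoted in the benchmark description.
TOTAL_IMAGES = 7058
NUM_SUBSETS = 39
REAL_APP_SHARE = 0.41  # reported as a rounded percentage, so results are approximate


def dataset_summary(total=TOTAL_IMAGES, subsets=NUM_SUBSETS, share=REAL_APP_SHARE):
    """Return (approximate real-application image count, mean images per subset)."""
    approx_real = round(total * share)
    mean_per_subset = total / subsets
    return approx_real, mean_per_subset


approx_real, mean_per_subset = dataset_summary()
print(f"about {approx_real} images from real applications")
print(f"about {mean_per_subset:.0f} images per subset on average")
```

The per-subset mean is only an average; the source does not say how images are distributed across the 39 subsets.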
CC-OCR is a multimodal benchmark that evaluates models on multimodal, structured output, text-to-image, and vision tasks. LLM Stats tracks 18 models on this benchmark, scored on a 0–1 scale. The current average is roughly 0.8, with the leader at 0.834.
Current leaders
Qwen3.6 Plus from Alibaba Cloud / Qwen Team currently leads the CC-OCR leaderboard with a score of 0.834 across 18 evaluated AI models.
Source paper
- Title: CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
- Authors: Zhibo Yang, Jun Tang, Zhaohai Li, Pengfei Wang, and 8 others
- Published: arXiv:2412.02210
Abstract
Large Multimodal Models (LMMs) have demonstrated impressive performance in recognizing document images with natural language instructions. However, it remains unclear to what extent their capabilities extend to literacy with rich structure and fine-grained visual challenges. The current landscape lacks a comprehensive benchmark to effectively measure the literate capabilities of LMMs. Existing benchmarks are often limited by narrow scenarios and specified tasks. To this end, we introduce CC-OCR, a comprehensive benchmark that possesses a diverse range of scenarios, tasks, and challenges. CC-OCR comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. It includes 39 subsets with 7,058 fully annotated images, of which 41% are sourced from real applications and released for the first time. We evaluate nine prominent LMMs and reveal both the strengths and weaknesses of these models, particularly in text grounding, multi-orientation, and hallucination of repetition. CC-OCR aims to comprehensively evaluate the capabilities of LMMs on OCR-centered tasks, facilitating continued progress in this crucial area.