CC-OCR

Progress Over Time

Figure: Timeline of model performance on CC-OCR, tracking the state-of-the-art frontier for open and proprietary models.

CC-OCR Leaderboard

All 18 models are from Alibaba Cloud / Qwen Team. Size is the parameter count. A dash (-) marks a value the leaderboard does not list.

Rank  Model                        Size  Context  Cost (USD per 1M input / output tokens)
1     Qwen3.6 Plus                 -     1.0M     $0.50 / $3.00
2     Qwen3 VL 235B A22B Instruct  236B  -        -
3     Qwen3.6-35B-A3B              35B   -        -
4     -                            122B  -        -
5     -                            236B  -        -
6     -                            28B   262K     $0.60 / $3.60
7     -                            27B   262K     $0.30 / $2.40
8     -                            35B   -        -
8     -                            31B   -        -
10    -                            33B   -        -
11    -                            9B    -        -
12    -                            72B   -        -
13    -                            31B   -        -
13    -                            8B    -        -
15    -                            34B   -        -
16    -                            9B    262K     $0.18 / $2.09
17    -                            4B    262K     $0.10 / $0.60
18    -                            4B    262K     $0.10 / $1.00
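
The leaderboard's ranks appear to follow standard competition ranking: tied models share a rank and the next rank is skipped (two models at 8, then 10; two at 13, then 15). The Python sketch below reproduces that scheme, assuming ties are decided by exact equality of the reported scores. The function name `competition_ranks` is hypothetical, and this page does not document how LLM Stats actually handles ties.

```python
from typing import List


def competition_ranks(scores: List[float]) -> List[int]:
    """Assign standard competition ranks ("1224" style), highest score first.

    Tied scores share the best rank of their group; the next distinct
    score gets a rank equal to its 1-based position, so ranks are skipped.
    Assumption: ties are exact equality of the reported (rounded) scores.
    """
    # Indices ordered by score, highest first (Python's sort is stable).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        prev = order[position - 2] if position > 1 else None
        if prev is not None and scores[idx] == scores[prev]:
            ranks[idx] = ranks[prev]  # tie: reuse the rank of the previous model
        else:
            ranks[idx] = position  # new score: rank equals sorted position
    return ranks


# Top three CC-OCR scores from this page.
print(competition_ranks([0.834, 0.822, 0.819]))  # [1, 2, 3]
# Illustrative values only (not leaderboard data), showing a skipped rank.
print(competition_ranks([0.9, 0.8, 0.8, 0.7]))  # [1, 2, 2, 4]
```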
About this benchmark

What is CC-OCR?

CC-OCR is a comprehensive OCR benchmark for evaluating the literacy of Large Multimodal Models (LMMs). It comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. It contains 39 subsets with 7,058 fully annotated images, 41% of which are sourced from real applications. It tests capabilities including text grounding, multi-orientation text recognition, and robustness to hallucination and repetition across diverse visual challenges.

CC-OCR is a multimodal benchmark that LLM Stats lists under the multimodal, structured output, text-to-image, and vision categories. LLM Stats tracks 18 models on it, scored on a 0–1 scale. The current average is 0.796, and the leader scores 0.834.

Current leaders

Qwen3.6 Plus from Alibaba Cloud / Qwen Team currently leads the CC-OCR leaderboard of 18 evaluated models, with a score of 0.834.

1. Qwen3.6 Plus (Alibaba Cloud / Qwen Team): 83.4%
2. Qwen3 VL 235B A22B Instruct (Alibaba Cloud / Qwen Team): 82.2%
3. Qwen3.6-35B-A3B (Alibaba Cloud / Qwen Team): 81.9%

Source paper

Title
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
Authors
Zhibo Yang, Jun Tang, Zhaohai Li, Pengfei Wang, and 8 others
Published
Abstract

Large Multimodal Models (LMMs) have demonstrated impressive performance in recognizing document images with natural language instructions. However, it remains unclear to what extent LMMs possess literacy capabilities involving rich structure and fine-grained visual challenges. The current landscape lacks a comprehensive benchmark to effectively measure the literate capabilities of LMMs. Existing benchmarks are often limited by narrow scenarios and specified tasks. To this end, we introduce CC-OCR, a comprehensive benchmark that possesses a diverse range of scenarios, tasks, and challenges. CC-OCR comprises four OCR-centric tracks: multi-scene text reading, multilingual text reading, document parsing, and key information extraction. It includes 39 subsets with 7,058 fully annotated images, of which 41% are sourced from real applications and released for the first time. We evaluate nine prominent LMMs and reveal both the strengths and weaknesses of these models, particularly in text grounding, multi-orientation, and hallucination of repetition. CC-OCR aims to comprehensively evaluate the capabilities of LMMs on OCR-centered tasks, facilitating continued progress in this crucial area.

FAQ

Common questions about the CC-OCR benchmark and leaderboard.

What is the CC-OCR leaderboard?

The CC-OCR leaderboard ranks 18 AI models based on their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.834. The average score across all models is 0.796.

What is the highest CC-OCR score?

The highest CC-OCR score is 0.834, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.

How many models are evaluated on CC-OCR?

18 models have been evaluated on the CC-OCR benchmark. All 18 results are self-reported; none has been independently verified.

Where can I find the CC-OCR paper?

The CC-OCR paper is available at https://arxiv.org/abs/2412.02210. The paper details the methodology, dataset construction, and evaluation criteria.

Where can I find the CC-OCR dataset?

What categories does CC-OCR cover?

LLM Stats categorizes CC-OCR under multimodal, structured output, text-to-image, and vision. The benchmark evaluates multimodal models and includes a dedicated multilingual text-reading track.

What is the best open-source model on CC-OCR?

Qwen3 VL 235B A22B Instruct by Alibaba Cloud / Qwen Team is the top-ranked open-source model on CC-OCR, with a score of 0.822 (rank #2).

Which model offers the best value on CC-OCR?

Among models scoring within 10% of the leader, Qwen3 VL 4B Instruct from Alibaba Cloud / Qwen Team is the cheapest, at $0.10 per million input tokens with a score of 0.762.
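
The value rule in this answer is a filter followed by a minimum. The Python sketch below is hypothetical: `Entry` and `best_value` are illustrative names, and it assumes "within 10%" means a score of at least 90% of the leader's (0.834 * 0.9 = 0.7506) and that price ties go to the higher score. LLM Stats does not state either detail on this page. The sample data uses only scores and input prices that appear here; models with no listed price are skipped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    model: str
    score: float                  # CC-OCR score on the 0-1 scale
    input_price: Optional[float]  # USD per million input tokens; None if not listed


def best_value(entries: List[Entry], tolerance: float = 0.10) -> Optional[Entry]:
    """Return the cheapest priced model scoring within `tolerance` of the leader.

    Assumption: the tolerance is relative, i.e. score >= leader * (1 - tolerance).
    """
    if not entries:
        return None
    leader_score = max(e.score for e in entries)
    threshold = leader_score * (1 - tolerance)
    # Keep models close enough to the leader that also have a listed price.
    candidates = [
        e for e in entries if e.score >= threshold and e.input_price is not None
    ]
    if not candidates:
        return None
    # Cheapest input price wins; on a price tie, prefer the higher score.
    return min(candidates, key=lambda e: (e.input_price, -e.score))


entries = [
    Entry("Qwen3.6 Plus", 0.834, 0.50),
    Entry("Qwen3 VL 235B A22B Instruct", 0.822, None),  # no price on the leaderboard
    Entry("Qwen3 VL 4B Instruct", 0.762, 0.10),
]
print(best_value(entries).model)  # Qwen3 VL 4B Instruct
```

Reading "within 10%" as an absolute gap instead (a threshold of 0.834 - 0.1 = 0.734) gives the same answer for this data.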

How recent are the CC-OCR leaderboard results?

The CC-OCR leaderboard was last updated in July 2026 and currently includes 18 evaluated models.