OCRBench_V2


Progress Over Time

(Chart: OCRBench_V2 model performance over time, showing the state-of-the-art frontier for open and proprietary models.)

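OCRBench_V2 Leaderboard (7 models)

Rank | Model | Organization | Context | Cost (input / output, per 1M tokens)
1 | Qwen3.7-Plus | Alibaba Cloud / Qwen Team | 1.0M | $0.32 / $1.28
2 | Nova 2 Pro | Amazon | – | –
3 | Seed 2.1 Pro | ByteDance | – | –
4 | – | – | – | –
5 | – | – | – | –
6 | Qwen2.5-Omni-7B | Alibaba Cloud / Qwen Team | – | –
7 | – | – | 1.0M | $0.30 / $2.50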
About this benchmark

What is OCRBench_V2?

OCRBench v2 is an enhanced, large-scale bilingual benchmark for evaluating Large Multimodal Models (LMMs) on visual text localization and reasoning. It contains 10,000 human-verified question-answering pairs covering 8 core OCR capabilities.

OCRBench_V2 is a multimodal benchmark that evaluates models on image-to-text and vision tasks. LLM Stats tracks 7 models on this benchmark, scored on a 0–1 scale. The current average is 0.614, and the leader scores 0.671.
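The page stores scores on a 0–1 scale but displays the leaders list as percentages (0.671 appears as 67.1%). A minimal sketch of how the summary figures relate, assuming results are kept as a plain model-to-score mapping; the function names `summarize` and `as_percent` are hypothetical, and the sample uses only the three leader scores listed on this page, so its mean is not the page's 7-model average of 0.614.

```python
from statistics import mean


def summarize(scores: dict[str, float]) -> tuple[str, float, float]:
    """Return (leader, leader_score, average) for scores on a 0-1 scale."""
    if not scores:
        raise ValueError("no scores to summarize")
    # The leader is the model with the highest score.
    leader = max(scores, key=lambda model: scores[model])
    return leader, scores[leader], mean(scores.values())


def as_percent(score: float) -> str:
    """Format a 0-1 score the way the leaders list shows it (one decimal place)."""
    return f"{score * 100:.1f}%"


# Top three scores from the leaders list only (not all 7 evaluated models).
top3 = {"Qwen3.7-Plus": 0.671, "Nova 2 Pro": 0.645, "Seed 2.1 Pro": 0.632}
leader, best, avg = summarize(top3)
print(leader, as_percent(best), round(avg, 3))  # Qwen3.7-Plus 67.1% 0.649
```

Keeping the raw 0–1 value and formatting only at display time avoids the rounding drift seen when the average is reported as 0.6 in one place and 0.614 in another.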

For comparison, see the best AI for image-to-text and best AI for vision leaderboards.

Current leaders

Qwen3.7-Plus from Alibaba Cloud / Qwen Team currently leads the OCRBench_V2 leaderboard with a score of 0.671 among the 7 evaluated models.

1. Qwen3.7-Plus (Alibaba Cloud / Qwen Team): 67.1%
2. Nova 2 Pro (Amazon): 64.5%
3. Seed 2.1 Pro (ByteDance): 63.2%
Top open-weight model: Qwen2.5-Omni-7B (Alibaba Cloud / Qwen Team), rank #6 overall: 57.8%

Source paper

Title
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
Authors
Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, and 20 others
Published
Abstract

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has attracted growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4x more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios), and thorough evaluation metrics, with 10,000 human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with 1,500 manually annotated images. The consistent evaluation trends observed across both public and private test sets validate OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below 50 (out of 100) and suffer from five types of limitations: less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The project website is at: https://99franklin.github.io/ocrbench_v2/

FAQ

Common questions about the OCRBench_V2 benchmark and leaderboard.

What is the OCRBench_V2 benchmark?

OCRBench v2 is an enhanced, large-scale bilingual benchmark for evaluating Large Multimodal Models on visual text localization and reasoning, with 10,000 human-verified question-answering pairs across 8 core OCR capabilities.

What is the OCRBench_V2 leaderboard?

The OCRBench_V2 leaderboard ranks 7 AI models based on their performance on this benchmark. Currently, Qwen3.7-Plus by Alibaba Cloud / Qwen Team leads with a score of 0.671. The average score across all models is 0.614.

What is the highest OCRBench_V2 score?

The highest OCRBench_V2 score is 0.671, achieved by Qwen3.7-Plus from Alibaba Cloud / Qwen Team.

How many models are evaluated on OCRBench_V2?

7 models have been evaluated on the OCRBench_V2 benchmark. All 7 results are self-reported; none has been independently verified.

Where can I find the OCRBench_V2 paper?

The OCRBench_V2 paper is available at https://arxiv.org/abs/2501.00321. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does OCRBench_V2 cover?

OCRBench_V2 is categorized under image-to-text and vision. It evaluates multimodal models, and its question set is bilingual.

What is the best open-source model on OCRBench_V2?

Qwen2.5-Omni-7B by Alibaba Cloud / Qwen Team is the top-ranked open-source model on OCRBench_V2, with a score of 0.578 (rank #6).

Which model offers the best value on OCRBench_V2?

Among models scoring within 10% of the leader, Qwen3.7-Plus from Alibaba Cloud / Qwen Team is the cheapest, at $0.32 per million input tokens with a score of 0.671.
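The page does not define "within 10% of the leader" precisely. A minimal sketch of the selection rule, assuming it means a relative margin (score at least 0.9 × the leader's score), that the comparison uses the input price per million tokens, and that models without a listed price are skipped; the `Entry` type, the `best_value` function, and the sample entries are hypothetical and are not leaderboard data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    model: str
    score: float                  # benchmark score on a 0-1 scale
    input_price: Optional[float]  # USD per 1M input tokens; None if not listed


def best_value(entries: list[Entry], tolerance: float = 0.10) -> Optional[Entry]:
    """Cheapest priced entry whose score is within `tolerance` (relative) of the top score.

    Assumes "within 10%" means score >= (1 - tolerance) * leader_score.
    """
    if not entries:
        return None
    threshold = (1 - tolerance) * max(e.score for e in entries)
    # Keep only entries that are close enough to the leader and have a price.
    candidates = [e for e in entries if e.score >= threshold and e.input_price is not None]
    return min(candidates, key=lambda e: e.input_price, default=None)


# Hypothetical entries for illustration only.
entries = [
    Entry("model-a", 0.70, 0.50),
    Entry("model-b", 0.65, 0.20),  # above the 0.63 threshold and cheapest
    Entry("model-c", 0.60, 0.05),  # cheaper, but below the 0.63 threshold
    Entry("model-d", 0.68, None),  # no listed price, so skipped
]
print(best_value(entries).model)  # model-b
```

Under this reading, the threshold for OCRBench_V2 would be 0.9 × 0.671 ≈ 0.604, so Nova 2 Pro (0.645) and Seed 2.1 Pro (0.632) would also fall within range; the leader wins only if no qualifying model has a lower listed input price.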

How recent are the OCRBench_V2 leaderboard results?

The OCRBench_V2 leaderboard was last updated in July 2026 and currently includes 7 evaluated models.