DocVQA
A dataset for Visual Question Answering (VQA) on document images, containing 50,000 questions defined over 12,000+ images. The benchmark tests a model's ability to understand document layout and content, retrieving information from the page image to answer each question.
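DocVQA results are conventionally reported as ANLS (Average Normalized Levenshtein Similarity), which credits near-miss answers in proportion to their edit distance from the ground truth. The sketch below is an illustrative implementation of that metric, not the official evaluation script; the 0.5 cutoff follows the threshold described in the DocVQA paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def anls(predictions: list[str], answers: list[list[str]], tau: float = 0.5) -> float:
    """ANLS: per question, take the best similarity against any ground-truth
    answer, where similarity is 1 - NL (Levenshtein distance normalized by
    the longer string) and anything with NL >= tau scores zero."""
    total = 0.0
    for pred, gts in zip(predictions, answers):
        best = 0.0
        for gt in gts:
            p, g = pred.strip().lower(), gt.strip().lower()
            nl = levenshtein(p, g) / (max(len(p), len(g)) or 1)
            if nl < tau:
                best = max(best, 1.0 - nl)
        total += best
    return total / len(predictions) if predictions else 0.0

# Example: one near-miss answer and one exact match (hypothetical data).
print(round(anls(["$45.0", "march"], [["$45.00"], ["March", "Mar"]]), 3))  # → 0.917
```

A leaderboard score of 0.964 thus means the model's answers are, on average, within a small edit distance of a ground-truth answer on nearly every question.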
Progress Over Time
[Interactive chart: timeline of model performance on DocVQA, marking the state-of-the-art frontier and distinguishing open from proprietary models.]
DocVQA Leaderboard
26 models • 0 verified
| Rank | Organization | Score | Params | Context | Cost (input / output) |
|---|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | 0.964 | 72B | — | — |
| 2 | Alibaba Cloud / Qwen Team | 0.957 | 8B | — | — |
| 3 | Alibaba Cloud / Qwen Team | 0.952 | 7B | — | — |
| 3 | Anthropic | 0.952 | — | 200K | $3.00 / $15.00 |
| 5 | Mistral AI | 0.949 | 24B | — | — |
| 6 | Alibaba Cloud / Qwen Team | 0.948 | 34B | — | — |
| 7 | Meta | 0.944 | 400B | 1.0M | $0.17 / $0.60 |
| 7 | Meta | 0.944 | 109B | 10.0M | $0.08 / $0.30 |
| 9 | xAI | 0.936 | — | 128K | $2.00 / $10.00 |
| 10 | Amazon | 0.935 | — | 300K | $0.80 / $3.20 |
| 11 | DeepSeek | 0.933 | 27B | 129K | — |
| 11 | Mistral AI | 0.933 | 124B | 128K | $2.00 / $6.00 |
| 13 | Microsoft | 0.932 | 6B | 128K | $0.05 / $0.10 |
| 13 | xAI | 0.932 | — | — | — |
| 15 | OpenAI | 0.928 | — | 128K | $2.50 / $10.00 |
| 16 | Amazon | 0.924 | — | 300K | $0.06 / $0.24 |
| 17 | DeepSeek | 0.923 | 16B | — | — |
| 18 | Mistral AI | 0.907 | 12B | 128K | $0.15 / $0.15 |
| 19 | — | 0.901 | 90B | 128K | $0.35 / $0.40 |
| 20 | DeepSeek | 0.889 | 3B | — | — |
| 21 | — | 0.884 | 11B | 128K | $0.05 / $0.05 |
| 22 | Google | 0.871 | 12B | 131K | $0.05 / $0.10 |
| 23 | Google | 0.866 | 27B | 131K | $0.10 / $0.20 |
| 24 | xAI | 0.856 | — | — | — |
| 24 | xAI | 0.856 | — | — | — |
| 26 | Google | 0.758 | 4B | 131K | $0.02 / $0.04 |
FAQ
Common questions about DocVQA
What is DocVQA?
DocVQA is a Visual Question Answering dataset for document images, containing 50,000 questions defined over 12,000+ images. The benchmark tests a model's ability to understand document layout and content, retrieving information from the page image to answer each question.
Where can I find the DocVQA paper?
The DocVQA paper is available at https://arxiv.org/abs/2007.00398. It details the benchmark methodology, dataset creation, and evaluation criteria.
How does the DocVQA leaderboard work?
The DocVQA leaderboard ranks 26 AI models by their performance on this benchmark. Currently, Qwen2.5 VL 72B Instruct by Alibaba Cloud / Qwen Team leads with a score of 0.964. The average score across all models is 0.914.
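The quoted average can be reproduced directly from the 26 scores in the table above:

```python
# The 26 self-reported DocVQA scores from the leaderboard table.
scores = [0.964, 0.957, 0.952, 0.952, 0.949, 0.948, 0.944, 0.944,
          0.936, 0.935, 0.933, 0.933, 0.932, 0.932, 0.928, 0.924,
          0.923, 0.907, 0.901, 0.889, 0.884, 0.871, 0.866, 0.856,
          0.856, 0.758]
print(round(sum(scores) / len(scores), 3))  # → 0.914
```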
What is the highest DocVQA score?
The highest DocVQA score is 0.964, achieved by Qwen2.5 VL 72B Instruct from Alibaba Cloud / Qwen Team.
How many models have been evaluated on DocVQA?
26 models have been evaluated on the DocVQA benchmark, with 0 verified results and 26 self-reported results.
What categories does DocVQA fall under?
DocVQA is categorized under image-to-text, multimodal, and vision; it evaluates multimodal models.