DocVQA

A dataset for Visual Question Answering (VQA) on document images, containing 50,000 questions defined on 12,000+ document images. The benchmark tests a model's ability to understand both the structure and the content of documents: to answer a question, a model must interpret the page layout and retrieve the relevant information from the document image.

Paper: https://arxiv.org/abs/2007.00398


DocVQA Leaderboard

26 models • 0 verified
Rank  Organization               Score  Params  Context  Cost (input / output, USD per 1M tokens)
1     Alibaba Cloud / Qwen Team  0.964  72B     -        -
2     Alibaba Cloud / Qwen Team  0.957  8B      -        -
3     Alibaba Cloud / Qwen Team  0.952  7B      -        -
3     -                          0.952  -       200K     $3.00 / $15.00
5     -                          0.949  24B     -        -
6     Alibaba Cloud / Qwen Team  0.948  34B     -        -
7     -                          0.944  400B    1.0M     $0.17 / $0.60
7     -                          0.944  109B    10.0M    $0.08 / $0.30
9     -                          0.936  -       128K     $2.00 / $10.00
10    Amazon                     0.935  -       300K     $0.80 / $3.20
11    DeepSeek                   0.933  27B     129K     -
11    Mistral AI                 0.933  124B    128K     $2.00 / $6.00
13    -                          0.932  6B      128K     $0.05 / $0.10
13    -                          0.932  -       -        -
15    OpenAI                     0.928  -       128K     $2.50 / $10.00
16    Amazon                     0.924  -       300K     $0.06 / $0.24
17    -                          0.923  16B     -        -
18    Mistral AI                 0.907  12B     128K     $0.15 / $0.15
19    -                          0.901  90B     128K     $0.35 / $0.40
20    -                          0.889  3B      -        -
21    -                          0.884  11B     128K     $0.05 / $0.05
22    -                          0.871  12B     131K     $0.05 / $0.10
23    -                          0.866  27B     131K     $0.10 / $0.20
24    -                          0.856  -       -        -
24    -                          0.856  -       -        -
26    -                          0.758  4B      131K     $0.02 / $0.04

A dash (-) marks a value that is not listed.
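Ranks in this leaderboard follow standard competition ranking ("1224" ranking): models with equal scores share a rank and the next rank skips ahead, as with the two 0.952 entries at rank 3 followed by rank 5. The short Python sketch below reproduces the Rank column from the scores as displayed (three decimals, so underlying ties may be finer-grained than shown) and checks the average score quoted in the FAQ. The SCORES list is transcribed from the leaderboard; the function name competition_ranks is illustrative, not part of any leaderboard tooling.

```python
from statistics import mean

# Scores as displayed on the DocVQA leaderboard, in listed order.
SCORES = [
    0.964, 0.957, 0.952, 0.952, 0.949, 0.948, 0.944, 0.944, 0.936, 0.935,
    0.933, 0.933, 0.932, 0.932, 0.928, 0.924, 0.923, 0.907, 0.901, 0.889,
    0.884, 0.871, 0.866, 0.856, 0.856, 0.758,
]


def competition_ranks(scores: list[float]) -> list[int]:
    """Rank each score with standard competition ranking ("1224").

    A score's rank is 1 plus the number of strictly higher scores, so tied
    scores share a rank and the positions they occupy are skipped afterwards.
    """
    return [1 + sum(other > s for other in scores) for s in scores]


if __name__ == "__main__":
    print(competition_ranks(SCORES))
    print(round(mean(SCORES), 3))  # 0.914, matching the FAQ average
```

Counting strictly higher scores is quadratic, which is irrelevant for 26 entries and keeps the tie rule explicit.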

FAQ

Common questions about DocVQA

The DocVQA paper is available at https://arxiv.org/abs/2007.00398. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
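The evaluation criteria in the DocVQA paper are built on ANLS (Average Normalized Levenshtein Similarity). For each question, the prediction is compared with every accepted answer; NL is the edit distance divided by the length of the longer string, a match earns 1 - NL when NL is below the threshold tau = 0.5 and 0 otherwise, and the best match per question is averaged over all questions. The Python below is a minimal sketch of that metric under stated assumptions, not the official evaluation script: the lowercase-and-collapse-whitespace normalization and the function names are choices of this sketch, and this page does not state which metric each self-reported score uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumed normalization for this sketch: lowercase, collapse whitespace.
    return " ".join(text.lower().split())


def anls(predictions: list[str], gold_answers: list[list[str]], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a set of questions.

    Each prediction is scored against its best-matching accepted answer;
    matches whose normalized distance is tau or more contribute 0.
    """
    if not predictions or len(predictions) != len(gold_answers):
        raise ValueError("need one non-empty answer list per prediction")
    total = 0.0
    for pred, answers in zip(predictions, gold_answers):
        p = normalize(pred)
        best = 0.0
        for ans in answers:
            g = normalize(ans)
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            if nl < tau:
                best = max(best, 1.0 - nl)
        total += best
    return total / len(predictions)
```

For example, anls(["12/03/1998", "cat"], [["12/03/1996"], ["dog"]]) returns 0.45: one wrong character out of ten still earns 0.9, while an unrelated answer earns 0. The threshold is what lets ANLS give partial credit for small reading slips without rewarding wrong answers.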
The DocVQA leaderboard ranks 26 AI models based on their performance on this benchmark. Currently, Qwen2.5 VL 72B Instruct by Alibaba Cloud / Qwen Team leads with a score of 0.964. The average score across all models is 0.914.
The highest DocVQA score is 0.964, achieved by Qwen2.5 VL 72B Instruct from Alibaba Cloud / Qwen Team.
26 models have been evaluated on the DocVQA benchmark, with 0 verified results and 26 self-reported results.
DocVQA is categorized under image-to-text, multimodal, and vision tasks; the benchmark evaluates multimodal models.