InfoVQA
InfoVQA Leaderboard
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | 34B | — | — | |
| 2 | Alibaba Cloud / Qwen Team | 8B | — | — | |
| 3 | DeepSeek | 27B | — | — | |
| 4 | DeepSeek | 16B | — | — | |
| 5 | Microsoft | 6B | — | — | |
| 6 | Google | 27B | — | — | |
| 7 | DeepSeek | 3B | — | — | |
| 8 | Google | 12B | — | — | |
| 9 | Google | 4B | — | — | |
What is InfoVQA?
InfoVQA is a dataset of 30,000 questions over 5,000 infographic images. Answering them requires joint reasoning over document layout, textual content, graphical elements, and data visualizations, together with elementary reasoning and arithmetic skills.
InfoVQA is a benchmark that evaluates models on multimodal and vision tasks. LLM Stats tracks 9 models on this benchmark, scored on a 0–1 scale. The current average is about 0.7, and the leader scores 0.834.
Compare leaders on the best AI for multimodal and best AI for vision leaderboards.
Current leaders
Qwen2.5 VL 32B Instruct from Alibaba Cloud / Qwen Team currently leads the InfoVQA leaderboard with a score of 0.834, the highest among the 9 evaluated AI models.
Source paper
- Title: InfographicVQA
- Authors: Minesh Mathew, Viraj Bagal, Rubèn Pérez Tito, Dimosthenis Karatzas, and 2 others
- arXiv: 2104.12756
Abstract
Infographics are documents designed to effectively communicate information using a combination of textual, graphical and visual elements. In this work, we explore the automatic understanding of infographic images by using Visual Question Answering technique. To this end, we present InfographicVQA, a new dataset that comprises a diverse collection of infographics along with natural language questions and answers annotations. The collected questions require methods to jointly reason over the document layout, textual content, graphical elements, and data visualizations. We curate the dataset with emphasis on questions that require elementary reasoning and basic arithmetic skills. Finally, we evaluate two strong baselines based on state of the art multi-modal VQA models, and establish baseline performance for the new task. The dataset, code and leaderboard will be made available at http://docvqa.org