Natural Questions
Natural Questions is a question answering dataset of real, anonymized queries issued to the Google search engine. It contains 307,373 training examples in which annotators mark a long answer (a passage) and a short answer (one or more entities) from a Wikipedia page, or mark the question as unanswerable.
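For readers who want to inspect the data directly, the dataset is published on the Hugging Face Hub under the id `natural_questions`. The sketch below is a minimal example assuming that hub id and its documented schema (`question.text`, `annotations.short_answers`, `annotations.long_answer`); check the dataset card if the fields differ.

```python
from datasets import load_dataset

# Stream the training split so the full corpus (tens of GB) is not
# downloaded up front. The hub id and field names are assumptions based
# on the Hugging Face `natural_questions` dataset card.
nq = load_dataset("natural_questions", split="train", streaming=True)

example = next(iter(nq))
print(example["question"]["text"])               # the raw search query
print(example["annotations"]["short_answers"])   # short answers; may be empty
print(example["annotations"]["long_answer"])     # long-answer passage spans
```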
Progress Over Time
[Interactive timeline: model performance over time on Natural Questions, showing the state-of-the-art frontier and distinguishing open from proprietary models.]
Natural Questions Leaderboard
7 models
| Rank | Model | Organization | Params | Context | Cost | License |
|---|---|---|---|---|---|---|
| 1 | Gemma 2 27B | Google | 27B | — | — | — |
| 2 | — | Mistral AI | 12B | 128K | $0.15 / $0.15 | — |
| 3 | — | Google | 9B | — | — | — |
| 4 | — | — | 2B | — | — | — |
| 4 | — | Google | 8B | — | — | — |
| 6 | — | — | 2B | — | — | — |
| 6 | — | Google | 8B | — | — | — |
FAQ
Common questions about Natural Questions
What is Natural Questions?
Natural Questions is a question answering dataset of real, anonymized queries issued to the Google search engine. It contains 307,373 training examples in which annotators mark a long answer (a passage) and a short answer (one or more entities) from a Wikipedia page, or mark the question as unanswerable.
Where can I find the Natural Questions paper?
The Natural Questions paper is available at https://arxiv.org/abs/1901.08634. It describes the benchmark's methodology, dataset construction, and evaluation criteria.
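The official evaluation script scores long and short answers against multi-annotator labels, which is more involved than can be shown here. As a rough illustration only, the snippet below implements the normalized exact-match check commonly used in open-domain QA work on this dataset; the normalization rules (lowercasing, dropping articles and punctuation) follow the usual SQuAD-style convention and are not necessarily the official NQ criteria.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop English articles and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the prediction matches any reference answer after normalization."""
    return normalize(prediction) in {normalize(r) for r in references}

# Example: these agree once articles, case, and punctuation are stripped.
assert exact_match("The Eiffel Tower.", ["Eiffel Tower"])
```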
How does the Natural Questions leaderboard work?
The Natural Questions leaderboard ranks 7 AI models by their performance on this benchmark. Gemma 2 27B by Google currently leads with a score of 0.345; the average score across all models is 0.240.

What is the highest score on Natural Questions?
The highest Natural Questions score is 0.345, achieved by Gemma 2 27B from Google.

How many models have been evaluated?
7 models have been evaluated on the Natural Questions benchmark: 0 verified results and 7 self-reported results.

What categories does Natural Questions cover?
Natural Questions is categorized under general, reasoning, and search, and it evaluates text models.