
Natural Questions

Natural Questions is a question answering dataset of real, anonymized queries issued to the Google search engine. It contains 307,373 training examples in which annotators mark a long answer (a passage) and a short answer (one or more entities) from a Wikipedia page, or mark the question as unanswerable.
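Each example pairs a query with a Wikipedia page whose long and short answers are marked as token spans over the document. A minimal sketch of that structure (field names are modeled on the public release but simplified here, e.g. tokens as plain strings; this is an illustration, not an official parser):

```python
# Toy record loosely mimicking the Natural Questions schema (an assumption,
# simplified for illustration). Each annotation marks a long answer (a
# passage, as a token span) and zero or more short answers (entity spans);
# unanswerable questions carry no spans.

def span_text(tokens, start, end):
    """Join the document tokens covered by a [start, end) span."""
    return " ".join(tokens[start:end])

example = {
    "question_text": "who wrote the declaration of independence",
    "document_tokens": (
        "The Declaration of Independence was drafted "
        "by Thomas Jefferson in 1776 .".split()
    ),
    "annotations": [
        {
            "long_answer": {"start_token": 0, "end_token": 12},
            "short_answers": [{"start_token": 7, "end_token": 9}],
        }
    ],
}

ann = example["annotations"][0]
tokens = example["document_tokens"]

long_ans = span_text(tokens, ann["long_answer"]["start_token"],
                     ann["long_answer"]["end_token"])
short_ans = [span_text(tokens, s["start_token"], s["end_token"])
             for s in ann["short_answers"]]

print(short_ans)  # -> ['Thomas Jefferson']
```

The long answer recovers the full supporting passage, while each short answer recovers just the entity span inside it.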

Paper: https://arxiv.org/abs/1901.08634

Progress Over Time

[Interactive timeline showing model performance evolution on Natural Questions: a state-of-the-art frontier, with open and proprietary models distinguished.]

Natural Questions Leaderboard

7 models
Rank  Model        Params  Score  Context  Cost (in / out, per 1M tokens)  License
1     Gemma 2 27B  27B     0.345  —        —                               —
2     —            12B     —      128K     $0.15 / $0.15                   —
3     —            9B      —      —        —                               —
4     —            2B      —      —        —                               —
4     —            8B      —      —        —                               —
6     —            2B      —      —        —                               —
6     —            8B      —      —        —                               —

FAQ

Common questions about Natural Questions

What is Natural Questions?
Natural Questions is a question answering dataset of real, anonymized queries issued to the Google search engine. It contains 307,373 training examples in which annotators mark a long answer (a passage) and a short answer (one or more entities) from a Wikipedia page, or mark the question as unanswerable.

Where can I find the Natural Questions paper?
The paper is available at https://arxiv.org/abs/1901.08634. It describes the benchmark methodology, dataset creation, and evaluation criteria.

How are models ranked on the Natural Questions leaderboard?
The leaderboard ranks 7 AI models by their benchmark scores. Gemma 2 27B by Google currently leads with a score of 0.345; the average score across all models is 0.240.

What is the highest Natural Questions score?
The highest score is 0.345, achieved by Gemma 2 27B from Google.

How many models have been evaluated?
7 models have been evaluated on the Natural Questions benchmark: 0 verified results and 7 self-reported results.

What categories does Natural Questions fall under?
Natural Questions is categorized under general, reasoning, and search. The benchmark evaluates text models.
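Leaderboard scores like those above come from comparing predicted answer spans against the annotators' spans. A minimal normalized exact-match scorer (a common convention for span-level QA scoring, shown here as an assumption rather than the official Natural Questions evaluation script):

```python
import string

def normalize(text):
    # Lowercase, strip punctuation, and collapse whitespace: a common
    # normalization for span-level QA scoring (an assumption here, not
    # the official Natural Questions evaluation script).
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match(prediction, references):
    # Credit if the normalized prediction matches any reference span.
    return float(any(normalize(prediction) == normalize(r) for r in references))

# Hypothetical predictions and reference spans for two questions.
preds = {"q1": "Thomas Jefferson", "q2": "1776"}
refs  = {"q1": ["Thomas Jefferson"], "q2": ["July 4 , 1776"]}

score = sum(exact_match(p, refs[q]) for q, p in preds.items()) / len(preds)
print(score)  # -> 0.5
```

Averaging this per-question credit over the evaluation set yields a single score in [0, 1], the same scale as the leaderboard numbers above.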