NQ
Natural Questions (NQ) is a benchmark of real user questions issued to Google Search, with answers drawn from Wikipedia, designed for training and evaluating automatic question answering systems.
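For readers who want to inspect the data directly, here is a minimal sketch of streaming a few NQ examples. It assumes the Hugging Face `datasets` library and the `google-research-datasets/natural_questions` dataset ID; the field layout shown reflects that release and may differ in other distributions.

```python
# Minimal sketch: stream a few NQ examples without downloading the
# full corpus (tens of GB). Assumes the Hugging Face `datasets`
# library and the `google-research-datasets/natural_questions` ID.
from datasets import load_dataset

nq = load_dataset(
    "google-research-datasets/natural_questions",
    split="validation",
    streaming=True,  # iterate lazily instead of fetching everything
)

for example in nq.take(3):
    # In this release, each example carries the raw question text plus
    # annotator answers over the source Wikipedia page.
    print(example["question"]["text"])
```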
Progress Over Time
[Interactive timeline showing model performance evolution on NQ, tracking the state-of-the-art frontier across open and proprietary models]
NQ Leaderboard
1 model
| Rank | Model | Organization | Score | Context | Cost | License |
|---|---|---|---|---|---|---|
| 1 | Granite 3.3 8B Base | IBM | 0.365 | — | — | — |
FAQ
Common questions about NQ
The NQ paper is available at https://aclanthology.org/Q19-1026/. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The NQ leaderboard ranks 1 AI model based on its performance on this benchmark. Currently, Granite 3.3 8B Base by IBM leads with a score of 0.365, which is also the average score across all listed models.
The highest NQ score is 0.365, achieved by Granite 3.3 8B Base from IBM.
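Scores like 0.365 on NQ-style leaderboards are commonly an exact-match rate: the fraction of questions whose predicted answer string equals a reference answer after normalization. The sketch below shows one conventional (SQuAD-style) normalization; the leaderboard's exact scoring rules are not specified here, so treat this as an illustrative assumption.

```python
import re
import string

def normalize(text: str) -> str:
    # Lowercase, strip punctuation and English articles, collapse spaces.
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predictions: list[str], references: list[list[str]]) -> float:
    # A prediction scores 1 if it matches any reference for its question.
    hits = sum(
        any(normalize(pred) == normalize(ref) for ref in refs)
        for pred, refs in zip(predictions, references)
    )
    return hits / len(predictions)

# Example: one hit out of two questions -> 0.5
print(exact_match(
    ["the Nile", "Paris"],
    [["Nile", "Nile River"], ["London"]],
))
```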
1 model has been evaluated on the NQ benchmark, with 0 verified results and 1 self-reported result.
NQ is categorized under general and reasoning. The benchmark evaluates text models.