TriviaQA
TriviaQA is a large-scale reading comprehension dataset containing over 650K question-answer-evidence triples. It includes 95K question-answer pairs authored by trivia enthusiasts, along with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions. The questions are relatively complex and compositional, with considerable syntactic and lexical variability, and often require cross-sentence reasoning to find answers.
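Because TriviaQA answers come with multiple accepted aliases, scores are typically computed with SQuAD-style exact match after text normalization. A minimal sketch of that scoring logic (function names are illustrative, not taken from the official evaluation script):

```python
import re
import string

def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation,
    drop English articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, aliases: list[str]) -> bool:
    """True if the normalized prediction matches any accepted answer alias."""
    norm = normalize_answer(prediction)
    return any(norm == normalize_answer(alias) for alias in aliases)

print(exact_match("The Beatles", ["Beatles", "the beatles"]))  # True
print(exact_match("Rolling Stones", ["Beatles"]))              # False
```

A model's benchmark score is then the fraction of questions whose prediction exactly matches at least one alias.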
Progress Over Time
[Interactive timeline of model performance on TriviaQA; shows the state-of-the-art frontier, with open and proprietary models marked separately]
TriviaQA Leaderboard
17 models
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Moonshot AI | 1.0T | — | — | — |
| 2 | Google | 27B | — | — | — |
| 3 | Mistral AI | 24B | — | — | — |
| 3 | Mistral AI | 24B | 128K | $0.10 / $0.30 | — |
| 5 | Mistral AI | 24B | — | — | — |
| 6 | — | 8B | — | — | — |
| 7 | Google | 9B | — | — | — |
| 8 | Mistral AI | 675B | 128K | $2.00 / $5.00 | — |
| 8 | Mistral AI | 14B | — | — | — |
| 10 | Mistral AI | 12B | 128K | $0.15 / $0.15 | — |
| 11 | — | 2B | — | — | — |
| 11 | Google | 8B | — | — | — |
| 13 | Mistral AI | 8B | — | — | — |
| 14 | Mistral AI | 8B | 128K | $0.10 / $0.10 | — |
| 15 | Google | 8B | — | — | — |
| 15 | — | 2B | — | — | — |
| 17 | Mistral AI | 3B | — | — | — |
FAQ
Common questions about TriviaQA
The TriviaQA paper is available at https://arxiv.org/abs/1705.03551. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.
The TriviaQA leaderboard ranks 17 AI models based on their performance on this benchmark. Currently, Kimi K2 Base by Moonshot AI leads with a score of 0.851. The average score across all models is 0.731.
The highest TriviaQA score is 0.851, achieved by Kimi K2 Base from Moonshot AI.
Seventeen models have been evaluated on the TriviaQA benchmark, with 0 verified results and 17 self-reported results.
TriviaQA is categorized under general and reasoning. The benchmark evaluates text models.