TydiQA
A multilingual question-answering benchmark covering 11 typologically diverse languages with 204K question-answer pairs. Questions are written by people seeking genuine information, and the data is collected directly in each language without translation, testing how well models generalize across diverse linguistic structures.
Progress Over Time
Interactive timeline showing model performance evolution on TydiQA
TydiQA Leaderboard
2 models
| Rank | Model | Organization | Parameters | Context | Cost (input / output) | License |
|---|---|---|---|---|---|---|
| 1 | Llama 4 Maverick | Meta | 400B | 1.0M | $0.17 / $0.85 | |
| 2 | | Meta | 109B | 10.0M | $0.08 / $0.30 | |
FAQ
Common questions about TydiQA
The TydiQA paper is available at https://arxiv.org/abs/2003.05002. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The TydiQA leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Llama 4 Maverick by Meta leads with a score of 0.317. The average score across all models is 0.316.
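With only two models on the leaderboard, the unstated runner-up score can be recovered from the figures above. A minimal sketch, assuming "average" means the arithmetic mean over the 2 listed models:

```python
# Recovering the runner-up's score from the two figures stated on this page.
# Assumption: the reported "average" is the arithmetic mean over 2 models.
top_score = 0.317   # Llama 4 Maverick (stated)
mean_score = 0.316  # average across all models (stated)
n_models = 2

# mean = (top + runner_up) / n  =>  runner_up = n * mean - top
runner_up = n_models * mean_score - top_score
print(round(runner_up, 3))  # → 0.315
```

This puts the second-ranked model roughly 0.002 behind the leader, consistent with the near-identical mean and top score reported.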
The highest TydiQA score is 0.317, achieved by Llama 4 Maverick from Meta.
2 models have been evaluated on the TydiQA benchmark, with 0 verified results and 2 self-reported results.
TydiQA is categorized under language and reasoning. The benchmark evaluates text models with multilingual support.