TydiQA Leaderboard

Progress Over Time

[Interactive timeline: model performance on TydiQA over time, showing the state-of-the-art frontier for open and proprietary models.]

TydiQA Leaderboard

2 models

Rank	Model	Organization	Parameters	Context	Cost	License
1	Llama 4 Maverick	Meta	400B	1.0M	$0.17 / $0.85
2	Llama 4 Scout	Meta	109B	10.0M	$0.08 / $0.30

FAQ

Common questions about TydiQA

What is the TydiQA benchmark?

TydiQA is a multilingual question answering benchmark covering 11 typologically diverse languages with 204K question-answer pairs. Questions are written by people seeking genuine information, and the data is collected directly in each language without translation, so the benchmark tests how well models generalize across diverse linguistic structures.

Where can I find the TydiQA paper?

The TydiQA paper is available at https://arxiv.org/abs/2003.05002. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.

What is the TydiQA leaderboard?

The TydiQA leaderboard ranks 2 AI models by their performance on this benchmark. Llama 4 Maverick by Meta currently leads with a score of 0.317. The average score across all models is 0.316.

What is the highest TydiQA score?

The highest TydiQA score is 0.317, achieved by Llama 4 Maverick from Meta.

How many models are evaluated on TydiQA?

Two models have been evaluated on the TydiQA benchmark. Both results are self-reported; none are verified.

What categories does TydiQA cover?

TydiQA is categorized under language and reasoning. The benchmark evaluates text models with multilingual support.
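
The summary figures quoted in the FAQ (a top score of 0.317 and an average of 0.316 across 2 models) are what a simple ranking and mean over per-model scores would produce. The Python sketch below illustrates that calculation under stated assumptions: the `summarize` helper is hypothetical, and only Llama 4 Maverick's score comes from this page. Llama 4 Scout's score is not stated here, so the 0.315 used below is an assumed placeholder chosen only because it is consistent with the reported average.

```python
from statistics import mean

# Scores keyed by model name. Only Maverick's 0.317 is stated on this page;
# Scout's 0.315 is an ASSUMED placeholder consistent with the reported mean.
scores = {
    "Llama 4 Maverick": 0.317,
    "Llama 4 Scout": 0.315,  # assumption, not a reported figure
}


def summarize(scores):
    """Return (ranking, leader, average) for a model -> score mapping."""
    # Sort models from highest to lowest score.
    ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    leader = ranking[0]
    # Round to three decimals, matching the precision shown in the FAQ.
    average = round(mean(scores.values()), 3)
    return ranking, leader, average


ranking, leader, average = summarize(scores)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.3f}")
print(f"Leader: {leader[0]} ({leader[1]:.3f}), average: {average:.3f}")
```

Because the published figures are rounded to three decimals, other values for the second score would round to the same average, so the placeholder should not be read as the actual result.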