FACTS Grounding Leaderboard

Progress Over Time

[Figure: interactive timeline of model performance on FACTS Grounding, showing the state-of-the-art frontier, with open and proprietary models marked separately.]

FACTS Grounding Leaderboard

13 models

Rank	Model	Organization	Parameters	Context	Cost
1	Gemini 2.5 Pro Preview 06-05	Google	—	1.0M	$1.25 / $10.00
2	Gemini 2.5 Flash	Google	—	1.0M	$0.30 / $2.50
3	Gemini 2.5 Flash-Lite	Google	—	1.0M	$0.10 / $0.40
4	Gemini 2.0 Flash	Google	—	1.0M	$0.10 / $0.40
4	Gemini 2.0 Flash-Lite	Google	—	1.0M	$0.07 / $0.30
6	Gemma 3 12B	Google	12B	131K	$0.05 / $0.10
7	Gemma 3 27B	Google	27B	131K	$0.10 / $0.20
8	Gemini 3 Pro	Google	—	—	—
9	Gemma 3 4B	Google	4B	131K	$0.02 / $0.04
10	Gemini 3 Flash	Google	—	1.0M	$0.50 / $3.00
11	GLM-5V-Turbo	Zhipu AI	—	—	—
12	Gemini 3.1 Flash-Lite	Google	—	1.0M	$0.25 / $1.50
13	Gemma 3 1B	Google	1B	—	—
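
The rank sequence 4, 4, 6 in the table above is consistent with standard competition ranking, where tied entries share a rank and the following rank is skipped. The Python sketch below illustrates that scheme. The function name and the scores are hypothetical, chosen only to reproduce the rank pattern; the leaderboard's actual ranking method and per-model scores are not given in this listing.

```python
def competition_ranks(scores):
    """Return a 1-based rank for each score, highest score first.

    Equal scores share a rank, and the next rank is skipped
    (standard competition ranking, e.g. 1, 2, 3, 4, 4, 6).
    """
    # A score's rank is one plus the number of strictly higher scores.
    return [1 + sum(other > s for other in scores) for s in scores]


# Hypothetical scores, NOT taken from the leaderboard.
example_scores = [0.90, 0.80, 0.70, 0.60, 0.60, 0.50]
example_ranks = competition_ranks(example_scores)
print(example_ranks)
```

With these illustrative scores, the two entries at 0.60 both receive rank 4 and the entry after them receives rank 6, matching the pattern of the two Gemini 2.0 rows.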

FAQ

Common questions about FACTS Grounding

What is the FACTS Grounding benchmark?

FACTS Grounding is a benchmark that evaluates language models' ability to generate factually accurate, well-grounded responses from long-form input context. It comprises 1,719 examples with documents of up to 32k tokens, each requiring a detailed response that is fully grounded in the provided documents.

Where can I find the FACTS Grounding paper?

The FACTS Grounding paper is available at https://arxiv.org/abs/2501.03200. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.

What is the FACTS Grounding leaderboard?

The FACTS Grounding leaderboard ranks 13 AI models by their performance on this benchmark. Currently, Gemini 2.5 Pro Preview 06-05 by Google leads with a score of 0.878. The average score across all models is 0.702.

What is the highest FACTS Grounding score?

The highest FACTS Grounding score is 0.878, achieved by Gemini 2.5 Pro Preview 06-05 from Google.

How many models are evaluated on FACTS Grounding?

13 models have been evaluated on the FACTS Grounding benchmark, with 0 verified results and 13 self-reported results.

What categories does FACTS Grounding cover?

FACTS Grounding is categorized under factuality, grounding, and reasoning. The benchmark evaluates text models.