FACTS Grounding
A benchmark evaluating language models' ability to generate factually accurate, well-grounded responses from long-form input context. It comprises 1,719 examples with documents of up to 32k tokens, each requiring a detailed response that is fully grounded in the provided document.
Progress Over Time
Interactive timeline showing model performance evolution on FACTS Grounding
FACTS Grounding Leaderboard
13 models
| Rank | Organization | Params | Context | Cost (in / out, per 1M tokens) | License |
|---|---|---|---|---|---|
| 1 | Google | — | 1.0M | $1.25 / $10.00 | — |
| 2 | Google | — | 1.0M | $0.30 / $2.50 | — |
| 3 | Google | — | 1.0M | $0.10 / $0.40 | — |
| 4 | Google | — | 1.0M | $0.10 / $0.40 | — |
| 4 | Google | — | 1.0M | $0.07 / $0.30 | — |
| 6 | Google | 12B | 131K | $0.05 / $0.10 | — |
| 7 | Google | 27B | 131K | $0.10 / $0.20 | — |
| 8 | Google | — | — | — | — |
| 9 | Google | 4B | 131K | $0.02 / $0.04 | — |
| 10 | Google | — | 1.0M | $0.50 / $3.00 | — |
| 11 | Zhipu AI | — | — | — | — |
| 12 | Google | — | 1.0M | $0.25 / $1.50 | — |
| 13 | Google | 1B | — | — | — |
FAQ
Common questions about FACTS Grounding
FACTS Grounding evaluates language models' ability to generate factually accurate, well-grounded responses from long-form input context. The benchmark comprises 1,719 examples with documents of up to 32k tokens, each requiring a detailed response that is fully grounded in the provided document.
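Per the paper, FACTS Grounding scores responses with automated LLM judges that check whether every claim in a response is supported by the source document. The toy sketch below substitutes a simple lexical-overlap heuristic for the judge — the function, example texts, and threshold are illustrative, not the benchmark's actual method — just to show the shape of a per-response grounding score:

```python
# Toy grounding check: for each sentence in a model's response, test
# whether most of its words also occur in the source document.
# This word-overlap heuristic is NOT the FACTS Grounding method (the
# benchmark uses LLM judges); it only illustrates scoring a response
# by how fully the provided document supports it.
import re

def grounding_score(document: str, response: str,
                    threshold: float = 0.9) -> float:
    """Fraction of response sentences whose words mostly occur in the document."""
    doc_words = set(re.findall(r"[a-z']+", document.lower()))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", response.strip()) if s]
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        if words and sum(w in doc_words for w in words) / len(words) >= threshold:
            supported += 1
    return supported / len(sentences)

doc = "The Amazon River is about 6,400 km long. It flows through Brazil."
good = "The Amazon River flows through Brazil."
bad = "The Amazon River flows through Egypt and is 10 km long."
print(grounding_score(doc, good))  # 1.0
print(grounding_score(doc, bad))   # 0.0
```

A real judge would weigh meaning rather than word overlap, which is why the benchmark relies on LLM evaluators instead of heuristics like this one.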
The FACTS Grounding paper is available at https://arxiv.org/abs/2501.03200. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The FACTS Grounding leaderboard ranks 13 AI models based on their performance on this benchmark. Currently, Gemini 2.5 Pro Preview 06-05 by Google leads with a score of 0.878. The average score across all models is 0.702.
The highest FACTS Grounding score is 0.878, achieved by Gemini 2.5 Pro Preview 06-05 from Google.
13 models have been evaluated on the FACTS Grounding benchmark, with 0 verified results and 13 self-reported results.
FACTS Grounding is categorized under factuality, grounding, and reasoning. The benchmark evaluates text models.