CRAG

CRAG (Comprehensive RAG Benchmark) is a factual question answering benchmark consisting of 4,409 question-answer pairs across 5 domains (finance, sports, music, movie, open domain) and 8 question categories. The benchmark includes mock APIs to simulate web and Knowledge Graph search, designed to represent the diverse and dynamic nature of real-world QA tasks with temporal dynamism ranging from years to seconds. It evaluates retrieval-augmented generation systems for trustworthy question answering.
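The retrieve-then-generate flow the mock APIs enable can be sketched as follows. This is a minimal illustration only: every function and data structure here is a hypothetical stand-in, not the actual CRAG SDK.

```python
# Hypothetical sketch of a RAG loop against CRAG-style mock search APIs.
# None of these names come from the CRAG codebase; they illustrate the flow only.

def mock_web_search(query: str) -> list[str]:
    # Stand-in for a mock web-search API: returns text snippets.
    corpus = {
        "oscar best picture 2020": ["Parasite won Best Picture at the 2020 Oscars."],
    }
    return corpus.get(query.lower(), [])

def mock_kg_search(entity: str) -> dict:
    # Stand-in for a mock Knowledge Graph API: returns structured facts.
    kg = {"Parasite": {"director": "Bong Joon-ho", "year": 2019}}
    return kg.get(entity, {})

def answer(question: str, query: str, entity: str) -> str:
    # Retrieve from both mock sources, then "generate" by concatenating evidence.
    snippets = mock_web_search(query)
    facts = mock_kg_search(entity)
    if not snippets and not facts:
        return "I don't know"  # abstaining avoids hallucinating an answer
    context = " ".join(snippets) + " " + " ".join(f"{k}: {v}" for k, v in facts.items())
    return context.strip()

print(answer("Who directed the film that won Best Picture in 2020?",
             "oscar best picture 2020", "Parasite"))
```

The explicit "I don't know" branch matters because trustworthy-QA evaluations distinguish abstention from hallucination.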

Paper: https://arxiv.org/abs/2406.04744

Progress Over Time

[Interactive timeline: model performance evolution on CRAG, showing the state-of-the-art frontier for open and proprietary models]
CRAG Leaderboard

3 models • 0 verified
Rank | Model    | Organization | Context | Cost (input / output) | License
-----|----------|--------------|---------|-----------------------|--------
1    | Nova Pro | Amazon       | 300K    | $0.80 / $3.20         | —
2    | —        | Amazon       | 300K    | $0.06 / $0.24         | —
3    | —        | —            | 128K    | $0.03 / $0.14         | —

FAQ

Common questions about CRAG

What is CRAG?
CRAG (Comprehensive RAG Benchmark) is a factual question answering benchmark consisting of 4,409 question-answer pairs across 5 domains (finance, sports, music, movie, open domain) and 8 question categories. The benchmark includes mock APIs to simulate web and Knowledge Graph search, designed to represent the diverse and dynamic nature of real-world QA tasks with temporal dynamism ranging from years to seconds. It evaluates retrieval-augmented generation systems for trustworthy question answering.
Where can I find the CRAG paper?
The CRAG paper is available at https://arxiv.org/abs/2406.04744. It describes the benchmark methodology, dataset creation, and evaluation criteria in detail.
How are models ranked on the CRAG leaderboard?
The CRAG leaderboard ranks 3 AI models by their performance on this benchmark. Currently, Nova Pro by Amazon leads with a score of 0.503. The average score across all models is 0.457.
What is the highest score on CRAG?
The highest CRAG score is 0.503, achieved by Nova Pro from Amazon.
How many models have been evaluated on CRAG?
3 models have been evaluated on the CRAG benchmark, with 0 verified results and 3 self-reported results.
What categories does CRAG cover?
CRAG is categorized under economics, finance, reasoning, and search. The benchmark evaluates text models.
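Leaderboard scores like those above come from a truthfulness-style metric: the CRAG paper describes weighting correct answers positively, missing ("I don't know") answers as neutral, and hallucinated answers negatively, then averaging. A minimal sketch of that scheme, assuming a +1/0/-1 weighting (the function name is mine, not from the benchmark code):

```python
def truthfulness_score(grades: list[str]) -> float:
    """Average CRAG-style score: correct = +1, missing = 0, hallucinated = -1.

    `grades` is one label per answered question. An empty list scores 0.0.
    """
    weights = {"correct": 1.0, "missing": 0.0, "hallucinated": -1.0}
    if not grades:
        return 0.0
    return sum(weights[g] for g in grades) / len(grades)

# Example: 6 correct, 2 missing, 2 hallucinated out of 10 answers.
grades = ["correct"] * 6 + ["missing"] * 2 + ["hallucinated"] * 2
print(truthfulness_score(grades))  # → 0.4
```

Under this weighting, a system that abstains on hard questions outscores one that guesses and hallucinates, which is the behavior the benchmark is designed to reward.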