What is the DeepSearchQA leaderboard?

The DeepSearchQA leaderboard ranks 5 AI models based on their performance on this benchmark. Currently, Claude Opus 4.6 by Anthropic leads with a score of 0.913. The average score across all models is 0.826.
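The figures above also imply how the rest of the field compares with the leader. As a rough sketch, assuming the 0.826 average is a plain unweighted mean over all 5 models (the page does not say how it is computed), the mean of the other four models can be backed out as follows. The function name `mean_of_others` is made up for this illustration.

```python
def mean_of_others(average: float, n_models: int, top_score: float) -> float:
    """Mean score of every model except the leader.

    Assumes `average` is an unweighted mean over `n_models` scores,
    which is an assumption and not something the leaderboard states.
    """
    if n_models < 2:
        raise ValueError("need at least two models to exclude the leader")
    total = average * n_models          # sum of all scores
    return (total - top_score) / (n_models - 1)


# Figures quoted on the leaderboard: average 0.826, 5 models, top score 0.913.
print(f"{mean_of_others(0.826, 5, 0.913):.3f}")  # prints 0.804
```

Because the published numbers are rounded to three decimals, the result is only approximate: the four remaining models average about 0.80, roughly 0.11 below the leader.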

What is the highest DeepSearchQA score?

The highest DeepSearchQA score is 0.913, achieved by Claude Opus 4.6 from Anthropic.

How many models are evaluated on DeepSearchQA?

Five models have been evaluated on the DeepSearchQA benchmark. All 5 results are self-reported; none are verified.

What categories does DeepSearchQA cover?

DeepSearchQA is categorized under agents, reasoning, and search. The benchmark evaluates text models.

DeepSearchQA

DeepSearchQA is a benchmark for evaluating deep search and question-answering capabilities, testing models' ability to perform multi-hop reasoning and information retrieval across complex knowledge domains.

Claude Opus 4.6 from Anthropic currently leads the DeepSearchQA leaderboard with a score of 0.913 across 5 evaluated AI models.