CorpusQA

Name: CorpusQA Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline: model performance on CorpusQA over time, showing the state-of-the-art frontier with open and proprietary models distinguished.]

CorpusQA Leaderboard

1 model

				Context	Cost	License
1	MAI-Thinking-1 Microsoft		1.0T	—	—

About this benchmark

What is CorpusQA?

CorpusQA is a multi-document, free-form long-context question answering benchmark in which a model must retrieve and reason over information distributed across a large corpus to produce open-ended answers that are scored by an LLM judge.
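As a rough illustration of how judge-scored free-form answers can be rolled up into a single benchmark score, the sketch below averages per-question verdicts. The `JudgedAnswer` type, the `benchmark_score` helper, and the assumption that the judge emits a per-question value in [0, 1] are all hypothetical; the source does not describe CorpusQA's actual rubric or aggregation.

```python
from dataclasses import dataclass


@dataclass
class JudgedAnswer:
    """One open-ended answer after grading (hypothetical structure)."""
    question_id: str
    judge_score: float  # assumed: judge verdict normalized to [0, 1]


def benchmark_score(answers):
    """Average the per-question judge scores into one 0-1 score.

    Plain averaging is an assumption made for illustration only.
    """
    if not answers:
        raise ValueError("no judged answers to aggregate")
    for answer in answers:
        if not 0.0 <= answer.judge_score <= 1.0:
            raise ValueError(f"score out of range for {answer.question_id}")
    return sum(a.judge_score for a in answers) / len(answers)


# Made-up verdicts, only to show the call shape.
example = [
    JudgedAnswer("q1", 1.0),
    JudgedAnswer("q2", 0.0),
    JudgedAnswer("q3", 1.0),
    JudgedAnswer("q4", 1.0),
]
example_score = benchmark_score(example)
```

With the made-up verdicts above, three of four answers are judged correct, so the aggregate is 0.75.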

CorpusQA is a text benchmark that evaluates models on long-context, reasoning, and general tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. Because only one model is listed, the current average and the leading score are the same: 0.820.
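The summary figures quoted on this page (model count, average, leader, and the percentage form of the score) follow from simple arithmetic over the leaderboard entries. A minimal sketch, assuming a hypothetical `Entry` record and `summarize` helper that are not part of any LLM Stats API:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Entry:
    """One leaderboard row (hypothetical structure for illustration)."""
    model: str
    organization: str
    score: float    # on the 0-1 scale used by the leaderboard
    verified: bool  # False means the result is self-reported


def summarize(entries):
    """Compute the count, leader, average, and verification breakdown."""
    leader = max(entries, key=lambda e: e.score)
    return {
        "count": len(entries),
        "leader": leader.model,
        "top_score": leader.score,
        "average": mean(e.score for e in entries),
        "verified": sum(e.verified for e in entries),
        "self_reported": sum(not e.verified for e in entries),
    }


# The single entry listed on this page; its result is self-reported.
entries = [Entry("MAI-Thinking-1", "Microsoft", 0.820, verified=False)]
summary = summarize(entries)

print(f"{summary['top_score']:.3f}")  # 0.820
print(f"{summary['top_score']:.1%}")  # 82.0%
```

With a single entry, the average necessarily equals the top score, which is why both are reported as 0.820.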

Current leaders

MAI-Thinking-1 from Microsoft currently leads the CorpusQA leaderboard with a score of 0.820. It is the only model evaluated so far.

MAI-Thinking-1 (Microsoft): 82.0%

FAQ

Common questions about the CorpusQA benchmark and leaderboard.

What is the CorpusQA benchmark?

What is the CorpusQA leaderboard?

The CorpusQA leaderboard ranks AI models by their performance on this benchmark; 1 model is currently listed. MAI-Thinking-1 by Microsoft leads with a score of 0.820. With a single entry, the average score is also 0.820.

What is the highest CorpusQA score?

The highest CorpusQA score is 0.820, achieved by MAI-Thinking-1 from Microsoft.

How many models are evaluated on CorpusQA?

1 model has been evaluated on the CorpusQA benchmark, with 0 verified results and 1 self-reported result.

What categories does CorpusQA cover?

CorpusQA is categorized under long context, reasoning, and general. The benchmark evaluates text models.

How recent are the CorpusQA leaderboard results?

The CorpusQA leaderboard was last updated in July 2026 and currently includes 1 evaluated model.