MRCR v2 Leaderboard

Progress Over Time

[Interactive timeline: model performance on MRCR v2 over time, with the state-of-the-art frontier shown for open and proprietary models]

MRCR v2 Leaderboard

5 models

Rank	Model	Parameters	Context	Cost
1	Gemma 4 31B (Google)	31B	262K	$0.14 / $0.40
2	Gemma 4 26B-A4B (Google)	25B	262K	$0.13 / $0.40
3	Gemma 4 E4B (Google)	8B	—	—
4	Gemma 4 E2B (Google)	5B	—	—
5	Gemini 2.5 Flash-Lite (Google)	—	1.0M	$0.10 / $0.40
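
For readers who want to work with the table programmatically, the rows above can be loaded into plain records. This is a minimal sketch, not an official export format: the field names (`params`, `context`, `cost`) and the helpers `parse_count`, `parse_cost`, and `parse_row` are hypothetical, the unlabeled third column is assumed to be a parameter count, and the two prices are kept as an ordered pair because the page does not say what each one refers to. The vendor (Google for every row) is omitted from the model name.

```python
from typing import Optional, Tuple

# Rows copied from the leaderboard table: rank, model, size, context, cost.
# "\u2014" is the dash the table uses for values that are not listed.
ROWS = """\
1\tGemma 4 31B\t31B\t262K\t$0.14 / $0.40
2\tGemma 4 26B-A4B\t25B\t262K\t$0.13 / $0.40
3\tGemma 4 E4B\t8B\t\u2014\t\u2014
4\tGemma 4 E2B\t5B\t\u2014\t\u2014
5\tGemini 2.5 Flash-Lite\t\u2014\t1.0M\t$0.10 / $0.40
"""

MISSING = "\u2014"
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> Optional[int]:
    """Convert '262K', '1.0M', or '31B' to an integer; None if not listed.

    The result is only as precise as the rounded figure shown on the page.
    """
    if text == MISSING:
        return None
    return round(float(text[:-1]) * SUFFIXES[text[-1]])


def parse_cost(text: str) -> Optional[Tuple[float, float]]:
    """Convert '$0.14 / $0.40' to (0.14, 0.40); None if not listed."""
    if text == MISSING:
        return None
    first, second = (float(part.strip().lstrip("$")) for part in text.split("/"))
    return (first, second)


def parse_row(line: str) -> dict:
    """Split one tab-separated row into a record with typed fields."""
    rank, model, params, context, cost = line.split("\t")
    return {
        "rank": int(rank),
        "model": model,
        "params": parse_count(params),    # assumed meaning of the third column
        "context": parse_count(context),
        "cost": parse_cost(cost),
    }


leaderboard = [parse_row(line) for line in ROWS.splitlines()]
```

Keeping unlisted values as `None` rather than zero means a later filter such as `[r for r in leaderboard if r["context"]]` skips the rows without a listed context length.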

FAQ

Common questions about MRCR v2

What is the MRCR v2 benchmark?

MRCR v2 (Multi-Round Coreference Resolution version 2) is an enhanced version of the MRCR synthetic long-context reasoning task. It extends the original MRCR framework with improved evaluation criteria and additional complexity, testing a model's ability to maintain attention and reasoning across extended contexts.

Where can I find the MRCR v2 paper?

The MRCR v2 paper is available at https://arxiv.org/abs/2409.12640. It describes the benchmark methodology, dataset creation, and evaluation criteria.

What is the MRCR v2 leaderboard?

The MRCR v2 leaderboard ranks 5 AI models by their performance on this benchmark. Currently, Gemma 4 31B by Google leads with a score of 0.664. The average score across all models is 0.343.

What is the highest MRCR v2 score?

The highest MRCR v2 score is 0.664, achieved by Gemma 4 31B from Google.

How many models are evaluated on MRCR v2?

Five models have been evaluated on the MRCR v2 benchmark. All five results are self-reported; none are verified.

What categories does MRCR v2 cover?

MRCR v2 is categorized under general, long context, and reasoning. The benchmark evaluates text models.