MRCR 64K (4-needle)
MRCR (Multi-Round Coreference Resolution) at 64K context length with 4 needles. Models must navigate long conversations to reproduce specific model outputs, testing attention and reasoning across 64K-token contexts with 4 items to retrieve.
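The retrieval setup described above can be sketched in code. The snippet below is an illustrative construction (not the official harness): it interleaves four "needle" request/reply pairs on one topic among distractor turns, then asks the model to reproduce a specific needle reply. All names and topics here are hypothetical.

```python
import random

def build_mrcr_prompt(needle_topic, needle_replies, distractors, ask_index):
    """Build a 4-needle MRCR-style conversation (illustrative sketch).

    Four turns ("needles") share needle_topic; the final question asks the
    model to reproduce the ask_index-th needle reply verbatim.
    """
    turns = [(f"Write a poem about {topic}.", reply)
             for topic, reply in distractors]
    for reply in needle_replies:  # the 4 needles share one topic
        turns.append((f"Write a poem about {needle_topic}.", reply))
    random.shuffle(turns)

    messages = []
    for user_msg, assistant_msg in turns:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    ordinal = ["1st", "2nd", "3rd", "4th"][ask_index]
    messages.append({
        "role": "user",
        "content": f"Reproduce the {ordinal} poem about {needle_topic} exactly.",
    })
    return messages

msgs = build_mrcr_prompt(
    needle_topic="penguins",
    needle_replies=[f"penguin poem {i}" for i in range(4)],
    distractors=[("tapirs", "tapir poem"), ("rivers", "river poem")],
    ask_index=1,
)
```

At 64K context, the distractor turns are padded until the conversation fills the window, which is what stresses attention over long range rather than simple lookup.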
Progress Over Time
Interactive timeline showing model performance evolution on MRCR 64K (4-needle)
MRCR 64K (4-needle) Leaderboard
1 model
| Rank | Model | Organization | Params | Context | Cost | License | Score |
|---|---|---|---|---|---|---|---|
| 1 | MiniCPM-SALA | OpenBMB | 9B | — | — | — | 0.206 |
FAQ
Common questions about MRCR 64K (4-needle)
The paper introducing the MRCR benchmark is available at https://arxiv.org/abs/2409.12640. It details the benchmark methodology, dataset construction, and evaluation criteria.
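Since scores on this leaderboard fall in [0, 1] rather than being pass/fail, a similarity-based grade is implied. A minimal sketch of such grading, assuming a sequence-similarity metric (the official harness's exact grading may differ):

```python
from difflib import SequenceMatcher

def mrcr_score(response: str, answer: str) -> float:
    """Grade in [0, 1]: 1.0 only for an exact reproduction of the target.

    Illustrative sketch; assumes similarity-ratio grading, which is one
    common choice for needle-reproduction tasks.
    """
    return SequenceMatcher(None, response, answer).ratio()

exact = mrcr_score("penguin poem 2", "penguin poem 2")   # 1.0
partial = mrcr_score("penguin poem", "penguin poem 2")   # in (0, 1)
```

Partial credit rewards responses that reproduce most of the target, so the aggregate score reflects how close a model gets, not just exact hits.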
The MRCR 64K (4-needle) leaderboard currently ranks a single model: MiniCPM-SALA by OpenBMB, with a score of 0.206.
The highest MRCR 64K (4-needle) score is 0.206, achieved by MiniCPM-SALA from OpenBMB.
One model has been evaluated on the MRCR 64K (4-needle) benchmark, with 0 verified results and 1 self-reported result.
MRCR 64K (4-needle) is categorized under general, long context, and reasoning. The benchmark evaluates text models.