
MRCR 128K (8-needle)

MRCR (Multi-Round Coreference Resolution) at a 128K-token context length with 8 needles. Models must navigate a long multi-turn conversation and reproduce specific model outputs from earlier in that conversation, testing attention and reasoning across 128K-token contexts with 8 items to retrieve.
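The retrieval task can be sketched as a grader: given the model's response and the reference needle, MRCR-style evaluations commonly require the output to begin with a designated random prefix (so a model cannot score by echoing the wrong turn) and then score string similarity against the target output. This is an illustrative sketch under those assumptions, not the official grader behind this leaderboard; the function name and prefix convention are hypothetical.

```python
from difflib import SequenceMatcher

def grade_response(response: str, answer: str, prefix: str) -> float:
    """Grade one MRCR-style retrieval.

    Hypothetical sketch: the model must reproduce one specific earlier
    output (the 'needle') verbatim, starting with a random prefix it was
    told to prepend. A wrong prefix scores 0; otherwise the score is the
    SequenceMatcher similarity ratio between response and reference.
    """
    if not response.startswith(prefix):
        return 0.0
    return SequenceMatcher(None, response, answer).ratio()

# An exact reproduction with the right prefix scores 1.0;
# a response missing the prefix scores 0.0 regardless of content.
print(grade_response("xK9: roses are red", "xK9: roses are red", "xK9:"))
print(grade_response("roses are red", "xK9: roses are red", "xK9:"))
```

A full 8-needle run would repeat this grading once per needle and average the per-needle scores, which is why partial retrieval yields fractional benchmark scores like 0.101.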

Paper: https://arxiv.org/abs/2409.12640


MRCR 128K (8-needle) Leaderboard

1 model listed. Columns: Context, Cost, License.
MiniCPM-SALA (OpenBMB), 19B, score 0.101 (self-reported)

FAQ

Common questions about MRCR 128K (8-needle)

What is MRCR 128K (8-needle)?
MRCR (Multi-Round Coreference Resolution) at 128K context length with 8 needles. Models must navigate long conversations to reproduce specific model outputs, testing attention and reasoning across 128K-token contexts with 8 items to retrieve.

Where can I read the paper?
The MRCR 128K (8-needle) paper is available at https://arxiv.org/abs/2409.12640. It details the benchmark methodology, dataset creation, and evaluation criteria.

Which model leads the leaderboard?
The leaderboard currently ranks 1 AI model. MiniCPM-SALA by OpenBMB leads with a score of 0.101, which is also the average across all listed models.

What is the highest score?
The highest MRCR 128K (8-needle) score is 0.101, achieved by MiniCPM-SALA from OpenBMB.

How many models have been evaluated?
One model has been evaluated on the MRCR 128K (8-needle) benchmark, with 0 verified and 1 self-reported result.

How is the benchmark categorized?
MRCR 128K (8-needle) is categorized under general, long context, and reasoning, and evaluates text models.