MRCR 64K (4-needle)
MRCR (Multi-Round Coreference Resolution) at 64K context length with 4 needles. Models must navigate long conversations to reproduce specific model outputs, testing attention and reasoning across 64K-token contexts with 4 items to retrieve.
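The retrieval setup described above can be sketched in code. The snippet below is an illustrative construction (not the official harness): it interleaves four "needle" request/reply pairs on one topic among distractor turns, then asks the model to reproduce a specific needle reply. All names and topics here are hypothetical.

```python
import random

def build_mrcr_prompt(needle_topic, needle_replies, distractors, ask_index):
    """Build a 4-needle MRCR-style conversation (illustrative sketch).

    Four turns ("needles") share needle_topic; the final question asks the
    model to reproduce the ask_index-th needle reply verbatim.
    """
    turns = [(f"Write a poem about {topic}.", reply)
             for topic, reply in distractors]
    for reply in needle_replies:  # the 4 needles share one topic
        turns.append((f"Write a poem about {needle_topic}.", reply))
    random.shuffle(turns)

    messages = []
    for user_msg, assistant_msg in turns:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    ordinal = ["1st", "2nd", "3rd", "4th"][ask_index]
    messages.append({
        "role": "user",
        "content": f"Reproduce the {ordinal} poem about {needle_topic} exactly.",
    })
    return messages

msgs = build_mrcr_prompt(
    needle_topic="penguins",
    needle_replies=[f"penguin poem {i}" for i in range(4)],
    distractors=[("tapirs", "tapir poem"), ("rivers", "river poem")],
    ask_index=1,
)
```

At 64K context, the distractor turns are padded until the conversation fills the window, which is what stresses attention over long range rather than simple lookup.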
Progress Over Time
Interactive timeline showing model performance evolution on MRCR 64K (4-needle)
MRCR 64K (4-needle) Leaderboard
1 model
| Rank | Model | Organization | Params | Context | Cost | License | Score |
|---|---|---|---|---|---|---|---|
| 1 | MiniCPM-SALA | OpenBMB | 9B | — | — | — | 0.206 |
FAQ
Common questions about MRCR 64K (4-needle)
The paper introducing the MRCR benchmark is available at https://arxiv.org/abs/2409.12640. It details the benchmark methodology, dataset construction, and evaluation criteria.
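Since scores on this leaderboard fall in [0, 1] rather than being pass/fail, a similarity-based grade is implied. A minimal sketch of such grading, assuming a sequence-similarity metric (the official harness's exact grading may differ):

```python
from difflib import SequenceMatcher

def mrcr_score(response: str, answer: str) -> float:
    """Grade in [0, 1]: 1.0 only for an exact reproduction of the target.

    Illustrative sketch; assumes similarity-ratio grading, which is one
    common choice for needle-reproduction tasks.
    """
    return SequenceMatcher(None, response, answer).ratio()

exact = mrcr_score("penguin poem 2", "penguin poem 2")   # 1.0
partial = mrcr_score("penguin poem", "penguin poem 2")   # in (0, 1)
```

Partial credit rewards responses that reproduce most of the target, so the aggregate score reflects how close a model gets, not just exact hits.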
The MRCR 64K (4-needle) leaderboard currently ranks a single model: MiniCPM-SALA by OpenBMB, with a score of 0.206.
The highest MRCR 64K (4-needle) score is 0.206, achieved by MiniCPM-SALA from OpenBMB.
One model has been evaluated on the MRCR 64K (4-needle) benchmark, with 0 verified results and 1 self-reported result.
MRCR 64K (4-needle) is categorized under general, long context, and reasoning. The benchmark evaluates text models.