MRCR 128K (8-needle)
MRCR (Multi-Round Coreference Resolution) at 128K context length with 8 needles. Models must navigate long conversations to reproduce specific model outputs, testing attention and reasoning across 128K-token contexts with 8 items to retrieve.
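Grading in MRCR-style evaluations is typically string-similarity based: the model's reproduction of the requested needle is compared against the reference output and given partial credit for partial matches. A minimal sketch of such a grader, using a `SequenceMatcher` ratio (an assumption about the grading rule; the helper name `mrcr_score` is hypothetical, not the official harness):

```python
from difflib import SequenceMatcher

def mrcr_score(model_output: str, reference: str) -> float:
    """Similarity in [0, 1] between the model's reproduction and the
    reference needle; partial reproductions earn partial credit.
    (Hypothetical helper illustrating one plausible MRCR grading rule.)"""
    return SequenceMatcher(None, model_output, reference).ratio()

# An exact reproduction scores 1.0; a close-but-imperfect one scores less.
reference = "Write a poem about sunsets: The sky burns orange over the hills."
exact = reference
close = "Write a poem about sunsets: The sky burns red over the hills."

assert mrcr_score(exact, reference) == 1.0
assert 0.0 < mrcr_score(close, reference) < 1.0
```

A benchmark score like the 0.101 on this leaderboard would then be the average of such per-example similarities across the evaluation set.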
Progress Over Time
[Interactive timeline: model performance evolution on MRCR 128K (8-needle), split into open and proprietary models, with the state-of-the-art frontier highlighted.]
MRCR 128K (8-needle) Leaderboard
1 model

| Rank | Model | Organization | Params | Score | Context | Cost | License |
|---|---|---|---|---|---|---|---|
| 1 | MiniCPM-SALA | OpenBMB | 9B | 0.101 | — | — | — |
FAQ
Common questions about MRCR 128K (8-needle)
**What is MRCR 128K (8-needle)?**
MRCR (Multi-Round Coreference Resolution) at 128K context length with 8 needles. Models must navigate long conversations to reproduce specific model outputs, testing attention and reasoning across 128K-token contexts with 8 items to retrieve.

**Where can I read the MRCR 128K (8-needle) paper?**
The paper is available at https://arxiv.org/abs/2409.12640. It details the benchmark methodology, dataset creation, and evaluation criteria.

**How are models ranked on the MRCR 128K (8-needle) leaderboard?**
The leaderboard currently ranks a single AI model by benchmark score. MiniCPM-SALA by OpenBMB leads with a score of 0.101, which is therefore also the average across all evaluated models.

**What is the highest MRCR 128K (8-needle) score?**
The highest score is 0.101, achieved by MiniCPM-SALA from OpenBMB.

**How many models have been evaluated on MRCR 128K (8-needle)?**
One model has been evaluated on the benchmark, with zero verified results and one self-reported result.

**What categories does MRCR 128K (8-needle) belong to?**
MRCR 128K (8-needle) is categorized under general, long context, and reasoning. The benchmark evaluates text models.