MRCR
MRCR (Multi-Round Coreference Resolution) is a synthetic long-context reasoning benchmark in which a model must navigate a long multi-turn conversation and reproduce a specific one of its earlier outputs on request. It tests the ability to distinguish between near-identical requests, reason about their ordering, and maintain attention across extended contexts.
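To make the setup concrete, below is a minimal, hypothetical sketch of how one MRCR-style item might be scored. It assumes the model is asked to reproduce a specific earlier response and that partial credit comes from a string-similarity ratio against the reference output, optionally gated on a required prefix; the exact prompt format and grading metric used by this leaderboard may differ.

```python
from difflib import SequenceMatcher


def grade_mrcr_response(model_output: str, reference: str, required_prefix: str = "") -> float:
    """Score one MRCR-style item: similarity between the model's reproduction
    and the reference (the earlier assistant turn it was asked to repeat).

    `required_prefix` models the common trick of asking the model to begin its
    answer with a random string, so a reply that ignores the instruction scores
    zero. This is an illustrative grader, not the leaderboard's official one.
    """
    if required_prefix and not model_output.startswith(required_prefix):
        return 0.0
    model_output = model_output.removeprefix(required_prefix)
    reference = reference.removeprefix(required_prefix)
    return SequenceMatcher(None, model_output, reference).ratio()


# Example: the model was asked to reproduce the 2nd poem about pelicans.
reference = "POEM-2: pelicans glide / over the harbour at dusk"
print(grade_mrcr_response("POEM-2: pelicans glide / over the harbour at dusk", reference))  # 1.0
```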
Progress Over Time
Interactive timeline showing model performance evolution on MRCR
MRCR Leaderboard
| Rank | Organization | Params | Context | Cost (input / output) |
|---|---|---|---|---|
| 1 | Google | — | 1.0M | $1.25 / $10.00 |
| 2 | Google | — | 2.1M | $2.50 / $10.00 |
| 3 | Google | — | 1.0M | $0.15 / $0.60 |
| 4 | Google | — | 1.0M | $0.10 / $0.40 |
| 5 | Google | 8B | 1.0M | $0.07 / $0.30 |
| 6 | Xiaomi | 309B | 256K | $0.10 / $0.30 |
| 7 | Google | — | 1.0M | $0.30 / $2.50 |
Sub-benchmarks
MRCR 128K (2-needle)
MRCR at a 128K-token context length with 2 needles: the model must navigate a long conversation and reproduce 2 specific earlier outputs, testing attention and reasoning across 128K tokens of context.
MRCR 128K (4-needle)
MRCR at a 128K-token context length with 4 needles: the model must navigate a long conversation and reproduce 4 specific earlier outputs, testing attention and reasoning across 128K tokens of context.
MRCR 128K (8-needle)
MRCR at a 128K-token context length with 8 needles: the model must navigate a long conversation and reproduce 8 specific earlier outputs, testing attention and reasoning across 128K tokens of context.
MRCR 64K (2-needle)
MRCR at a 64K-token context length with 2 needles: the model must navigate a long conversation and reproduce 2 specific earlier outputs, testing attention and reasoning across 64K tokens of context.
MRCR 64K (4-needle)
MRCR at a 64K-token context length with 4 needles: the model must navigate a long conversation and reproduce 4 specific earlier outputs, testing attention and reasoning across 64K tokens of context.
MRCR 64K (8-needle)
MRCR at a 64K-token context length with 8 needles: the model must navigate a long conversation and reproduce 8 specific earlier outputs, testing attention and reasoning across 64K tokens of context.
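As a rough illustration of how these sub-benchmarks differ, the sketch below assembles a toy multi-round conversation with `n_needles` identical requests hidden among distractor turns, then asks for one specific occurrence to be reproduced. All names, topics, and the construction strategy are assumptions for illustration; real items are also padded with further turns to reach the 64K or 128K token budget, which is omitted here.

```python
import random


def build_mrcr_conversation(n_needles: int, n_distractors: int, seed: int = 0):
    """Build a toy MRCR-style conversation.

    Needle turns all answer the *same* request (a poem about pelicans), so the
    model must track which occurrence is which; distractor turns answer
    superficially similar but different requests. Real benchmark items would
    additionally be padded with filler turns up to the target context length.
    """
    rng = random.Random(seed)
    # Each entry is one (user request, assistant reply) pair; None marks a needle.
    pairs = []
    for i in range(n_distractors):
        topic = rng.choice(["herons", "otters", "lighthouses", "ferries"])
        pairs.append((f"Write a poem about {topic}.", f"(poem about {topic} #{i})"))
    pairs.extend(("Write a poem about pelicans.", None) for _ in range(n_needles))
    rng.shuffle(pairs)

    # Number the needle replies by order of appearance, so "pelican poem k" is well defined.
    turns, k = [], 0
    for request, reply in pairs:
        if reply is None:
            k += 1
            reply = f"(pelican poem, occurrence {k})"
        turns.append(("user", request))
        turns.append(("assistant", reply))

    # Final query: reproduce one specific earlier output verbatim.
    target = rng.randrange(1, n_needles + 1)
    turns.append(("user", f"Reproduce pelican poem number {target}, verbatim."))
    reference = f"(pelican poem, occurrence {target})"
    return turns, reference


conversation, reference = build_mrcr_conversation(n_needles=4, n_distractors=8)
```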