NL2Repo
NL2Repo evaluates long-horizon coding capabilities, including repository-level understanding: models must generate or modify code across entire repositories from natural language specifications.
Progress Over Time
[Chart: timeline of model performance on NL2Repo, marking the state-of-the-art frontier and distinguishing open from proprietary models.]
NL2Repo Leaderboard
4 models
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Zhipu AI | 754B | 200K | $1.40 / $4.40 | |
| 2 | MiniMax | — | 205K | $0.30 / $1.20 | |
| 3 | Alibaba Cloud / Qwen Team | — | — | — | |
| 4 | Alibaba Cloud / Qwen Team | 35B | — | — | |
FAQ
Common questions about NL2Repo
The NL2Repo leaderboard ranks 4 AI models based on their performance on this benchmark. Currently, GLM-5.1 by Zhipu AI leads with a score of 0.427. The average score across all models is 0.374.
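The leading score and the overall average together imply how the remaining three models scored on average, since a mean is the sum divided by the count. A minimal Python sketch of that arithmetic follows. It assumes the quoted values (0.427 and 0.374) are rounded to three decimals, so the result is approximate; the function name `implied_mean_of_rest` is hypothetical and used only for illustration.

```python
def implied_mean_of_rest(leader: float, mean_all: float, n_models: int) -> float:
    """Mean score of the non-leading models, implied by the overall mean."""
    total = mean_all * n_models               # sum of all scores
    return (total - leader) / (n_models - 1)  # spread the remainder over the rest


# Values quoted in the leaderboard summary; both are rounded,
# so treat the result as an estimate rather than an exact figure.
rest = implied_mean_of_rest(leader=0.427, mean_all=0.374, n_models=4)
print(f"{rest:.3f}")
```

Under these assumptions the other three models average roughly 0.356, which puts the leader about 0.07 ahead of the rest of the field.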
The highest NL2Repo score is 0.427, achieved by GLM-5.1 from Zhipu AI.
Four models have been evaluated on the NL2Repo benchmark. All four results are self-reported; none has been verified.
NL2Repo is categorized under agents and coding. The benchmark evaluates text models.