NL2Repo

Name: NL2Repo Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Figure: interactive timeline of model performance on NL2Repo, showing the state-of-the-art frontier and distinguishing open from proprietary models.]

NL2Repo Leaderboard

12 models

Rank	Model	Organization	Parameters	Context	Cost (input / output, USD per 1M tokens)
1	GLM-5.2	Zhipu AI	753B	1.0M	$0.95 / $3.00
2	Qwen3.7 Max	Alibaba Cloud / Qwen Team	—	1.0M	$1.25 / $3.75
3	Seed 2.1 Pro	ByteDance	—	—	—
4	Hy3	Tencent	295B	—	—
5	Seed 2.1 Turbo	ByteDance	—	—	—
6	GLM-5.1	Zhipu AI	754B	200K	$1.40 / $4.40
7	MiniMax M3	MiniMax	—	1.0M	$0.30 / $1.20
8	Qwen3.7-Plus	Alibaba Cloud / Qwen Team	—	1.0M	$0.32 / $1.28
9	MiniMax M2.7	MiniMax	—	205K	$0.30 / $1.20
10	Qwen3.6 Plus	Alibaba Cloud / Qwen Team	—	1.0M	$0.50 / $3.00
11	Qwen3.6-27B	Alibaba Cloud / Qwen Team	28B	262K	$0.60 / $3.60
12	Qwen3.6-35B-A3B	Alibaba Cloud / Qwen Team	35B	—	—
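
The context and cost cells in the table above are display strings, so they need parsing before any comparison. The following is a minimal sketch of one way to do that. The function names `parse_context` and `parse_cost` are hypothetical, and the sketch assumes decimal multipliers (K = 1,000, M = 1,000,000) and that the two prices are input and output cost per million tokens; the page does not state either convention explicitly.

```python
from typing import Optional, Tuple

MISSING = ("", "—")  # the table marks unlisted values with a dash


def parse_context(text: str) -> Optional[int]:
    """Convert a context string such as '1.0M' or '200K' to a token count.

    Assumption: K and M are decimal multipliers, not binary ones.
    """
    text = text.strip()
    if text in MISSING:
        return None
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return round(float(text[:-1]) * multipliers[suffix])
    return int(text)


def parse_cost(text: str) -> Optional[Tuple[float, float]]:
    """Split a cost string such as '$0.95 / $3.00' into two prices in USD.

    Assumption: the first price is for input tokens, the second for output.
    """
    text = text.strip()
    if text in MISSING:
        return None
    first, second = (float(part.strip().lstrip("$")) for part in text.split("/"))
    return first, second
```

Returning `None` for unlisted cells keeps missing data distinct from a zero price or a zero-length context.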


About this benchmark

What is NL2Repo?

NL2Repo evaluates long-horizon coding capabilities including repository-level understanding, where models must generate or modify code across entire repositories from natural language specifications.

NL2Repo is a text benchmark that evaluates models on agentic and coding tasks. LLM Stats tracks 12 models on this benchmark, scored on a 0–1 scale. The current average is 0.418, with the leader at 0.489.


Current leaders

GLM-5.2 from Zhipu AI currently leads the 12 models on the NL2Repo leaderboard with a score of 0.489.

GLM-5.2 (Zhipu AI): 48.9%

Qwen3.7 Max (Alibaba Cloud / Qwen Team): 47.2%

Seed 2.1 Pro (ByteDance): 47.0%

FAQ

Common questions about the NL2Repo benchmark and leaderboard.

What is the NL2Repo benchmark?

NL2Repo evaluates long-horizon coding capabilities including repository-level understanding, where models must generate or modify code across entire repositories from natural language specifications.

What is the NL2Repo leaderboard?

The NL2Repo leaderboard ranks 12 AI models based on their performance on this benchmark. Currently, GLM-5.2 by Zhipu AI leads with a score of 0.489. The average score across all models is 0.418.

What is the highest NL2Repo score?

The highest NL2Repo score is 0.489, achieved by GLM-5.2 from Zhipu AI.

How many models are evaluated on NL2Repo?

Twelve models have been evaluated on the NL2Repo benchmark. All 12 results are self-reported; none are verified.

What categories does NL2Repo cover?

NL2Repo is categorized under agents and code. The benchmark evaluates text models.

What is the best open-source model on NL2Repo?

GLM-5.2 by Zhipu AI is the top-ranked open-source model on NL2Repo, with a score of 0.489 (rank #1).

Which model offers the best value on NL2Repo?

Among models scoring within 10% of the leader, GLM-5.2 from Zhipu AI is the cheapest, at $0.95 per million input tokens with a score of 0.489.
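
The selection described above can be sketched as a filter followed by a minimum. This is an illustration only, not the site's actual method: the `Entry` type and `best_value` function are hypothetical names, "within 10% of the leader" is assumed to mean a score of at least 90% of the leader's score, and models with no listed price are assumed to be excluded. The data covers only the three models whose scores appear on this page.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    score: float                  # benchmark score on the 0-1 scale
    input_price: Optional[float]  # USD per 1M input tokens; None if unlisted


def best_value(entries: List[Entry], tolerance: float = 0.10) -> Optional[Entry]:
    """Return the cheapest priced entry whose score is near the leader's.

    Assumption: "near" means score >= leader_score * (1 - tolerance).
    """
    leader_score = max(e.score for e in entries)
    threshold = leader_score * (1 - tolerance)
    candidates = [
        e for e in entries
        if e.score >= threshold and e.input_price is not None
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.input_price)


# Scores and input prices as listed on this page for the top three models.
top_three = [
    Entry("GLM-5.2", 0.489, 0.95),
    Entry("Qwen3.7 Max", 0.472, 1.25),
    Entry("Seed 2.1 Pro", 0.470, None),
]

winner = best_value(top_three)
```

With these three rows, the threshold is about 0.440, both priced models qualify, and GLM-5.2 is selected because its input price is lower than that of Qwen3.7 Max.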

How recent are the NL2Repo leaderboard results?

The NL2Repo leaderboard was last updated in July 2026 and currently includes 12 evaluated models.