CC-Bench-V2 Backend

Name: CC-Bench-V2 Backend Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

CC-Bench-V2 Backend evaluates coding agents on backend development tasks, measuring practical engineering ability to implement server-side logic, APIs, and system components.

GLM-5V-Turbo from Zhipu AI currently leads the CC-Bench-V2 Backend leaderboard with a score of 0.228. It is the only model evaluated so far.

About this benchmark

What CC-Bench-V2 Backend measures

CC-Bench-V2 Backend is a text benchmark that evaluates large language models on coding tasks. LLM Stats tracks 1 model on this benchmark, with a maximum possible score of 1. The current average across reported models is 0.228, which equals the leader's score because only one model has been reported.


GLM-5V-Turbo leads with 22.8%.
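The summary figures above (leader, average, and the percentage form of the score) follow from simple arithmetic over the reported scores. The sketch below is illustrative only: the `scores` dictionary and the `summarize` helper are hypothetical names, not part of any LLM Stats API, and the single entry simply mirrors the one score reported on this page.

```python
# Hypothetical in-memory copy of the leaderboard as reported on this page:
# one model with one self-reported score on a 0-to-1 scale.
scores = {"GLM-5V-Turbo": 0.228}


def summarize(scores, max_score=1.0):
    """Return the leader, the mean score, and the leader's score as a percentage."""
    leader = max(scores, key=scores.get)          # model with the highest score
    average = sum(scores.values()) / len(scores)  # arithmetic mean over all models
    return {
        "leader": leader,
        "leader_score": scores[leader],
        "average": average,
        # Percentage of the maximum possible score, rounded to one decimal place.
        "leader_percent": round(100 * scores[leader] / max_score, 1),
    }


summary = summarize(scores)
print(summary)
```

With a single reported model, the average and the leader's score are necessarily the same number, which is why both appear as 0.228 (22.8%).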

Progress Over Time

[Interactive timeline: model performance on CC-Bench-V2 Backend over time, showing the state-of-the-art frontier for open and proprietary models.]

CC-Bench-V2 Backend Leaderboard

1 model

Rank	Model	Organization	Score	Context	Cost	License
1	GLM-5V-Turbo	Zhipu AI	0.228	—	—	—


FAQ

Common questions about CC-Bench-V2 Backend.

What is the CC-Bench-V2 Backend benchmark?

CC-Bench-V2 Backend evaluates coding agents on backend development tasks, measuring practical engineering ability to implement server-side logic, APIs, and system components.

What is the CC-Bench-V2 Backend leaderboard?

The CC-Bench-V2 Backend leaderboard ranks AI models by their performance on this benchmark. It currently lists 1 model, GLM-5V-Turbo by Zhipu AI, with a score of 0.228. Because only one model is listed, the average score across all models is also 0.228.

What is the highest CC-Bench-V2 Backend score?

The highest CC-Bench-V2 Backend score is 0.228, achieved by GLM-5V-Turbo from Zhipu AI.

How many models are evaluated on CC-Bench-V2 Backend?

1 model has been evaluated on the CC-Bench-V2 Backend benchmark, with 0 verified results and 1 self-reported result.

What categories does CC-Bench-V2 Backend cover?

CC-Bench-V2 Backend is categorized under coding. The benchmark evaluates text models.

How recent are the CC-Bench-V2 Backend leaderboard results?

The CC-Bench-V2 Backend leaderboard was last updated in June 2026 and currently includes 1 evaluated model.

More evaluations to explore

Related benchmarks in the same category


Claw-Eval

Claw-Eval tests real-world agentic task completion across complex multi-step scenarios, evaluating a model's ability to use tools, navigate environments, and complete end-to-end tasks autonomously.

coding

11 models

NL2Repo

NL2Repo evaluates long-horizon coding capabilities including repository-level understanding, where models must generate or modify code across entire repositories from natural language specifications.

coding

7 models

SkillsBench

SkillsBench evaluates coding agents on self-contained programming tasks, measuring practical engineering skills across diverse software development scenarios.

coding

4 models

ZClawBench

ZClawBench evaluates Claw-style agent task execution quality, measuring a model's ability to autonomously complete complex multi-step coding tasks in real-world environments.

coding

4 models

PinchBench

PinchBench evaluates coding agents on real-world agentic coding tasks, measuring both best-case and average performance across complex software engineering scenarios.

coding

3 models

LongCodeBench

LongCodeBench evaluates the code understanding and comprehension abilities of large language models at very long context windows, scaling up to 1M tokens. It tests whether models can reason about extensive codebases provided in a single prompt by answering multiple-choice questions about the code.

coding

2 models