SecCodeBench
SecCodeBench evaluates LLM coding agents on secure code generation and vulnerability detection, testing the ability to produce code that is both functional and free from security vulnerabilities.
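As a rough illustration of what "functional and free from security vulnerabilities" can mean in practice, the sketch below contrasts two lookups that behave identically on benign input but differ under a SQL injection payload. It is not taken from SecCodeBench's task set; the table layout and the function names (`make_db`, `find_user_unsafe`, `find_user_safe`) are hypothetical and chosen only for this example.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Works for benign input, but the input is spliced into the SQL text,
    # so a crafted name can change the meaning of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Same result for benign input; the placeholder keeps the input as data.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # leaks every row
    print(find_user_safe(conn, payload))    # matches no user
```

Both functions would pass a purely functional test on ordinary names; only a security-oriented check, such as the injection payload above, separates them.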
Qwen3.5-397B-A17B from Alibaba Cloud / Qwen Team currently leads the SecCodeBench leaderboard with a score of 0.683 (68.3%). It is the only model evaluated so far.
SecCodeBench Leaderboard
| Rank | Model | Organization | Score | Parameters | Context | Cost | License |
|---|---|---|---|---|---|---|---|
| 1 | Qwen3.5-397B-A17B | Alibaba Cloud / Qwen Team | 0.683 | 397B | 262K | $0.60 / $3.60 | |
More evaluations to explore
Related benchmarks in the same category
Claw-Eval tests real-world agentic task completion across complex multi-step scenarios, evaluating a model's ability to use tools, navigate environments, and complete end-to-end tasks autonomously.
NL2Repo evaluates long-horizon coding capabilities including repository-level understanding, where models must generate or modify code across entire repositories from natural language specifications.
PinchBench evaluates coding agents on real-world agentic coding tasks, measuring both best-case and average performance across complex software engineering scenarios.
SkillsBench evaluates coding agents on self-contained programming tasks, measuring practical engineering skills across diverse software development scenarios.
ZClawBench evaluates Claw-style agent task execution quality, measuring a model's ability to autonomously complete complex multi-step coding tasks in real-world environments.
CC-Bench-V2 Backend evaluates coding agents on backend development tasks, measuring practical engineering ability to implement server-side logic, APIs, and system components.