LiveCodeBench v6
LiveCodeBench is a holistic, contamination-free evaluation benchmark for code-focused large language models. It continuously collects new problems from programming contests (LeetCode, AtCoder, CodeForces) and evaluates models in four scenarios: code generation, self-repair, code execution, and test output prediction. Each problem is annotated with its release date, so a model can be evaluated only on problems released after its training cutoff.
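The release-date annotation can be illustrated with a minimal sketch. The `Problem` record, its field names, the sample entries, and the cutoff date below are all hypothetical and chosen for illustration; they are not the benchmark's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record: field names are assumptions, not the benchmark's schema.
    title: str
    source: str
    release_date: date


def unseen_problems(problems, training_cutoff):
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.release_date > training_cutoff]


# Made-up sample entries, one per contest platform named above.
problems = [
    Problem("example-a", "LeetCode", date(2024, 1, 15)),
    Problem("example-b", "AtCoder", date(2024, 6, 1)),
    Problem("example-c", "CodeForces", date(2025, 2, 10)),
]

cutoff = date(2024, 3, 31)  # assumed training cutoff for some model
unseen = unseen_problems(problems, cutoff)
print([p.title for p in unseen])  # ['example-b', 'example-c']
```

Using a strict comparison treats a problem released on the cutoff date itself as potentially seen, which is the more conservative choice.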
Progress Over Time
Timeline of model performance on LiveCodeBench v6, marking the state-of-the-art frontier and distinguishing open from proprietary models.
LiveCodeBench v6 Leaderboard
7 models • 0 verified
| Rank | Model | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|---|
| 1 | Seed 2.0 Pro | ByteDance | — | — | — | |
| 2 | | ByteDance | — | — | — | |
| 3 | | LG AI Research | 236B | — | — | |
| 4 | Gemma 4 31B | Google | 31B | — | — | |
| 5 | | Google | 25B | — | — | |
| 6 | Gemma 4 E4B | Google | 8B | — | — | |
| 7 | Gemma 4 E2B | Google | 5B | — | — | |
FAQ
Common questions about LiveCodeBench v6
The LiveCodeBench v6 paper is available at https://arxiv.org/abs/2403.07974. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The LiveCodeBench v6 leaderboard ranks 7 AI models based on their performance on this benchmark. Currently, Seed 2.0 Pro by ByteDance leads with a score of 0.878. The average score across all models is 0.719.
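As a rough consistency check on these figures, the sketch below derives the average score of the six models behind the leader from the three numbers quoted here. It assumes the reported average is an unweighted mean over all 7 models and that the published values are rounded to three decimals, so the result is approximate.

```python
# Figures quoted in the leaderboard summary.
n_models = 7
mean_score = 0.719   # reported average across all models
top_score = 0.878    # reported score of the leading model

# Assumption: the average is a plain arithmetic mean, so the scores sum to n * mean.
total = n_models * mean_score

# Mean of the remaining six models once the leader is removed.
rest_mean = (total - top_score) / (n_models - 1)

# Gap between the leader and the average of the rest.
gap = top_score - rest_mean

print(f"{rest_mean:.4f}")
print(f"{gap:.4f}")
```

Under these assumptions the other six models average roughly 0.69, which puts the leader about 0.19 ahead of the rest of the field.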
The highest LiveCodeBench v6 score is 0.878, achieved by Seed 2.0 Pro from ByteDance.
7 models have been evaluated on the LiveCodeBench v6 benchmark, with 0 verified results and 7 self-reported results.
LiveCodeBench v6 is categorized under general and reasoning. The benchmark evaluates text models.