
LiveCodeBench Pro

LiveCodeBench Pro is an evaluation benchmark for code-generation large language models that uses Elo ratings to rank models by their performance on coding tasks. It draws real-world problems from programming contests (LeetCode, AtCoder, CodeForces) and provides a relative ranking in which a higher Elo score indicates stronger performance.
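
As a rough illustration of how an Elo-style rating behaves (a minimal sketch only; the K factor, the pairing scheme, and the use of the leaderboard numbers below are assumptions, not LiveCodeBench Pro's actual implementation):

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return (new_rating_a, new_rating_b) after one head-to-head comparison.

    score_a is 1.0 if A outperforms B, 0.5 for a tie, 0.0 otherwise.
    k is a hypothetical constant controlling how far one result moves the ratings.
    """
    expected_a = elo_expected(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Illustrative numbers taken from the leaderboard text: the 2887 leader vs. the 2547.333 average.
print(f"Expected score of the leader vs. an average model: {elo_expected(2887, 2547.333):.2f}")
print(elo_update(2887, 2547.333, score_a=1.0))
```

Under this standard formulation, a gap of roughly 340 Elo points corresponds to the higher-rated model being expected to win the large majority of head-to-head comparisons.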

Paper

Progress Over Time

[Figure: interactive timeline of model performance evolution on LiveCodeBench Pro, showing the state-of-the-art frontier and distinguishing open from proprietary models.]

LiveCodeBench Pro Leaderboard

3 models • 0 verified
Rank  Model                    Context  Cost              License
1     Gemini 3.1 Pro (Google)  1.0M     $2.50 / $15.00
2
3                              1.0M     $0.50 / $3.00

FAQ

Common questions about LiveCodeBench Pro

What is LiveCodeBench Pro?
LiveCodeBench Pro is an evaluation benchmark for code-generation large language models that uses Elo ratings to rank models by their performance on coding tasks. It draws real-world problems from programming contests (LeetCode, AtCoder, CodeForces), and a higher Elo score indicates stronger performance.

Where can I find the LiveCodeBench Pro paper?
The LiveCodeBench Pro paper is available at https://arxiv.org/abs/2403.07974. It details the benchmark methodology, dataset creation, and evaluation criteria.

Which model leads the LiveCodeBench Pro leaderboard?
The leaderboard ranks 3 AI models on this benchmark. Gemini 3.1 Pro by Google currently leads with a score of 2887.000, and the average score across all models is 2547.333.

What is the highest LiveCodeBench Pro score?
The highest LiveCodeBench Pro score is 2887.000, achieved by Gemini 3.1 Pro from Google.

How many models have been evaluated?
3 models have been evaluated on the LiveCodeBench Pro benchmark, with 0 verified results and 3 self-reported results.

What categories does LiveCodeBench Pro cover?
LiveCodeBench Pro is categorized under code, general, and reasoning. The benchmark evaluates text models.