CodeForces
A competitive programming benchmark built from problems on the CodeForces platform. It evaluates the code generation capabilities of LLMs on algorithmic problems with difficulty ratings from 800 to 2400. Problems span dynamic programming, graph algorithms, data structures, and mathematics, and solutions are evaluated in a standardized way by submitting them directly to the platform.
CodeForces Leaderboard
11 models • 0 verified
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | DeepSeek | 685B | — | — | |
| 2 | Alibaba Cloud / Qwen Team | 122B | — | — | |
| 3 | Alibaba Cloud / Qwen Team | 35B | — | — | |
| 4 | OpenAI | 117B | 131K | $0.09 / $0.45 | |
| 5 | Alibaba Cloud / Qwen Team | 27B | — | — | |
| 6 | DeepSeek | 685B | — | — | |
| 7 | OpenAI | 21B | — | — | |
| 8 | DeepSeek | 685B | — | — | |
| 9 | DeepSeek | 671B | 164K | $0.27 / $1.00 | |
| 10 | Alibaba Cloud / Qwen Team | 33B | 128K | $0.10 / $0.30 | |
| 11 | DeepSeek | 671B | 131K | $0.50 / $2.15 | |
FAQ
Common questions about CodeForces
The CodeForces paper is available at https://arxiv.org/abs/2501.01257. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The CodeForces leaderboard ranks 11 AI models based on their performance on this benchmark. Currently, DeepSeek-V3.2-Speciale by DeepSeek leads with a score of 0.900. The average score across all models is 0.768.
The highest CodeForces score is 0.900, achieved by DeepSeek-V3.2-Speciale from DeepSeek.
11 models have been evaluated on the CodeForces benchmark, with 0 verified results and 11 self-reported results.
CodeForces is categorized under math and reasoning. The benchmark evaluates text models.