CFEval
CFEval Leaderboard
2 models
| Rank | Organization | Size | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | Alibaba Cloud / Qwen Team | 235B | — | — | |
| 2 | Alibaba Cloud / Qwen Team | 80B | — | — | |
What is CFEval?
A benchmark for evaluating code generation and problem-solving capabilities.
CFEval is a text benchmark evaluating models on code tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–10000 scale. The current average is 2102.5, with the leader at 2134.0.
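The summary figures above pin down the score of the second tracked model, provided the reported average is a plain arithmetic mean over the 2 tracked models. That is an assumption; the page does not state how the average is computed. A minimal sketch of the arithmetic, using a hypothetical helper name `implied_other_score`:

```python
def implied_other_score(average: float, leader: float) -> float:
    """Recover the second score on a two-model leaderboard.

    Assumes the average is the unweighted mean of exactly two scores,
    so average = (leader + other) / 2 and other = 2 * average - leader.
    """
    return 2 * average - leader


# Figures quoted in the summary: average 2102.5, leader 2134.0.
other = implied_other_score(2102.5, 2134.0)
print(other)  # 2071.0
```

Under that assumption the second-ranked model would score 2071.0, about 63 points behind the leader.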
Current leaders
Qwen3-235B-A22B-Thinking-2507 from Alibaba Cloud / Qwen Team currently leads the CFEval leaderboard with a score of 2134.0, the higher of the 2 evaluated models.
FAQ
Common questions about the CFEval benchmark and leaderboard.