CFEval

CFEval is a benchmark for evaluating code generation and problem-solving capabilities.

Progress Over Time

[Chart: model performance on CFEval over time, marking the state-of-the-art frontier and distinguishing open from proprietary models.]

CFEval Leaderboard

2 models

Rank  Model                           Organization               Parameters  Context  Cost (input / output)
1     Qwen3-235B-A22B-Thinking-2507   Alibaba Cloud / Qwen Team  235B        262K     $0.30 / $3.00
2     (model name not listed)         Alibaba Cloud / Qwen Team  80B         66K      $0.15 / $1.50

FAQ

Common questions about CFEval

The CFEval leaderboard ranks 2 AI models by their performance on this benchmark. Qwen3-235B-A22B-Thinking-2507 from Alibaba Cloud / Qwen Team currently leads with the highest score, 2134.000. The average score across all models is 2102.500.
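With only two models listed, the second-place score is not stated on the page but follows from the two reported figures. Assuming the average is an unweighted arithmetic mean of both scores, the second score is 2 × 2102.500 − 2134.000 = 2071.000. The short Python sketch below reproduces that arithmetic and the resulting ranking; the label for the second model is a placeholder, because its name is not listed.

```python
from statistics import mean

# Figures reported on this page.
leader_score = 2134.0
reported_average = 2102.5
num_models = 2

# Assumption: the average is an unweighted mean, so sum(scores) = n * average.
# With exactly two models, the other score is n * average - leader_score.
second_score = num_models * reported_average - leader_score

scores = {
    "Qwen3-235B-A22B-Thinking-2507": leader_score,
    "second-ranked model (name not listed)": second_score,  # placeholder label
}

# Rank models by descending score and check the mean against the page.
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.3f}")

assert mean(scores.values()) == reported_average
```

Running it prints the leader at 2134.000 and the second-ranked model at 2071.000.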
Two models have been evaluated on CFEval; both results are self-reported, and none are verified.
CFEval is categorized under code and evaluates text models.