CFEval
A benchmark for evaluating code generation and problem-solving capabilities.
Progress Over Time
Interactive timeline showing model performance evolution on CFEval
CFEval Leaderboard
2 models
| Rank | Model | Organization | Params | Context | Cost (in / out) | License |
|---|---|---|---|---|---|---|
| 1 | Qwen3-235B-A22B-Thinking-2507 | Alibaba Cloud / Qwen Team | 235B | 262K | $0.30 / $3.00 | |
| 2 | | Alibaba Cloud / Qwen Team | 80B | 66K | $0.15 / $1.50 | |
FAQ
Common questions about CFEval
**What is CFEval?**
CFEval is a benchmark for evaluating code generation and problem-solving capabilities.
**How are models ranked on CFEval?**
The CFEval leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Qwen3-235B-A22B-Thinking-2507 by Alibaba Cloud / Qwen Team leads with a score of 2134.000. The average score across all models is 2102.500.
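The page reports the leader's score and the two-model average but not the second model's score; with two models it follows by simple arithmetic. A minimal sketch, using only the figures stated above:

```python
# Recover the second model's score from the published leaderboard stats.
# Known from the page: leader score 2134.000, mean over 2 models 2102.500.
leader_score = 2134.000
mean_score = 2102.500
n_models = 2

# mean = (leader + second) / n  =>  second = n * mean - leader
second_score = n_models * mean_score - leader_score
print(second_score)  # 2071.0
```

So the second-ranked model's implied score is 2071.000, consistent with the reported average of 2102.500.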
**What is the highest CFEval score?**
The highest CFEval score is 2134.000, achieved by Qwen3-235B-A22B-Thinking-2507 from Alibaba Cloud / Qwen Team.
**How many models have been evaluated on CFEval?**
2 models have been evaluated on the CFEval benchmark, with 0 verified results and 2 self-reported results.
**What category does CFEval fall under?**
CFEval is categorized under code, and it evaluates text (language) models.