IFBench
Progress Over Time
[Interactive timeline: model performance evolution on IFBench, showing the state-of-the-art frontier with open and proprietary models distinguished.]
IFBench Leaderboard
27 models
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | NVIDIA | 550B | — | — | |
| 2 | Nous Research | 70B | — | — | |
| 3 | Amazon | — | — | — | |
| 4 | Alibaba Cloud / Qwen Team | — | 1.0M | $1.25 / $3.75 | |
| 4 | Alibaba Cloud / Qwen Team | — | 1.0M | $0.32 / $1.28 | |
| 6 | Alibaba Cloud / Qwen Team | 27B | 262K | $0.30 / $2.40 | |
| 6 | Alibaba Cloud / Qwen Team | 397B | — | — | |
| 8 | Alibaba Cloud / Qwen Team | 122B | — | — | |
| 9 | Microsoft | — | — | — | |
| 10 | Alibaba Cloud / Qwen Team | — | 1.0M | $0.50 / $3.00 | |
| 11 | Cohere | 218B | — | — | |
| 12 | | 120B | — | — | |
| 13 | Inception | — | 128K | $0.25 / $0.75 | |
| 14 | Amazon | — | 1.0M | $0.30 / $2.50 | |
| 15 | Alibaba Cloud / Qwen Team | 35B | — | — | |
| 16 | MiniMax | 230B | 1.0M | $0.30 / $1.20 | |
| 17 | OpenAI | 117B | 131K | $0.10 / $0.50 | |
| 18 | Microsoft | 1.0T | — | — | |
| 18 | Mistral AI | 128B | 256K | $1.50 / $7.50 | |
| 20 | Amazon | — | — | — | |
| 21 | LG AI Research | 236B | — | — | |
| 22 | Alibaba Cloud / Qwen Team | 9B | — | — | |
| 23 | Alibaba Cloud / Qwen Team | 4B | — | — | |
| 24 | Mistral AI | 119B | 256K | $0.15 / $0.60 | |
| 25 | Alibaba Cloud / Qwen Team | 2B | — | — | |
| 26 | Amazon | — | 1.0M | $0.33 / $2.75 | |
| 27 | Alibaba Cloud / Qwen Team | 800M | — | — | |
What is IFBench?
An instruction-following benchmark that evaluates a model's ability to follow complex instructions.
IFBench is a text benchmark that evaluates models on instruction following and general tasks. LLM Stats tracks 27 models on this benchmark, scored on a 0–1 scale. The current average is 0.7, with the leader at 0.817.
Compare leaders on the "best AI for instruction following" and "best AI for general" leaderboards.
Current leaders
Nemotron 3 Ultra (550B A55B) from NVIDIA currently leads the IFBench leaderboard with a score of 0.817, the highest among the 27 evaluated AI models.
FAQ
Common questions about the IFBench benchmark and leaderboard.