IFBench

[Chart: IFBench progress over time, showing the state-of-the-art frontier for open and proprietary models]

IFBench Leaderboard

27 models

| Rank | Model | Organization | Parameters | Context | Cost (input / output, per 1M tokens) |
|---|---|---|---|---|---|
| 1 | Nemotron 3 Ultra (550B A55B) | NVIDIA | 550B | | |
| 2 | Hermes 3 70B | Nous Research | 70B | | |
| 3 | Nova 2 Pro | Amazon | | | |
| 4 | | Alibaba Cloud / Qwen Team | | 1.0M | $1.25 / $3.75 |
| 4 | | Alibaba Cloud / Qwen Team | | 1.0M | $0.32 / $1.28 |
| 6 | Qwen3.5-27B | Alibaba Cloud / Qwen Team | 27B | 262K | $0.30 / $2.40 |
| 6 | | Alibaba Cloud / Qwen Team | 397B | | |
| 8 | | Alibaba Cloud / Qwen Team | 122B | | |
| 9 | | | | | |
| 10 | | Alibaba Cloud / Qwen Team | | 1.0M | $0.50 / $3.00 |
| 11 | | | 218B | | |
| 12 | | | 120B | | |
| 13 | | Inception | | 128K | $0.25 / $0.75 |
| 14 | | | | 1.0M | $0.30 / $2.50 |
| 15 | | Alibaba Cloud / Qwen Team | 35B | | |
| 16 | | | 230B | 1.0M | $0.30 / $1.20 |
| 17 | | | 117B | 131K | $0.10 / $0.50 |
| 18 | | | 1.0T | | |
| 18 | | | 128B | 256K | $1.50 / $7.50 |
| 20 | | | | | |
| 21 | | LG AI Research | 236B | | |
| 22 | | Alibaba Cloud / Qwen Team | 9B | | |
| 23 | | Alibaba Cloud / Qwen Team | 4B | | |
| 24 | | Mistral AI | 119B | 256K | $0.15 / $0.60 |
| 25 | | Alibaba Cloud / Qwen Team | 2B | | |
| 26 | | | | 1.0M | $0.33 / $2.75 |
| 27 | | Alibaba Cloud / Qwen Team | 800M | | |

About this benchmark

What is IFBench?

IFBench is an instruction-following benchmark that evaluates a model's ability to follow complex instructions.

IFBench is a text benchmark that evaluates models on instruction following and general tasks. LLM Stats tracks 27 models on this benchmark, scored on a 0–1 scale. The current average is 0.675, and the leader scores 0.817.

Current leaders

Nemotron 3 Ultra (550B A55B) from NVIDIA currently leads the IFBench leaderboard of 27 evaluated AI models with a score of 0.817. The next two places are:

2. Hermes 3 70B (Nous Research): 81.2%
3. Nova 2 Pro (Amazon): 80.2%

FAQ

Common questions about the IFBench benchmark and leaderboard.

What is the IFBench benchmark?

IFBench is an instruction-following benchmark that evaluates a model's ability to follow complex instructions.

What is the IFBench leaderboard?

The IFBench leaderboard ranks 27 AI models based on their performance on this benchmark. Currently, Nemotron 3 Ultra (550B A55B) by NVIDIA leads with a score of 0.817. The average score across all models is 0.675.

What is the highest IFBench score?

The highest IFBench score is 0.817, achieved by Nemotron 3 Ultra (550B A55B) from NVIDIA.

How many models are evaluated on IFBench?

27 models have been evaluated on the IFBench benchmark, with 0 verified results and 27 self-reported results.

What categories does IFBench cover?

IFBench falls under the instruction following and general categories, and it evaluates text models.

What is the best open-source model on IFBench?

Nemotron 3 Ultra (550B A55B) by NVIDIA is the top-ranked open-source model on IFBench, with a score of 0.817 (rank #1).

Which model offers the best value on IFBench?

Among models scoring within 10% of the leader, Qwen3.5-27B from Alibaba Cloud / Qwen Team is the cheapest, at $0.30 per million input tokens with a score of 0.765.
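
The selection rule in this answer can be sketched in a few lines of Python. This is only an illustration, not the leaderboard's actual implementation: it assumes "within 10% of the leader" means a relative margin (score at least 90% of the top score), it skips models with no listed price, and the `Entry` type, the `best_value` function, and the rows marked hypothetical are made up for the example. Only the Qwen3.5-27B figures and the leading score of 0.817 come from the page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    model: str
    score: float                  # IFBench score on the 0-1 scale
    input_price: Optional[float]  # USD per million input tokens; None if unlisted


def best_value(entries: list[Entry], tolerance: float = 0.10) -> Optional[Entry]:
    """Return the cheapest priced entry whose score is within `tolerance` of the top score."""
    if not entries:
        return None
    top = max(e.score for e in entries)
    # Assumption: the margin is relative to the leader, not an absolute 0.10.
    cutoff = top * (1.0 - tolerance)
    candidates = [
        e for e in entries
        if e.input_price is not None and e.score >= cutoff
    ]
    if not candidates:
        return None
    # Cheapest first; a price tie goes to the higher score.
    return min(candidates, key=lambda e: (e.input_price, -e.score))


entries = [
    Entry("leader (price unlisted)", 0.817, None),
    Entry("Qwen3.5-27B", 0.765, 0.30),
    Entry("hypothetical-pricey", 0.800, 1.25),           # hypothetical row
    Entry("hypothetical-cheap-but-weak", 0.600, 0.10),   # hypothetical row
]

print(best_value(entries).model)  # prints "Qwen3.5-27B"
```

With a leading score of 0.817 the cutoff is about 0.735, so the 0.600 entry is excluded despite its lower price, and the unpriced leader cannot be compared on cost at all.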

How recent are the IFBench leaderboard results?

The IFBench leaderboard was last updated in July 2026 and currently includes 27 evaluated models.