IFBench

An instruction-following benchmark that evaluates a model's ability to follow complex instructions

Progress Over Time

[Figure: interactive timeline of model performance on IFBench, marking the state-of-the-art frontier and distinguishing open from proprietary models.]

IFBench Leaderboard

16 models
Rank  Organization               Parameters  Context  Cost            License
1     Nous Research              70B         -        -               -
2     Alibaba Cloud / Qwen Team  397B        262K     $0.60 / $3.60   -
2     Alibaba Cloud / Qwen Team  27B         -        -               -
4     Alibaba Cloud / Qwen Team  122B        262K     $0.40 / $3.20   -
5     Alibaba Cloud / Qwen Team  -           -        -               -
6     -                          120B        262K     $0.10 / $0.50   -
7     Inception                  -           128K     $0.25 / $0.75   -
8     Alibaba Cloud / Qwen Team  35B         262K     $0.25 / $2.00   -
9     -                          230B        1.0M     $0.30 / $1.20   -
10    -                          117B        131K     $0.10 / $0.50   -
11    LG AI Research             236B        33K      $0.60 / $1.00   -
12    Alibaba Cloud / Qwen Team  9B          -        -               -
13    Alibaba Cloud / Qwen Team  4B          -        -               -
14    Mistral AI                 119B        256K     $0.15 / $0.60   -
15    Alibaba Cloud / Qwen Team  2B          -        -               -
16    Alibaba Cloud / Qwen Team  800M        -        -               -

FAQ

Common questions about IFBench

IFBench is an instruction-following benchmark that evaluates a model's ability to follow complex instructions.
The IFBench leaderboard ranks 16 AI models based on their performance on this benchmark. Currently, Hermes 3 70B by Nous Research leads with a score of 0.812. The average score across all models is 0.649.
The highest IFBench score is 0.812, achieved by Hermes 3 70B from Nous Research.
Sixteen models have been evaluated on IFBench. All 16 results are self-reported; none have been verified.
IFBench is categorized under general and instruction following. The benchmark evaluates text models.
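
The leaderboard above lists two entries at rank 2 and then skips to rank 4, which is consistent with standard competition ranking, where tied scores share a rank. The following is a minimal sketch of how such ranks and an average score could be computed. It is not the site's actual method: the function name `competition_rank`, the model keys, and every score except the 0.812 top score are hypothetical placeholders.

```python
from statistics import mean


def competition_rank(scores):
    """Rank names by score, highest first; tied scores share a rank
    and the following rank is skipped (1, 2, 2, 4, ...)."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_score = None
    prev_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        # Reuse the previous rank on a tie, otherwise use the list position.
        rank = prev_rank if score == prev_score else position
        ranks[name] = rank
        prev_score, prev_rank = score, rank
    return ranks


# Hypothetical scores for illustration; only 0.812 appears on the page.
scores = {
    "model_a": 0.812,
    "model_b": 0.765,
    "model_c": 0.765,
    "model_d": 0.701,
}

ranks = competition_rank(scores)
average = round(mean(scores.values()), 3)

print(ranks)
print(average)
```

With these placeholder values, `model_b` and `model_c` tie for rank 2 and `model_d` receives rank 4, mirroring the 1, 2, 2, 4 pattern in the table.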