MM IF-Eval
A challenging multimodal instruction-following benchmark that includes both compose-level constraints on output responses and perception-level constraints tied to input images, together with a comprehensive evaluation pipeline.
Progress Over Time
Timeline of model performance evolution on MM IF-Eval, tracing the state-of-the-art frontier across open and proprietary models.
MM IF-Eval Leaderboard
1 model
| # | Model | Organization | Params | Context | Cost (input / output, per 1M tokens) | Score |
|---|---|---|---|---|---|---|
| 1 | Pixtral-12B | Mistral AI | 12B | 128K | $0.15 / $0.15 | 0.527 |
FAQ
Common questions about MM IF-Eval
What is MM IF-Eval?
MM IF-Eval is a challenging multimodal instruction-following benchmark that includes both compose-level constraints on output responses and perception-level constraints tied to input images, together with a comprehensive evaluation pipeline.
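To make the two constraint types concrete, below is a minimal, hypothetical sketch of how such constraints might be checked. This is not MM IF-Eval's actual evaluation pipeline; every function name, constraint, and the caption-based perception check is an illustrative assumption (the real benchmark grounds perception-level constraints in the image itself and uses a more comprehensive judging setup).

```python
# Illustrative sketch only; all names and constraints below are hypothetical
# assumptions, not MM IF-Eval's actual pipeline or API.

def check_compose_level(response: str) -> dict[str, bool]:
    """Rule-verifiable constraints on the output text alone (compose-level)."""
    bullets = [ln for ln in response.splitlines() if ln.lstrip().startswith("- ")]
    return {
        "exactly_three_bullets": len(bullets) == 3,       # format constraint
        "under_100_words": len(response.split()) <= 100,  # length constraint
        "mentions_color": "color" in response.lower(),    # keyword constraint
    }

def check_perception_level(response: str, image_caption: str) -> dict[str, bool]:
    """Stand-in for image-grounded constraints (perception-level). A real
    pipeline would inspect the image or query a judge model, not a caption."""
    main_object = image_caption.split()[0].lower()
    return {"names_main_object": main_object in response.lower()}

if __name__ == "__main__":
    resp = "- The car is red\n- It has four doors\n- The color contrast is strong"
    results = {**check_compose_level(resp),
               **check_perception_level(resp, "car parked on a street")}
    score = sum(results.values()) / len(results)  # fraction of constraints met
    print(results)
    print(f"score = {score:.3f}")
```

The split reflects why benchmarks of this kind need a combined pipeline: compose-level checks are cheap and deterministic, while perception-level checks inherently depend on image understanding and typically require model-based judging.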
Where can I find the MM IF-Eval paper?
The MM IF-Eval paper is available at https://arxiv.org/abs/2504.07957. It provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
How does the MM IF-Eval leaderboard work?
The MM IF-Eval leaderboard ranks AI models by their performance on this benchmark. It currently lists a single model: Pixtral-12B by Mistral AI, which leads with a score of 0.527 (also the average score, since only one model has been evaluated).
What is the highest MM IF-Eval score?
The highest MM IF-Eval score is 0.527, achieved by Pixtral-12B from Mistral AI.
How many models have been evaluated on MM IF-Eval?
1 model has been evaluated on the MM IF-Eval benchmark, with 0 verified results and 1 self-reported result.
What categories does MM IF-Eval fall under?
MM IF-Eval is categorized under multimodal, reasoning, and structured output; the benchmark evaluates multimodal models.