Phi-4-multimodal-instruct: Benchmarks, Pricing & Context Window
Phi-4-multimodal-instruct is a language model from Microsoft, released in February 2025, with multimodal input.
Phi-4-multimodal-instruct is a lightweight (5.57B parameters) open multimodal foundation model that leverages research and datasets from Phi-3.5 and 4.0. It processes text, image, and audio inputs to generate text outputs, supporting a
Phi-4-multimodal-instruct pricing
Providers
Phi-4-multimodal-instruct starts at $0.0500 per million input tokens and $0.100 per million output tokens via DeepInfra.
| Provider | Input $/M | Output $/M | Max Input | Max Output | Latency (s) | Throughput | Quant | Input | Output |
|---|---|---|---|---|---|---|---|---|---|
| DeepInfra | $0.0500 | $0.100 | 128.0K | 128.0K | 0.50 | 25 c/s | — | | |
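As a rough illustration of how the per-million-token prices translate into a per-request cost, here is a minimal sketch. The two price constants come from the DeepInfra figures quoted above; the helper name `estimate_cost_usd` and the example token counts are assumptions made for illustration, not part of any provider API.

```python
# Prices quoted above for DeepInfra, in USD per million tokens.
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.10


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the USD cost of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    # Prices are per million tokens, so scale the weighted sum down.
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 100,000 input tokens and 2,000 output tokens.
    cost = estimate_cost_usd(100_000, 2_000)
    print(f"Estimated cost: ${cost:.6f}")
```

Actual billing may differ, for example if a provider counts image or audio inputs differently from text tokens, so treat the result as an estimate only.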
Phi-4-multimodal-instruct API
API access coming soon
Phi-4-multimodal-instruct will be available through our gateway shortly.
Phi-4-multimodal-instruct license
Phi-4-multimodal-instruct is released under the MIT license, which permits commercial use. The model has 5.6B parameters and a knowledge cutoff of June 2024.
- License: MIT (commercial use allowed)
- Parameters: 5.6B
- Knowledge cutoff: June 2024