Microsoft · Released on Feb 1, 2025

Phi-4-multimodal-instruct: Benchmarks, Pricing & Context Window

Phi-4-multimodal-instruct is a language model from Microsoft, released in February 2025, with multimodal input.

Phi-4-multimodal-instruct is a lightweight (5.57B parameters) open multimodal foundation model that leverages research and datasets from Phi-3.5 and 4.0. It processes text, image, and audio inputs to generate text outputs, supporting a

Input: Text, Image
Output: Text

Phi-4-multimodal-instruct pricing

Providers

Phi-4-multimodal-instruct starts at $0.0500 per million input tokens and $0.100 per million output tokens via DeepInfra.

Provider   Input $/M  Output $/M  Max Input  Max Output  Latency (s)  Throughput  Quant  Input  Output
DeepInfra  $0.0500    $0.100      128.0K     128.0K      0.50         25 c/s      -      -      -
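As a rough illustration of how the per-million-token prices above translate into a per-request cost, here is a minimal sketch. The helper name estimate_cost_usd is hypothetical and not part of any provider SDK, and reading "128.0K" as 128,000 tokens is an assumption. The prices are the DeepInfra figures listed above and may change.

```python
from decimal import Decimal

# Prices copied from the DeepInfra row above (USD per million tokens).
INPUT_USD_PER_MILLION = Decimal("0.05")
OUTPUT_USD_PER_MILLION = Decimal("0.10")

# Assumption: "128.0K" in the table means 128,000 tokens.
MAX_INPUT_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimate the cost of one request in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_INPUT_TOKENS:
        raise ValueError("input exceeds the listed max input size")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the listed max output size")

    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / million
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / million
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 2,000-token reply.
    print(estimate_cost_usd(100_000, 2_000))
```

Decimal is used instead of float so that very small per-token amounts add up without binary rounding noise. At these rates, one million input tokens plus one million output tokens would cost $0.15 in total.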

Phi-4-multimodal-instruct API

API access coming soon

Phi-4-multimodal-instruct will be available through our gateway shortly.

Phi-4-multimodal-instruct examples

Recent arena outputs from Phi-4-multimodal-instruct, picked from the highest-ranked matchups.

Phi-4-multimodal-instruct license

Phi-4-multimodal-instruct is released under the MIT license, which permits commercial use. It has 5.6B parameters and a knowledge cutoff of June 2024.

License: MIT (commercial use allowed)
Parameters: 5.6B
Knowledge cutoff: June 2024


FAQ

Common questions about Phi-4-multimodal-instruct.

What is the Phi-4-multimodal-instruct release date?

Phi-4-multimodal-instruct was released on February 1, 2025 by Microsoft.

Who created Phi-4-multimodal-instruct?

Phi-4-multimodal-instruct was created by Microsoft.

How many parameters does Phi-4-multimodal-instruct have?

Phi-4-multimodal-instruct has 5.6 billion parameters.

What is the license for Phi-4-multimodal-instruct?

Phi-4-multimodal-instruct is released under the MIT license, a permissive open-source license that allows commercial use.

What is the knowledge cutoff date for Phi-4-multimodal-instruct?

Phi-4-multimodal-instruct has a knowledge cutoff of June 2024. This means the model was trained on data up to this date and may not have information about events after this time.

Is Phi-4-multimodal-instruct multimodal?

Yes, Phi-4-multimodal-instruct is a multimodal model. This page lists text and image as supported inputs, and the model description also mentions audio input. Output is text only.