MicrosoftReleased on Aug 23, 2024

Phi-3.5-vision-instruct: Benchmarks, Pricing & Context Window

Phi-3.5-vision-instruct is a language model from Microsoft, released in August 2024, with multimodal input.

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image

Phi-3.5-vision-instruct API

API access coming soon

Phi-3.5-vision-instruct will be available through our gateway shortly.

Phi-3.5-vision-instruct examples

Recent arena outputs from Phi-3.5-vision-instruct, picked from the highest-ranked matchups.

Phi-3.5-vision-instruct license

Phi-3.5-vision-instruct is released under the MIT license, which permits commercial use, has 4.2B parameters.

License
MIT
Commercial use allowed
Parameters
4.2B

MIT License - allows commercial use

FAQ

Common questions about Phi-3.5-vision-instruct.

What is the Phi-3.5-vision-instruct release date?

Phi-3.5-vision-instruct was released on August 23, 2024 by Microsoft.

Who created Phi-3.5-vision-instruct?

Phi-3.5-vision-instruct was created by Microsoft.

How many parameters does Phi-3.5-vision-instruct have?

Phi-3.5-vision-instruct has 4.2 billion parameters.

What is the license for Phi-3.5-vision-instruct?

Phi-3.5-vision-instruct is released under the MIT license. This is an open-source/open-weight license.

Is Phi-3.5-vision-instruct multimodal?

Yes, Phi-3.5-vision-instruct is a multimodal model that can process both text and images as input.