Phi-3.5-vision-instruct

Name: Phi-3.5-vision-instruct
Author: Microsoft

Microsoft · Aug 2024 · MIT

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image

Parameters: 4.2B
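As a rough guide to what a 4.2B-parameter count means in practice, the following minimal sketch estimates the memory needed to hold the weights alone at a few common storage precisions. The bytes-per-parameter values are general facts about those number formats, not figures from Microsoft. The helper name `weight_memory_gb` is hypothetical, and the estimate deliberately ignores activations, the KV cache for long contexts, and image-processing overhead, so real usage will be higher.

```python
# Parameter count as stated on this page.
PARAMS = 4.2e9

# Bytes per parameter for common storage precisions
# (properties of the formats themselves, not model-specific values).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights only, in decimal GB (1 GB = 1e9 bytes).

    Hypothetical helper for illustration; excludes activations and caches.
    """
    return num_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

Under these assumptions the weights come to roughly 16.8 GB in float32, 8.4 GB in bfloat16, and 4.2 GB in int8.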

Benchmarks

Arena Performance

Phi-3.5-vision-instruct Performance Across Datasets

Scores sourced from the model's scorecard, paper, or official blog posts

llm-stats.com - Fri Feb 27 2026


FAQ

Common questions about Phi-3.5-vision-instruct

When was Phi-3.5-vision-instruct released?
Phi-3.5-vision-instruct was released on August 23, 2024 by Microsoft.

Who created Phi-3.5-vision-instruct?
Phi-3.5-vision-instruct was created by Microsoft.

How many parameters does Phi-3.5-vision-instruct have?
Phi-3.5-vision-instruct has 4.2 billion parameters.

What is the license for Phi-3.5-vision-instruct?
Phi-3.5-vision-instruct is released under the MIT license. This is an open-source/open-weight license.

Is Phi-3.5-vision-instruct multimodal?
Yes, Phi-3.5-vision-instruct is a multimodal model that can process both text and images as input.