
Phi-3.5-vision-instruct

Microsoft · Aug 2024 · MIT

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image

Parameters: 4.2B

Benchmarks

Arena Performance

Phi-3.5-vision-instruct Performance Across Datasets

Scores sourced from the model's scorecard, paper, or official blog posts

llm-stats.com - Fri Feb 27 2026

FAQ

Common questions about Phi-3.5-vision-instruct

Phi-3.5-vision-instruct was released on August 23, 2024 by Microsoft.
Phi-3.5-vision-instruct was created by Microsoft.
Phi-3.5-vision-instruct has 4.2 billion parameters.
Phi-3.5-vision-instruct is released under the MIT license. This is an open-source/open-weight license.
Phi-3.5-vision-instruct is a multimodal model that accepts both text and images as input.
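
To put the 4.2 billion parameter figure in practical terms, the sketch below estimates how much memory the weights alone might occupy at a few common numeric precisions. It is back-of-the-envelope arithmetic, not a measurement: the precision list and the helper name `weight_memory_gb` are illustrative assumptions, and real usage is higher once activations, the KV cache for long contexts, and framework overhead are included.

```python
# Rough weight-memory estimate from a parameter count.
# Assumption: memory = parameters x bytes per parameter. Activations,
# KV cache, and framework overhead are NOT included.

PARAMS = 4.2e9  # parameter count stated on this page

# Bytes per parameter for common storage formats (illustrative choices,
# not a statement about how the model is officially distributed).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"{fmt:>10}: ~{weight_memory_gb(PARAMS, size):.1f} GB")
```

Under these assumptions, 16-bit weights come to roughly 8.4 GB and 4-bit quantized weights to roughly 2.1 GB, which is why models in this size class are often considered for single-GPU or on-device use.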