Phi-3.5-vision-instruct

Overview

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image comparison, summarization, and even video analysis. The model underwent safety post-training for improved instruction-following, alignment, and robust handling of visual and text inputs, and is released under the MIT license.

Phi-3.5-vision-instruct was released on August 23, 2024.

Performance

Timeline

Released: August 23, 2024
Knowledge Cutoff: Unknown

Specifications

Parameters: 4.2B
License: MIT
Training Data: Unknown
Tags: tuning:instruct

Benchmarks

Phi-3.5-vision-instruct Performance Across Datasets

Scores sourced from the model's scorecard, paper, or official blog posts

Source: llm-stats.com, Wed Jan 14 2026

Pricing

Pricing, performance, and capabilities for Phi-3.5-vision-instruct across different providers:

No pricing information available for this model.

API Access

API Access Coming Soon

API access for Phi-3.5-vision-instruct will be available soon through our gateway.

FAQ

Common questions about Phi-3.5-vision-instruct

Phi-3.5-vision-instruct has 4.2 billion parameters.