Microsoft logo

Phi-3.5-vision-instruct

Microsoft
phi-3.5-vision-instructVariant

Overview

Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image comparison, summarization, and even video analysis. The model underwent safety post-training for improved instruction-following, alignment, and robust handling of visual and text inputs, and is released under the MIT license.

Phi-3.5-vision-instruct was released on August 23, 2024.

Performance

Timeline

Release DateUnknown
Knowledge CutoffUnknown

Other Details

Parameters
4.2B
License
MIT
Training Data
Unknown
Tags
tuning:instruct

Related Models

Compare Phi-3.5-vision-instruct to other models by quality (GPQA score) vs cost. Higher scores and lower costs represent better value.

Performance visualization loading...

Gathering benchmark data from similar models

Benchmarks

Phi-3.5-vision-instruct Performance Across Datasets

Scores sourced from the model's scorecard, paper, or official blog posts

LLM Stats Logollm-stats.com - Sat Dec 06 2025
Notice missing or incorrect data?Start an Issue discussion

Pricing

Pricing, performance, and capabilities for Phi-3.5-vision-instruct across different providers:

No pricing information available for this model.

Example Outputs

Recent Posts

Recent Reviews

API Access

API Access Coming Soon

API access for Phi-3.5-vision-instruct will be available soon through our gateway.

FAQ

Common questions about Phi-3.5-vision-instruct

Phi-3.5-vision-instruct was released on August 23, 2024.
Phi-3.5-vision-instruct has 4.2 billion parameters.