Phi-3.5-vision-instruct
Microsoft

Overview
Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model supporting a context length of up to 128K tokens. It emphasizes multi-frame image understanding and reasoning: it improves single-image benchmark performance while also enabling multi-image comparison, summarization, and video analysis. The model underwent safety post-training for improved instruction following, alignment, and robust handling of visual and text inputs, and is released under the MIT license.
Phi-3.5-vision-instruct was released on August 23, 2024.
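The multi-image interface described above works by numbering image placeholders inside the user turn. A minimal sketch of building such a prompt is shown below, assuming the `<|image_N|>` placeholder convention and `<|user|>`/`<|end|>`/`<|assistant|>` chat tokens documented for the Phi-3 family; verify the exact template against the official model card before relying on it.

```python
# Hedged sketch: compose a multi-image prompt string in the placeholder
# format used by Phi-3.5-vision-instruct (<|image_1|>, <|image_2|>, ...).
# This only builds the text prompt; actually running the model requires
# loading it (e.g. via Hugging Face transformers) and passing the images
# to the processor alongside this string.

def build_multi_image_prompt(question: str, num_images: int) -> str:
    """Compose a single user turn that references num_images attached images."""
    # One placeholder per attached image, numbered from 1.
    placeholders = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    # Phi-3-style chat framing: user turn, end marker, then the assistant cue.
    return f"<|user|>\n{placeholders}{question}<|end|>\n<|assistant|>\n"

# Example: ask the model to compare three video frames.
prompt = build_multi_image_prompt("Summarize what changes across these frames.", 3)
print(prompt)
```

The same prompt shape covers the single-image case (`num_images=1`); multi-frame video analysis is just the many-image case with frames sampled from the clip.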
Performance
Compare Phi-3.5-vision-instruct to other models by quality (GPQA score) vs cost. Higher scores and lower costs represent better value.
Benchmarks
Phi-3.5-vision-instruct Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for Phi-3.5-vision-instruct across different providers: