Phi-3.5-vision-instruct
Overview
Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image comparison, summarization, and even video analysis. The model underwent safety post-training for improved instruction-following, alignment, and robust handling of visual and text inputs, and is released under the MIT license.
Phi-3.5-vision-instruct was released on August 23, 2024.
Performance
Timeline
Specifications
Benchmarks
Phi-3.5-vision-instruct Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for Phi-3.5-vision-instruct across different providers:
API Access
API Access Coming Soon
API access for Phi-3.5-vision-instruct will be available soon through our gateway.
Recent Posts
Recent Reviews
FAQ
Common questions about Phi-3.5-vision-instruct