Phi-3.5-vision-instruct
Phi-3.5-vision-instruct is a 4.2B-parameter open multimodal model with up to 128K context tokens. It emphasizes multi-frame image understanding and reasoning, boosting performance on single-image benchmarks while enabling multi-image
Benchmarks
Arena Performance
Phi-3.5-vision-instruct Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
FAQ
Common questions about Phi-3.5-vision-instruct
Phi-3.5-vision-instruct was released on August 23, 2024 by Microsoft.
Phi-3.5-vision-instruct was created by Microsoft.
Phi-3.5-vision-instruct has 4.2 billion parameters.
Phi-3.5-vision-instruct is released under the MIT license, a permissive open-source license, and its weights are openly available.
Yes, Phi-3.5-vision-instruct is a multimodal model that can process both text and images as input.
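To put the 4.2 billion parameter figure in practical terms, the sketch below estimates how much memory the weights alone would occupy at a few common numeric precisions. It is a back-of-the-envelope calculation, not a measured requirement. The helper name `weight_memory_gb` and the list of precisions are illustrative assumptions, and the estimate ignores activations, the KV cache for long contexts, and framework overhead.

```python
# Rough weight-memory estimate for a 4.2B-parameter model.
# Assumption: counts weight storage only, not activations, KV cache, or runtime overhead.
PARAMS = 4.2e9  # parameter count stated in the FAQ above

# Bytes per parameter for common numeric formats (general facts, not model-specific).
BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

Under these assumptions, the weights come to roughly 8.4 GB in 16-bit precision. Actual memory use at inference time will be higher, particularly with many images or long contexts.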