Phi-4-multimodal-instruct vs QvQ-72B-Preview Comparison

Comparing Phi-4-multimodal-instruct and QvQ-72B-Preview across benchmarks, pricing, and capabilities.

Performance Benchmarks

Comparative analysis across standard metrics

2 benchmarks

Phi-4-multimodal-instruct leads on neither of the 2 benchmarks; QvQ-72B-Preview leads on both (MathVista, MMMU).

QvQ-72B-Preview's margin is 9.0 points on MathVista (71.4% vs 62.4%) and 15.2 points on MMMU (70.3% vs 55.1%).
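The win count and margins can be reproduced from the two reported scores. The following is a minimal sketch, not the site's own method; the function names `tally_wins` and `margins` are hypothetical, and the scores are the ones quoted under Key Takeaways.

```python
# Benchmark scores (percent) as reported on this page.
scores = {
    "MathVista": {"Phi-4-multimodal-instruct": 62.4, "QvQ-72B-Preview": 71.4},
    "MMMU": {"Phi-4-multimodal-instruct": 55.1, "QvQ-72B-Preview": 70.3},
}


def tally_wins(scores):
    """Count, per model, the benchmarks on which it has the highest score."""
    wins = {}
    for by_model in scores.values():
        for model in by_model:
            wins.setdefault(model, 0)
        # Assumption: no ties; on a tie, max() keeps the first model listed.
        best = max(by_model, key=by_model.get)
        wins[best] += 1
    return wins


def margins(scores, leader, other):
    """Percentage-point gap (leader minus other) on each benchmark."""
    return {
        bench: round(by_model[leader] - by_model[other], 1)
        for bench, by_model in scores.items()
    }


print(tally_wins(scores))
print(margins(scores, "QvQ-72B-Preview", "Phi-4-multimodal-instruct"))
```

With only two benchmarks, the tally says little about general capability; it summarizes the data shown here and nothing more.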

Sat Mar 14 2026 • llm-stats.com

Pricing Analysis

Price comparison per million tokens

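Cost data is unavailable for QvQ-72B-Preview; the $0.00 prices and "Unknown Organization" provider listed for it appear to be placeholders rather than real quotes.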

Lowest available price from all providers
Phi-4-multimodal-instruct (Microsoft)
Input tokens: $0.05
Output tokens: $0.10
Best provider: Deepinfra

QvQ-72B-Preview (Alibaba Cloud / Qwen Team)
Input tokens: $0.00
Output tokens: $0.00
Best provider: Unknown Organization
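
Because prices are quoted per million tokens, the cost of a single request scales linearly with its token counts. The sketch below uses the Phi-4-multimodal-instruct prices listed above; the function name `estimate_cost` and the example token counts are hypothetical, and real provider billing may add rounding or minimum charges not modeled here.

```python
from decimal import Decimal

# USD per one million tokens, as listed for Phi-4-multimodal-instruct.
PRICE_PER_MILLION = {"input": Decimal("0.05"), "output": Decimal("0.10")}


def estimate_cost(input_tokens, output_tokens, prices=PRICE_PER_MILLION):
    """Estimate the USD cost of one request from its token counts."""
    million = Decimal(1_000_000)
    total = (
        Decimal(input_tokens) * prices["input"]
        + Decimal(output_tokens) * prices["output"]
    )
    return total / million


# Hypothetical request: 2,000 input tokens and 500 output tokens.
print(estimate_cost(2_000, 500))
```

`Decimal` is used instead of floats so that very small per-request amounts are not distorted by binary rounding.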

Model Size

Parameter count comparison

67.8B diff

QvQ-72B-Preview has 67.8B more parameters than Phi-4-multimodal-instruct, making it 1210.7% larger.
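
The difference and percentage follow directly from the two parameter counts. As a small sketch (the function name `size_gap` is hypothetical):

```python
def size_gap(small_b, large_b):
    """Return (absolute difference, percent larger), both rounded to 1 decimal.

    Inputs are parameter counts in billions.
    """
    diff = large_b - small_b
    pct_larger = diff / small_b * 100
    return round(diff, 1), round(pct_larger, 1)


# 5.6B (Phi-4-multimodal-instruct) vs 73.4B (QvQ-72B-Preview)
diff, pct = size_gap(5.6, 73.4)
print(f"{diff}B more parameters, {pct}% larger")
```

"1210.7% larger" is the same statement as "about 13.1 times the size", since 73.4 / 5.6 is roughly 13.1.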

Phi-4-multimodal-instruct (Microsoft): 5.6B parameters
QvQ-72B-Preview (Alibaba Cloud / Qwen Team): 73.4B parameters

Context Window

Maximum input and output token capacity

Only Phi-4-multimodal-instruct specifies a context window: 128,000 tokens for both input and output. QvQ-72B-Preview's limits are not listed.

Phi-4-multimodal-instruct (Microsoft)
Input: 128,000 tokens
Output: 128,000 tokens

QvQ-72B-Preview (Alibaba Cloud / Qwen Team)
Input: not specified
Output: not specified

Input Capabilities

Supported data types and modalities

Both Phi-4-multimodal-instruct and QvQ-72B-Preview accept multimodal inputs. Each model is listed with the same four input modalities.

Phi-4-multimodal-instruct

Text
Images
Audio
Video

QvQ-72B-Preview

Text
Images
Audio
Video

License

Usage and distribution terms

Phi-4-multimodal-instruct is licensed under MIT, while QvQ-72B-Preview uses the Qwen license.

License differences may affect how you can use these models in commercial or open-source projects.

Phi-4-multimodal-instruct: MIT, open weights
QvQ-72B-Preview: Qwen license, open weights

Release Timeline

When each model was launched

Phi-4-multimodal-instruct was released on 2025-02-01, while QvQ-72B-Preview was released on 2024-12-25.

Phi-4-multimodal-instruct is about 1 month (38 days) newer than QvQ-72B-Preview.
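
The gap between the two release dates, and each model's age as of the page date (Sat Mar 14 2026), can be checked with plain date arithmetic. This is a minimal sketch; the helper names are hypothetical, and dividing by 365.25 days per year is an assumption about rounding, not necessarily how the site computes its "years ago" labels.

```python
from datetime import date

PHI4_RELEASE = date(2025, 2, 1)    # Phi-4-multimodal-instruct
QVQ_RELEASE = date(2024, 12, 25)   # QvQ-72B-Preview
PAGE_DATE = date(2026, 3, 14)      # date stamp shown on this page


def days_between(earlier, later):
    """Whole days from `earlier` to `later`."""
    return (later - earlier).days


def age_in_years(released, as_of):
    """Age in years, rounded to one decimal (assumes 365.25 days per year)."""
    return round(days_between(released, as_of) / 365.25, 1)


print(days_between(QVQ_RELEASE, PHI4_RELEASE))
print(age_in_years(PHI4_RELEASE, PAGE_DATE), age_in_years(QVQ_RELEASE, PAGE_DATE))
```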

Phi-4-multimodal-instruct: Feb 1, 2025 (1.1 years ago, 1 month newer)
QvQ-72B-Preview: Dec 25, 2024 (1.2 years ago)

Knowledge Cutoff

When training data ends

Phi-4-multimodal-instruct has a documented knowledge cutoff of 2024-06-01. QvQ-72B-Preview's cutoff date is not specified, so the two cannot be compared directly.

Phi-4-multimodal-instruct: Jun 2024
QvQ-72B-Preview: not specified

Key Takeaways

Phi-4-multimodal-instruct (Microsoft)

Documented context window of 128,000 tokens (QvQ-72B-Preview's is not specified)

QvQ-72B-Preview (Alibaba Cloud / Qwen Team)

Higher MathVista score (71.4% vs 62.4%)
Higher MMMU score (70.3% vs 55.1%)
