Model Comparison

Phi-4-multimodal-instruct vs QvQ-72B-Preview

QvQ-72B-Preview outperforms Phi-4-multimodal-instruct on both benchmarks reported for the two models (MathVista and MMMU).

Performance Benchmarks

Comparative analysis across standard metrics

2 benchmarks

Of the 2 benchmarks reported for both models, Phi-4-multimodal-instruct leads on none and QvQ-72B-Preview leads on both: MathVista (71.4% vs 62.4%) and MMMU (70.3% vs 55.1%).
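
The win count can be reproduced from the four scores reported on this page. The following is a minimal Python sketch; the `SCORES` layout and the `tally_wins` helper are illustrative names chosen here, not part of any llm-stats.com API.

```python
# Scores (percent) as reported on this page; the dictionary layout is an assumption.
SCORES = {
    "MathVista": {"Phi-4-multimodal-instruct": 62.4, "QvQ-72B-Preview": 71.4},
    "MMMU": {"Phi-4-multimodal-instruct": 55.1, "QvQ-72B-Preview": 70.3},
}


def tally_wins(scores):
    """Count benchmark wins per model and record each winning margin in points."""
    wins = {}
    margins = {}
    for benchmark, by_model in scores.items():
        for model in by_model:
            wins.setdefault(model, 0)
        # Rank the models on this benchmark from highest to lowest score.
        ranked = sorted(by_model.items(), key=lambda item: item[1], reverse=True)
        (best_model, best), (_, runner_up) = ranked[0], ranked[1]
        if best > runner_up:  # a tie counts for neither model
            wins[best_model] += 1
            margins[benchmark] = (best_model, round(best - runner_up, 1))
    return wins, margins


wins, margins = tally_wins(SCORES)
print(wins)
print(margins)
```

The margins are in percentage points, not relative percent, so they can be read directly against the scores above.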

Tue Apr 14 2026 • llm-stats.com

Pricing Analysis

Price comparison per million tokens

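Cost data is unavailable for QvQ-72B-Preview; its $0.00 prices and "Unknown Organization" provider appear to be placeholders rather than actual pricing.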

Lowest available price from all providers
Phi-4-multimodal-instruct (Microsoft)
Input tokens: $0.05
Output tokens: $0.10
Best provider: Deepinfra

QvQ-72B-Preview (Alibaba Cloud / Qwen Team)
Input tokens: $0.00
Output tokens: $0.00
Best provider: Unknown Organization
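
Because prices are quoted per million tokens, the cost of a workload is the token count times the price, divided by one million. Below is a rough Python sketch using the Phi-4-multimodal-instruct prices listed above; the `estimate_cost` helper and the workload size are hypothetical and chosen only for illustration.

```python
# Per-million-token prices (USD) as listed on this page for Phi-4-multimodal-instruct.
PHI4_INPUT_PRICE = 0.05
PHI4_OUTPUT_PRICE = 0.10


def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of a workload given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
cost = estimate_cost(2_000_000, 500_000, PHI4_INPUT_PRICE, PHI4_OUTPUT_PRICE)
print(f"${cost:.2f}")
```

The same helper should not be run on the QvQ-72B-Preview figures, since no real pricing is listed for that model.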

Model Size

Parameter count comparison

67.8B diff

QvQ-72B-Preview has 67.8B more parameters than Phi-4-multimodal-instruct, making it 1210.7% larger.
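
The difference and percentage follow directly from the two parameter counts. A small Python sketch of that arithmetic is shown here; the `size_gap` helper is an illustrative name, and the inputs are the counts in billions as listed on this page.

```python
def size_gap(smaller_b, larger_b):
    """Compare two parameter counts given in billions."""
    diff = larger_b - smaller_b
    return {
        "diff_b": round(diff, 1),                            # absolute gap, in billions
        "percent_larger": round(diff / smaller_b * 100, 1),  # gap relative to the smaller model
        "ratio": round(larger_b / smaller_b, 1),             # how many times the size
    }


gap = size_gap(5.6, 73.4)
print(gap)
```

Note that "1210.7% larger" describes the gap relative to the smaller model; expressed as a ratio, QvQ-72B-Preview is about 13.1 times the size of Phi-4-multimodal-instruct.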

Phi-4-multimodal-instruct (Microsoft): 5.6B parameters
QvQ-72B-Preview (Alibaba Cloud / Qwen Team): 73.4B parameters

Context Window

Maximum input and output token capacity

Only Phi-4-multimodal-instruct specifies a context window: 128,000 tokens for input and 128,000 tokens for output. QvQ-72B-Preview's limits are not listed.

Phi-4-multimodal-instruct (Microsoft)
Input: 128,000 tokens
Output: 128,000 tokens

QvQ-72B-Preview (Alibaba Cloud / Qwen Team)
Input: not specified
Output: not specified

Input Capabilities

Supported data types and modalities

Both Phi-4-multimodal-instruct and QvQ-72B-Preview are listed as supporting multimodal inputs: text, images, audio, and video.

Phi-4-multimodal-instruct

Text
Images
Audio
Video

QvQ-72B-Preview

Text
Images
Audio
Video

License

Usage and distribution terms

Phi-4-multimodal-instruct is licensed under MIT, while QvQ-72B-Preview uses the Qwen license.

License differences may affect how you can use these models in commercial or open-source projects.

Phi-4-multimodal-instruct

MIT

Open weights

QvQ-72B-Preview

Qwen

Open weights

Release Timeline

When each model was launched

Phi-4-multimodal-instruct was released on 2025-02-01, while QvQ-72B-Preview was released on 2024-12-25.

Phi-4-multimodal-instruct is about 1 month (38 days) newer than QvQ-72B-Preview.

Phi-4-multimodal-instruct

Feb 1, 2025 (1.2 years ago; 1 month newer)

QvQ-72B-Preview

Dec 25, 2024 (1.3 years ago)
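
The gap between the releases and the model ages can be checked with plain date arithmetic. The sketch below assumes the ages are measured from the page date shown earlier (Apr 14, 2026) and that a year is 365.25 days; both are assumptions made here, since the page does not say how it computes them.

```python
from datetime import date

# Release dates as listed on this page.
PHI4_RELEASE = date(2025, 2, 1)
QVQ_RELEASE = date(2024, 12, 25)
# Assumed reference date: the date stamp shown on this page.
PAGE_DATE = date(2026, 4, 14)


def years_since(release, today):
    """Age in years, rounded to one decimal, assuming 365.25 days per year."""
    return round((today - release).days / 365.25, 1)


gap_days = (PHI4_RELEASE - QVQ_RELEASE).days
phi4_age = years_since(PHI4_RELEASE, PAGE_DATE)
qvq_age = years_since(QVQ_RELEASE, PAGE_DATE)
print(gap_days, phi4_age, qvq_age)
```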

Knowledge Cutoff

When training data ends

Phi-4-multimodal-instruct has a documented knowledge cutoff of 2024-06-01. QvQ-72B-Preview's cutoff date is not specified, so the two cannot be compared directly.

Phi-4-multimodal-instruct

Jun 2024

QvQ-72B-Preview

Not specified

Key Takeaways

Phi-4-multimodal-instruct (Microsoft)

Larger documented context window (128,000 tokens; QvQ-72B-Preview's is not specified)

QvQ-72B-Preview (Alibaba Cloud / Qwen Team)

Higher MathVista score (71.4% vs 62.4%)
Higher MMMU score (70.3% vs 55.1%)

FAQ

Common questions about Phi-4-multimodal-instruct vs QvQ-72B-Preview

QvQ-72B-Preview outperforms Phi-4-multimodal-instruct on both benchmarks reported for the two models (MathVista and MMMU). Phi-4-multimodal-instruct is made by Microsoft and QvQ-72B-Preview is made by Alibaba Cloud / Qwen Team. The best choice depends on your use case: compare their benchmark scores, pricing, and capabilities above.
Phi-4-multimodal-instruct's highest reported scores are ScienceQA Visual: 97.5%, DocVQA: 93.2%, MMBench: 86.7%, POPE: 85.6%, and OCRBench: 84.4%. QvQ-72B-Preview's reported scores are MathVista: 71.4%, MMMU: 70.3%, MathVision: 35.9%, and OlympiadBench: 20.4%.
Phi-4-multimodal-instruct supports 128K tokens; QvQ-72B-Preview's context window is not specified. A larger context window lets you process longer documents, conversations, or codebases in a single request.
Key differences include licensing (MIT vs the Qwen license). See the full comparison above for benchmark-by-benchmark results.
Phi-4-multimodal-instruct is developed by Microsoft and QvQ-72B-Preview is developed by Alibaba Cloud / Qwen Team.