Qwen2.5 VL 72B Instruct

Overview

Qwen2.5-VL is Qwen's new flagship vision-language model and a significant improvement over Qwen2-VL. It excels at recognizing objects; analyzing text, charts, and layouts in images; acting as a visual agent; understanding long videos (over 1 hour) and pinpointing events within them; performing visual localization with bounding boxes or points; and generating structured outputs from documents.

Qwen2.5 VL 72B Instruct was released on January 26, 2025.
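
As an illustration of the visual localization capability described in the overview, here is a minimal Python sketch for reading bounding boxes out of a model reply. It assumes the reply is a JSON list whose items carry a `bbox_2d` array of `[x1, y1, x2, y2]` pixel coordinates and a `label` string. That schema, along with the names `Box` and `parse_boxes`, is an assumption made for illustration and is not specified on this page, so check it against the actual model output before relying on it.

```python
import json
from dataclasses import dataclass


@dataclass
class Box:
    """One detection: a label plus corner coordinates in pixels."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        # Clamp to zero so a malformed (inverted) box never yields a negative area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def parse_boxes(reply: str) -> list[Box]:
    """Parse a reply that is ASSUMED to contain a JSON list of detections."""
    fence = "`" * 3  # Markdown code-fence marker
    text = reply.strip()
    # Replies are sometimes wrapped in a Markdown fence; drop it if present.
    if text.startswith(fence):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(fence, 1)[0]
    boxes = []
    for item in json.loads(text):
        x1, y1, x2, y2 = item["bbox_2d"]  # assumed key name, not confirmed by this page
        boxes.append(Box(item.get("label", ""), x1, y1, x2, y2))
    return boxes


if __name__ == "__main__":
    # Hypothetical reply, written by hand for this example.
    sample = '[{"bbox_2d": [10, 20, 110, 70], "label": "cat"}]'
    for box in parse_boxes(sample):
        print(box.label, box.area)
```

Keeping the parsing separate from the model call means the same helper can be reused whichever provider or runtime eventually serves the model.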

Timeline

Released: January 26, 2025
Knowledge Cutoff: Unknown

Specifications

Parameters: 72.0B
License: tongyi-qianwen
Training Data: Unknown
Tags: tuning:instruct

Benchmarks

Qwen2.5 VL 72B Instruct Performance Across Datasets

Scores sourced from the model's scorecard, paper, or official blog posts

llm-stats.com - Thu Jan 22 2026
Pricing

Pricing, performance, and capabilities for Qwen2.5 VL 72B Instruct across different providers:

No pricing information available for this model.

API Access

API Access Coming Soon

API access for Qwen2.5 VL 72B Instruct will be available soon through our gateway.

FAQ

Common questions about Qwen2.5 VL 72B Instruct

Qwen2.5 VL 72B Instruct was released on January 26, 2025 by Qwen.
Qwen2.5 VL 72B Instruct was created by Qwen.
Qwen2.5 VL 72B Instruct has 72.0 billion parameters.
Qwen2.5 VL 72B Instruct is released under the tongyi-qianwen license.
Qwen2.5 VL 72B Instruct is a multimodal model that can process both text and images as input.