Qwen3 VL 4B Thinking
Overview
Qwen3-VL is a family of multimodal models that unifies vision, language, and reasoning across text, images, and video. The flagship of the family is a 235B-parameter model; this page covers the compact 4B-parameter Thinking variant. The family uses early joint training of visual and textual modalities for strong language grounding, supports up to a 1 million-token context window, and excels at visual understanding, spatial reasoning, long video comprehension, and tool-based interaction. The models can generate code from images, perform precise 2D/3D object grounding, and operate digital interfaces as a visual agent. The "Instruct" versions rival Gemini 2.5 Pro on perception benchmarks, while the "Thinking" versions lead in multimodal reasoning and STEM tasks. With multilingual OCR, creative writing, and fine-grained scene interpretation, Qwen3-VL establishes a new open-source frontier for integrated vision-language intelligence.
Qwen3 VL 4B Thinking was released on September 22, 2025. API access is available through DeepInfra.
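DeepInfra exposes an OpenAI-compatible endpoint, so a standard chat-completions client can send the model mixed text-and-image input. The sketch below is illustrative, not an official example: the model ID is an assumption, so check DeepInfra's catalog for the exact identifier before use.

```python
# Minimal sketch of a multimodal request to Qwen3 VL 4B Thinking via
# DeepInfra's OpenAI-compatible endpoint. The model ID is an assumption;
# verify it against DeepInfra's model catalog.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="YOUR_DEEPINFRA_API_KEY",  # placeholder credential
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-VL-4B-Thinking",  # assumed model ID
    messages=[
        {
            "role": "user",
            "content": [
                # Image and text parts in one user turn, per the
                # OpenAI-compatible vision message format.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "text",
                 "text": "Describe this chart and estimate the trend."},
            ],
        }
    ],
)

# "Thinking" variants typically interleave a reasoning trace with the
# final answer; here we simply print the assistant message as returned.
print(response.choices[0].message.content)
```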
Benchmarks
[Chart: Qwen3 VL 4B Thinking performance across datasets. Scores sourced from the model's scorecard, paper, or official blog posts.]
Pricing
Pricing, performance, and capabilities for Qwen3 VL 4B Thinking across different providers:
| Provider | Input ($/M tokens) | Output ($/M tokens) | Max Input Tokens | Max Output Tokens | Latency (s) | Throughput (tokens/s) | Quantization | Input Modalities | Output Modalities |
|---|---|---|---|---|---|---|---|---|---|
| DeepInfra | $0.10 | $1.00 | 262.1K | 262.1K | — | — | fp8 | Text, Image, Audio, Video | Text, Image, Audio, Video |
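Because the Thinking variant tends to emit long reasoning traces, output tokens (billed at ten times the input rate here) usually dominate per-request cost. A minimal sketch of the arithmetic, using the DeepInfra rates from the table above:

```python
# Back-of-the-envelope cost check using the DeepInfra prices listed above:
# $0.10 per million input tokens, $1.00 per million output tokens.
INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M \
         + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# Example: a 12K-token prompt (a few images plus text) with a 2K-token
# reasoning-heavy reply costs about $0.0032.
print(f"${request_cost(12_000, 2_000):.4f}")
```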
API Access
API access for Qwen3 VL 4B Thinking is coming soon through our gateway. In the meantime, the model can be accessed directly via DeepInfra (see pricing above).