Qwen3 32B
Overview
Qwen3-32B is a large language model from Alibaba's Qwen3 series. It features 32.8 billion parameters, a 128k token context window, support for 119 languages, and hybrid thinking modes allowing switching between deep reasoning and fast responses. It demonstrates strong performance in reasoning, instruction-following, and agent capabilities.
Qwen3 32B was released on April 29, 2025. API access is available through DeepInfra, Novita, Sambanova.
Performance
Timeline
Specifications
Benchmarks
Qwen3 32B Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for Qwen3 32B across different providers:
| Provider | Input ($/M) | Output ($/M) | Max Input | Max Output | Latency (s) | Throughput | Quantization | Input | Output |
|---|---|---|---|---|---|---|---|---|---|
DeepInfra | $0.10 | $0.30 | 128.0K | 128.0K | 1.19 | 26.95 c/s | — | Text Image Audio Video | Text Image Audio Video |
Novita | $0.10 | $0.44 | 128.0K | 128.0K | 0.93 | 32.43 c/s | — | Text Image Audio Video | Text Image Audio Video |
Sambanova | $0.40 | $0.80 | 128.0K | 128.0K | 1.08 | 327.7 c/s | — | Text Image Audio Video | Text Image Audio Video |
Price Comparison for Qwen3 32B
Price per 1M input tokens (USD), lower is better
Throughput Comparison for Qwen3 32B
Tokens per second, higher is better
Latency Comparison for Qwen3 32B
Time to first token (s), lower is better
API Access
API Access Coming Soon
API access for Qwen3 32B will be available soon through our gateway.
Recent Posts
Recent Reviews
FAQ
Common questions about Qwen3 32B
