When was Qwen3 32B released?

Qwen3 32B was released on April 29, 2025 by Qwen. This is the official Qwen3 32B release date tracked on LLM Stats.

How much does Qwen3 32B cost?

Qwen3 32B costs $0.10 per million input tokens and $0.30 per million output tokens through the LLM Stats API, which works with any OpenAI-compatible SDK. Across tracked providers, the lowest price is $0.10 per million input tokens via DeepInfra.
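
As a rough illustration of what these per-token rates mean in practice, the sketch below estimates the cost of one workload at each provider's listed price. The prices are copied from the provider table on this page; the helper name `estimate_cost` and the example token counts are hypothetical.

```python
# Prices in USD per million tokens, copied from the provider table on this page.
PRICES = {
    "DeepInfra": {"input": 0.10, "output": 0.30},
    "Novita": {"input": 0.10, "output": 0.44},
    "Sambanova": {"input": 0.40, "output": 0.80},
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one workload (hypothetical helper)."""
    price = PRICES[provider]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Example workload (made-up numbers): 2M prompt tokens, 500K completion tokens.
costs = {name: estimate_cost(name, 2_000_000, 500_000) for name in PRICES}
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
print(f"Cheapest for this workload: {cheapest}")
```

Because output tokens cost more than input tokens at every listed provider, the ranking can shift for workloads that generate much more text than they read.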

Is Qwen3 32B available via API?

Yes. Qwen3 32B is available through the LLM Stats API and works with any OpenAI-compatible SDK: point your client at the gateway base URL (https://gateway.llm-stats.com/v1) and pass the model name (qwen3-32b). It is served by 3 providers tracked on LLM Stats.

How big is Qwen3 32B?

Qwen3 32B has 32.8 billion parameters. It ships as an open-weight model, so you can download and run it on your own hardware.
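
For self-hosting, a back-of-the-envelope estimate of weight storage is parameter count times bytes per parameter. The sketch below applies that arithmetic to the 32.8B figure above. It covers weights only: KV cache, activations, runtime overhead, and the extra scale metadata that quantized formats carry are all excluded, so real memory use will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
PARAMS = 32.8e9  # parameter count stated on this page

# Bytes per parameter for common numeric formats (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB, taking 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

At 2 bytes per parameter (bf16) this comes to roughly 65.6 GB for the weights alone.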

What is the license for Qwen3 32B?

Qwen3 32B is released under the Apache 2.0 license, a permissive open-source license that allows commercial use and self-hosting of the open weights.

What is Qwen3 32B latency?

Qwen3 32B p95 time to first token is 0.93 seconds via Novita over the trailing 7 days. Lower time to first token means the model begins responding sooner for chat, agents and API workloads.
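
To make the metric concrete, here is a minimal sketch of how time to first token could be measured on the client side: start a timer, consume a streamed response, and record when the first chunk arrives. The `fake_stream` generator is a hypothetical stand-in so the example runs without a network call, and how LLM Stats itself measures TTFT is not stated on this page.

```python
import time
from typing import Iterable, Tuple


def measure_ttft(chunks: Iterable[str]) -> Tuple[float, float, str]:
    """Consume a stream and return (ttft_seconds, total_seconds, full_text)."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            # First chunk arrived: this delay is the time to first token.
            ttft = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    # An empty stream never produced a first token; report the total instead.
    return (ttft if ttft is not None else total), total, "".join(parts)


def fake_stream():
    """Stand-in for a real streaming response (hypothetical, for illustration)."""
    time.sleep(0.05)  # simulated wait before the first token
    yield "Hello"
    time.sleep(0.01)
    yield ", world"


ttft, total, text = measure_ttft(fake_stream())
print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s, text: {text!r}")
```

With the OpenAI SDK, passing `stream=True` to `client.chat.completions.create` returns an iterator of chunks whose text is in `chunk.choices[0].delta.content` (which can be `None`), so a generator yielding those pieces could be passed to `measure_ttft` in place of `fake_stream()`.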

Where can I use Qwen3 32B?

Qwen3 32B is available through 3 providers: DeepInfra, Novita, and Sambanova.

Where is the Qwen3 32B paper or technical report?

Qwen3 32B has a paper or technical report available at https://qwenlm.github.io/blog/qwen3/. Use that source for architecture, training, release and evaluation details.

What models should I compare Qwen3 32B against?

Common Qwen3 32B comparisons include Llama 3.1 Nemotron Ultra 253B v1, Phi 4 Reasoning Plus, and GPT-5.2. Compare them side by side for benchmark scores, pricing, context window, latency, and API availability.

Qwen3 32B API Pricing, Context Window & Benchmarks

Name: Qwen3 32B
Author: Qwen

Qwen3 32B benchmarks

Qwen3 32B Performance Across Datasets

Scores sourced from the model's scorecard, paper, or official blog posts

llm-stats.com - Mon Aug 03 2026

Qwen3 32B pricing

Providers

Qwen3 32B starts at $0.100 per million input tokens and $0.300 per million output tokens via DeepInfra. See all 3 providers below with their per-token pricing, latency, throughput, and modality support.

Provider	Input ($/M tokens)	Output ($/M tokens)	Context in / out (tokens)	TTFT p50 / p95 (s)	Output avg / p5 (chars/s)	Success 7d	Modalities in / out
DeepInfra	$0.100	$0.300	128.0K / 128.0K	0.00 / 0.00	113 / 94	100.00% (3)	/
Novita	$0.100	$0.440	128.0K / 128.0K	0.00 / 0.00	— / —	0.00% (2)	/
Sambanova	$0.400	$0.800	128.0K / 128.0K	— / 1.08	328 / —	—	/

Cached input is the discounted price for prompt tokens served from a provider cache. TTFT is time to first token. Output is characters per second; p5 is the sustained floor exceeded by 95% of observed requests. Success is calculated from completed versus failed requests over the trailing seven days.
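
The definitions above can be sketched in a few lines. The example below computes p50 and p95 time to first token, the p5 throughput floor, and the success rate from a small request log. The log values are made up for illustration, and the nearest-rank percentile method is an assumption: this page does not say how LLM Stats computes its percentiles.

```python
import math
from typing import List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical request log: (succeeded, ttft_seconds, output_chars_per_second).
requests = [
    (True, 0.40, 120.0), (True, 0.55, 110.0), (True, 0.62, 98.0),
    (True, 0.48, 131.0), (False, 0.0, 0.0), (True, 1.10, 95.0),
    (True, 0.51, 105.0), (True, 0.45, 125.0), (True, 0.70, 101.0),
    (True, 0.58, 115.0),
]

# Latency and throughput are only meaningful for completed requests.
completed = [r for r in requests if r[0]]
ttfts = [r[1] for r in completed]
speeds = [r[2] for r in completed]

ttft_p50 = percentile(ttfts, 50)
ttft_p95 = percentile(ttfts, 95)
out_p5 = percentile(speeds, 5)  # floor met or exceeded by about 95% of requests
success_rate = 100 * len(completed) / len(requests)

print(f"TTFT p50 / p95: {ttft_p50:.2f} / {ttft_p95:.2f} s")
print(f"Output p5: {out_p5:.0f} chars/s")
print(f"Success: {success_rate:.2f}% ({len(requests)} requests)")
```

With very few requests, as in some rows of the table, percentiles collapse toward the minimum or maximum observed value, so small-sample figures should be read with caution.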

Qwen3 32B model size

Qwen3 32B has 32.8 billion parameters. See how it compares to other models in the same parameter range.

Parameters: 32.8B
Size class: Large (30–80B)

Qwen3 32B context window

Input and output token limits for Qwen3 32B, plus how it ranks on long-context understanding.

Input: 128K tokens (≈ 192 pages of text)
Output: 128K tokens
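
The page-equivalent figure implies a tokens-per-page ratio that can be reused for rough sizing. The sketch below derives that ratio and checks whether a prompt fits the input limit. It assumes "128K" means 128,000 tokens (it could also mean 131,072), and the helper names are hypothetical. Actual token counts depend on the tokenizer and the text.

```python
CONTEXT_TOKENS = 128_000  # "128K" read as 128,000 tokens (an assumption)
CONTEXT_PAGES = 192       # page-equivalent quoted on this page

# Tokens per page implied by the two figures above.
TOKENS_PER_PAGE = CONTEXT_TOKENS / CONTEXT_PAGES


def tokens_to_pages(tokens: int) -> float:
    """Convert a token count to an approximate page count."""
    return tokens / TOKENS_PER_PAGE


def fits_in_context(prompt_tokens: int, limit: int = CONTEXT_TOKENS) -> bool:
    """Return True if the prompt is within the input token limit."""
    return prompt_tokens <= limit


print(f"Implied tokens per page: {TOKENS_PER_PAGE:.0f}")
print(f"32,000 tokens is about {tokens_to_pages(32_000):.0f} pages")
print(f"150,000-token prompt fits: {fits_in_context(150_000)}")
```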

Qwen3 32B API

Endpoint: POST /v1/chat/completions
Model: qwen3-32b

Use it in your code

Billed at $0.10 input / $0.30 output per 1M tokens through the LLM Stats gateway.

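from openai import OpenAI

# Point the OpenAI SDK at the LLM Stats gateway instead of the default endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",  # replace with your LLM Stats API key
    base_url="https://gateway.llm-stats.com/v1",
)

# Send a single-turn chat request to Qwen3 32B.
response = client.chat.completions.create(
    model="qwen3-32b",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
)

# Print the text of the first returned choice.
print(response.choices[0].message.content)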

Need an API key? Create one above in the playground, or read the API documentation.
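
For clients that do not use an SDK, the same call can be expressed as a plain HTTP request. The sketch below only builds the request (it does not send it), using the base URL and model name from the SDK example on this page. The Bearer authorization header follows the usual OpenAI-compatible convention and is an assumption here, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://gateway.llm-stats.com/v1"  # from the SDK example on this page


def build_chat_request(api_key: str, prompt: str, stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for qwen3-32b."""
    payload = {
        "model": "qwen3-32b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer auth, the usual OpenAI-compatible convention.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("YOUR_API_KEY", "What is machine learning?")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any tokens are billed.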

Qwen3 32B latency

Qwen3 32B time to first token, sustained output throughput, and failed-request rate from live API traffic over the trailing 7 days.

Qwen3 32B examples

Recent arena outputs from Qwen3 32B, picked from the highest-ranked matchups.

Qwen3 32B license

Qwen3 32B is released under the Apache 2.0 license, which permits commercial use. The model has 32.8B parameters.

License: Apache 2.0; Commercial use allowed
Parameters: 32.8B

Qwen3 32B resources

Official sources for Qwen3 32B: official playground, official launch post, source repository, model weights.

Qwen3 32B vs other models

The most-compared alternatives to Qwen3 32B are Llama 3.1 Nemotron Ultra 253B v1, Phi 4 Reasoning Plus, and GPT-5.2. Compare any pair side by side for benchmarks, pricing, context, and latency.