
Qwen3-Next-80B-A3B-Base

Overview

Qwen3-Next-80B-A3B-Base is the foundation model in the Qwen3-Next series, built around architectural changes aimed at training and inference efficiency:

- Hybrid attention: Gated DeltaNet in 75% of layers and gated attention in the remaining 25%, for efficient ultra-long-context modeling.
- Ultra-sparse MoE: 512 experts in total, with only 10 routed experts plus 1 shared expert active per token (roughly a 3.7% activation ratio).
- Native Multi-Token Prediction for faster inference.
- Training-stability-friendly designs, including Zero-Centered RMSNorm and normalized MoE router parameters.

With 80B total parameters and only ~3B activated per inference step, it achieves performance comparable to Qwen3-32B while using less than 10% of the training cost and delivering 10x+ higher throughput at 32K+ context lengths. The model was trained on 15T tokens and supports a 256K context length, extensible to 1M tokens with YaRN scaling.
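The headline ratios above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the layer count (48) and the exact 3-of-4 repeating DeltaNet/attention pattern are assumptions for the example, not values read from the released config; the activated-parameter ratio just divides the ~3B active by the 80B total.

```python
# Back-of-envelope check of the figures quoted above.
# Assumptions (not from the official config): 48 layers, and a repeating
# pattern of 3 Gated DeltaNet layers followed by 1 gated-attention layer.

def hybrid_layer_schedule(num_layers: int, period: int = 4) -> list[str]:
    """Gated DeltaNet on 3 of every `period` layers, gated attention on the last."""
    return [
        "gated_attention" if (i + 1) % period == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

schedule = hybrid_layer_schedule(48)
deltanet_share = schedule.count("gated_deltanet") / len(schedule)  # 0.75

# Activated-parameter ratio: ~3B active out of 80B total, i.e. the ~3.7% quoted.
activation_ratio = 3.0 / 80.0

print(f"DeltaNet share of layers: {deltanet_share:.0%}")
print(f"Activated parameter ratio: {activation_ratio:.2%}")
```

Note that the 3.7% figure refers to activated *parameters* (3B / 80B), not to the fraction of experts selected per token (11 of 512+1 experts is a smaller fraction).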

Qwen3-Next-80B-A3B-Base was released on September 10, 2025.
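The 256K-to-1M YaRN extension mentioned in the overview is typically expressed through a `rope_scaling` entry in a Hugging Face `config.json`. The snippet below is a hedged sketch of that convention: the scaling factor is just the arithmetic implied by the two context lengths, and the officially recommended values should be taken from the model card rather than from this example.

```python
# Sketch of a rope_scaling configuration for extending the native 256K
# window toward 1M tokens with YaRN, following the convention used in
# Hugging Face transformers config files. Values are illustrative.

native_context = 262_144    # 256K tokens, native window
target_context = 1_048_576  # ~1M tokens

rope_scaling = {
    "rope_type": "yarn",
    "factor": target_context / native_context,  # 4.0
    "original_max_position_embeddings": native_context,
}

print(rope_scaling)
```

In this convention, positions beyond `original_max_position_embeddings` are handled by YaRN's frequency interpolation, so the effective window becomes `factor * original_max_position_embeddings`.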

Timeline

Released: September 10, 2025
Knowledge Cutoff: Unknown

Specifications

Parameters: 80.0B
License: Apache 2.0
Training Data: Unknown
Tags: tuning:base

Benchmarks

No data available

Pricing

Pricing, performance, and capabilities for Qwen3-Next-80B-A3B-Base across different providers:

No pricing information available for this model.

API Access

API Access Coming Soon

API access for Qwen3-Next-80B-A3B-Base will be available soon through our gateway.


FAQ

Common questions about Qwen3-Next-80B-A3B-Base

Q: When was Qwen3-Next-80B-A3B-Base released?
A: Qwen3-Next-80B-A3B-Base was released on September 10, 2025.

Q: How many parameters does Qwen3-Next-80B-A3B-Base have?
A: Qwen3-Next-80B-A3B-Base has 80.0 billion parameters.