DeepSeek-V3
Overview
DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. It features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training. Pre-trained on 14.8T tokens, it delivers strong performance on reasoning, math, and code tasks.
DeepSeek-V3 was released on December 25, 2024. API access is available through DeepSeek.
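As background for the MoE design mentioned above, the sketch below shows generic top-k expert routing, where each token is sent to only a few experts so that just a fraction of the total parameters (37B of 671B for DeepSeek-V3) is active per token. The expert count, dimensions, and function names here are illustrative toy values, not DeepSeek-V3's actual configuration.

```python
# Minimal sketch of top-k expert routing in a Mixture-of-Experts layer.
# Toy sizes only -- not DeepSeek-V3's real configuration.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2                    # toy dimensions
router_w = rng.normal(size=(d_model, n_experts))        # router projection
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    scores = x @ router_w                               # (tokens, n_experts) affinities
    top = np.argsort(scores, axis=-1)[:, -top_k:]       # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top[t]
        gate = np.exp(scores[t, sel])
        gate /= gate.sum()                              # softmax over selected experts only
        for g, e in zip(gate, sel):
            out[t] += g * (x[t] @ experts[e])           # weighted sum of expert outputs
    return out

tokens = rng.normal(size=(4, d_model))                  # 4 toy tokens
print(moe_layer(tokens).shape)                          # (4, 64): only 2 of 8 experts used per token
```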
Performance
Timeline
Released: December 25, 2024
Knowledge Cutoff: Unknown
Specifications
Parameters: 671B total (37B activated per token)
License: MIT + Model License (commercial use allowed)
Training Data: Not publicly detailed (pre-trained on 14.8T tokens)
Benchmarks
DeepSeek-V3 Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for DeepSeek-V3 across different providers:
| Provider | Input ($/M tokens) | Output ($/M tokens) | Max Input | Max Output | Latency (s) | Throughput | Quantization | Input Modalities | Output Modalities |
|---|---|---|---|---|---|---|---|---|---|
| DeepSeek | $0.27 | $1.10 | 131.1K | 131.1K | 0.5 | 100.0 c/s | — | Text | Text |
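For reference, the listed DeepSeek rates translate directly into per-request costs; the token counts in this sketch are arbitrary examples, not measured usage.

```python
# Estimate request cost from the per-million-token rates listed above.
INPUT_RATE = 0.27 / 1_000_000    # $ per input token (DeepSeek, from the table)
OUTPUT_RATE = 1.10 / 1_000_000   # $ per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 2,000-token prompt with a 500-token completion.
print(f"${estimate_cost(2_000, 500):.6f}")   # $0.001090
```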
API Access
API Access Coming Soon
API access for DeepSeek-V3 through our gateway is coming soon; in the meantime, the model is available directly via DeepSeek's API.
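DeepSeek's own API exposes an OpenAI-compatible chat-completions interface. The snippet below is a minimal sketch that assumes the publicly documented base URL and the `deepseek-chat` model alias for DeepSeek-V3; check DeepSeek's current documentation before relying on either.

```python
# Hedged sketch: calling DeepSeek-V3 via DeepSeek's OpenAI-compatible API.
# Assumes the publicly documented endpoint and model alias; adjust if they change.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",          # issued at platform.deepseek.com
    base_url="https://api.deepseek.com",      # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                    # alias that serves DeepSeek-V3
    messages=[{"role": "user", "content": "Summarize DeepSeek-V3 in one sentence."}],
)
print(response.choices[0].message.content)
```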
FAQ
Common questions about DeepSeek-V3
When was DeepSeek-V3 released? DeepSeek-V3 was released on December 25, 2024.
How many parameters does DeepSeek-V3 have? DeepSeek-V3 has 671 billion total parameters, of which 37 billion are activated per token.
