DeepSeek-V3

Overview

A powerful Mixture-of-Experts (MoE) language model with 671B total parameters (37B activated per token). Features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and multi-token prediction training. Pre-trained on 14.8T tokens with strong performance in reasoning, math, and code tasks.
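To illustrate why only ~37B of the 671B parameters run per token, here is a minimal, hedged sketch of top-k expert routing in a Mixture-of-Experts layer. The layer sizes, expert count, and top-k value below are placeholders chosen for readability, not DeepSeek-V3's actual configuration, and this is not DeepSeek's implementation (which additionally uses MLA and auxiliary-loss-free load balancing).

```python
# Illustrative top-k MoE routing sketch -- NOT DeepSeek-V3's actual code.
# Sizes (d_model, d_ff, n_experts, top_k) are placeholders for readability.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)  # gating scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts, so most expert
        # parameters stay inactive for that token -- the reason total and
        # activated parameter counts differ so much in MoE models.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 64)
print(TinyMoELayer()(tokens).shape)  # torch.Size([4, 64])
```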

DeepSeek-V3 was released on December 25, 2024. API access is available through DeepSeek.
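For reference, a hedged sketch of calling the model through DeepSeek's OpenAI-compatible API. The base URL and the model name `deepseek-chat` follow DeepSeek's public documentation at the time of writing; verify both against the current docs before use.

```python
# Sketch of calling DeepSeek-V3 via DeepSeek's OpenAI-compatible endpoint.
# Base URL and model name are assumptions based on DeepSeek's public docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # your own API key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",                     # served by DeepSeek-V3
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Multi-head Latent Attention in one sentence."},
    ],
)
print(response.choices[0].message.content)
```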

Performance

Timeline

Released: December 25, 2024
Knowledge Cutoff: Unknown

Specifications

Parameters: 671B total (37B activated per token)
License: MIT + Model License (commercial use allowed)
Training Data: 14.8T tokens (composition undisclosed)

Benchmarks

DeepSeek-V3 Performance Across Datasets

Scores sourced from the model's scorecard, paper, or official blog posts


Pricing

Pricing, performance, and capabilities for DeepSeek-V3 across different providers (a cost-estimate sketch follows the table):

Provider: DeepSeek
Input ($/M tokens): $0.27
Output ($/M tokens): $1.10
Max Input: 131.1K tokens
Max Output: 131.1K tokens
Latency: 0.5 s
Throughput: 100.0 tokens/s
Quantization: Unknown
Input Modalities: Text
Output Modalities: Text
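As a quick sanity check on the listed rates, a minimal sketch of estimating per-request cost from token counts, using the DeepSeek prices above ($0.27 per million input tokens, $1.10 per million output tokens). Provider-specific adjustments such as prompt-cache discounts are ignored.

```python
# Cost estimate from the per-million-token rates listed above for DeepSeek.
# Ignores provider-specific details such as prompt-cache discounts.
INPUT_PER_M = 0.27    # USD per 1M input tokens
OUTPUT_PER_M = 1.10   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: a 4,000-token prompt with a 1,000-token completion.
print(f"${request_cost(4_000, 1_000):.6f}")   # $0.002180
```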

API Access

API access for DeepSeek-V3 will be available soon through our gateway.

FAQ

Common questions about DeepSeek-V3

When was DeepSeek-V3 released?
DeepSeek-V3 was released on December 25, 2024.

How many parameters does DeepSeek-V3 have?
DeepSeek-V3 has 671 billion total parameters, with 37 billion activated per token.