DeepSeek · Released on Apr 23, 2026

DeepSeek-V4-Flash-Max: API Pricing, Context Window & Benchmarks

Name: DeepSeek-V4-Flash-Max
Author: DeepSeek

DeepSeek-V4-Flash-Max is a language model from DeepSeek, released in April 2026, with a 1.0M-token context window and pricing from $0.100 per million input tokens and $0.200 per million output tokens.

DeepSeek-V4-Flash-Max is the maximum reasoning effort mode of DeepSeek-V4-Flash, a 284B-parameter MoE model with 13B activated parameters and a 1M-token context window. Sharing the V4 series' hybrid attention architecture (Compressed

Input: Text

Output: Text

Speed

Latency p95: 2.67 s

Throughput p95: 14.1 char/s

Cost

$0.11 / 1M · 8:1 in:out

$0.10 in · $0.20 out
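The $0.11 figure is consistent with a weighted average of the input and output prices at an 8:1 input-to-output token ratio. The sketch below shows that arithmetic; the weighting formula and the function name `blended_price` are assumptions for illustration, not something the listing states.

```python
def blended_price(input_price, output_price, input_parts=8, output_parts=1):
    """Weighted average price per 1M tokens for a given in:out token mix.

    Assumes "8:1 in:out" means 8 input tokens for every output token.
    """
    total_parts = input_parts + output_parts
    weighted = input_price * input_parts + output_price * output_parts
    return weighted / total_parts


# Listed prices: $0.10 per 1M input tokens, $0.20 per 1M output tokens.
print(f"${blended_price(0.10, 0.20):.2f} per 1M tokens")
```

A workload with a different mix (for example, long generations from short prompts) would land closer to the output price.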

DeepSeek-V4-Flash-Max benchmarks

DeepSeek-V4-Flash-Max Performance Across Datasets

Scores sourced from the model's scorecard, paper, or official blog posts

llm-stats.com - Thu Jul 23 2026

DeepSeek-V4-Flash-Max pricing

Providers

DeepSeek-V4-Flash-Max starts at $0.100 per million input tokens and $0.200 per million output tokens via DeepInfra. See all 4 providers below with their per-token pricing, latency, throughput, and modality support.

Provider	Input ($/M)	Output ($/M)	Context in / out (tokens)	TTFT p50 / p95 (s)	Output avg / p5 (chars/s)	Success 7d (requests)	Modalities in / out
DeepInfra	$0.100	$0.200	1.0M / 65.5K	2.29 / 3.50	32 / 14	100.00% (8)	Text / Text
DeepSeek	$0.140	$0.280	1.0M / 393.2K	1.40 / 2.67	183 / 11	100.00% (21)	Text / Text
Fireworks	$0.140	$0.280	1.0M / 65.5K	0.78 / 5.66	127 / 4	100.00% (41)	Text / Text
Novita	$0.140	$0.280	1.0M / 131.1K	3.15 / 4.90	110 / 2	100.00% (15)	Text / Text

Cached input is the discounted price for prompt tokens served from a provider cache. TTFT is time to first token. Output is characters per second; p5 is the sustained floor exceeded by 95% of observed requests. Success is calculated from completed versus failed requests over the trailing seven days.
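As a rough illustration of how the provider figures can drive a selection, the sketch below copies the four rows into a small data structure and picks a provider per criterion. The `Provider` class and its field names are hypothetical; the numbers are the ones listed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    ttft_p50: float      # seconds
    ttft_p95: float      # seconds
    output_avg: int      # characters per second


# Values copied from the provider table.
PROVIDERS = [
    Provider("DeepInfra", 0.100, 0.200, 2.29, 3.50, 32),
    Provider("DeepSeek", 0.140, 0.280, 1.40, 2.67, 183),
    Provider("Fireworks", 0.140, 0.280, 0.78, 5.66, 127),
    Provider("Novita", 0.140, 0.280, 3.15, 4.90, 110),
]

cheapest = min(PROVIDERS, key=lambda p: (p.input_price, p.output_price))
fastest_median_start = min(PROVIDERS, key=lambda p: p.ttft_p50)
best_tail_start = min(PROVIDERS, key=lambda p: p.ttft_p95)
fastest_output = max(PROVIDERS, key=lambda p: p.output_avg)

print("Cheapest:", cheapest.name)
print("Lowest p50 TTFT:", fastest_median_start.name)
print("Lowest p95 TTFT:", best_tail_start.name)
print("Highest average output speed:", fastest_output.name)
```

No single provider wins every criterion in these figures, so the right choice depends on whether price, median latency, tail latency, or output speed matters most.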

DeepSeek-V4-Flash-Max model size

DeepSeek-V4-Flash-Max has 284 billion parameters and was trained on 32 trillion tokens. See how it compares to other models in the same parameter range.

Parameters: 284B (MoE)

Training tokens: 32T

Tokens-to-params ratio: 113×

Size class: Frontier (200B+)
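The 113× ratio follows directly from the two listed figures. The short sketch below reproduces it and, as an additional illustration, computes the share of parameters active per token from the 13B activated-parameter figure given in the model description. Variable names are illustrative.

```python
TOTAL_PARAMS = 284_000_000_000           # 284B total parameters (MoE)
ACTIVE_PARAMS = 13_000_000_000           # 13B activated parameters
TRAINING_TOKENS = 32_000_000_000_000     # 32T training tokens

# Training tokens seen per parameter.
tokens_per_param = TRAINING_TOKENS / TOTAL_PARAMS

# Fraction of the model's parameters that are active for a given token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Tokens-to-params ratio: {tokens_per_param:.0f}x")
print(f"Active parameters: {active_fraction:.1%} of total")
```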

DeepSeek-V4-Flash-Max context window

Input and output token limits for DeepSeek-V4-Flash-Max, plus how it ranks on long-context understanding.

Input: 1.0M tokens

Output: 393K tokens

1.0M tokens ≈ 1.6k pages of text
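Output limits differ by provider in the pricing table (65.5K to 393.2K tokens), so a request that fits one provider may not fit another. The sketch below checks a planned request against each provider's listed limits. It assumes the input and output limits apply independently, and it uses the rounded figures as displayed, so treat the results as approximate.

```python
# Per-provider (input, output) token limits as displayed in the provider table.
# The listing shows rounded values; exact limits may differ slightly.
LIMITS = {
    "DeepInfra": (1_000_000, 65_500),
    "DeepSeek": (1_000_000, 393_200),
    "Fireworks": (1_000_000, 65_500),
    "Novita": (1_000_000, 131_100),
}


def providers_that_fit(input_tokens, output_tokens):
    """Return providers whose listed limits cover the planned request."""
    return [
        name
        for name, (max_in, max_out) in LIMITS.items()
        if input_tokens <= max_in and output_tokens <= max_out
    ]


# A 200K-token prompt with up to 100K tokens of output.
print(providers_that_fit(200_000, 100_000))

# Tokens per page implied by "1.0M tokens ≈ 1.6k pages".
print(1_000_000 / 1_600)
```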

DeepSeek-V4-Flash-Max API

Endpoint: POST /v1/chat/completions

Model: deepseek-v4-flash-max

Use it in your code

Billed at $0.10 input / $0.20 output per 1M tokens through the LLM Stats gateway.

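from openai import OpenAI

# Point the OpenAI SDK at the LLM Stats gateway instead of the default endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",  # replace with your own key
    base_url="https://gateway.llm-stats.com/v1",
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="deepseek-v4-flash-max",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
)

# The reply text is in the first choice's message.
print(response.choices[0].message.content)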
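For long generations, a streamed response lets text appear as it is produced. The sketch below uses the OpenAI SDK's `stream=True` option on the same endpoint; whether the gateway honors streaming for this model is an assumption, and `stream_text` is a hypothetical helper name.

```python
def stream_text(client, prompt, model="deepseek-v4-flash-max"):
    """Yield text fragments from a streamed chat completion.

    `client` is expected to be an OpenAI SDK client (or any object
    exposing the same chat.completions.create interface).
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage with the client configured as in the snippet above:
#
#   for piece in stream_text(client, "What is machine learning?"):
#       print(piece, end="", flush=True)
```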

DeepSeek-V4-Flash-Max latency

DeepSeek-V4-Flash-Max time to first token, sustained output throughput, and failed-request rate from live API traffic over the trailing 7 days.
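To make the percentile and success figures concrete, the sketch below computes p50, p95, and p5 values from a list of samples and a success rate from request counts. It uses the nearest-rank percentile method, which is an assumption; the listing does not say which method LLM Stats uses, and the sample values are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def success_rate(completed, failed):
    """Percentage of requests that completed."""
    return 100.0 * completed / (completed + failed)


# Hypothetical time-to-first-token samples, in seconds.
ttft_samples = [0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.67]

print("TTFT p50:", percentile(ttft_samples, 50))
print("TTFT p95:", percentile(ttft_samples, 95))

# For throughput, p5 is the floor that 95% of requests exceed.
throughput_samples = [11, 150, 160, 170, 180, 190, 200, 210, 220, 230]
print("Throughput p5:", percentile(throughput_samples, 5))

print("Success:", success_rate(completed=21, failed=0))
```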

DeepSeek-V4-Flash-Max examples

Recent arena outputs from DeepSeek-V4-Flash-Max, picked from the highest-ranked matchups.

DeepSeek-V4-Flash-Max license

DeepSeek-V4-Flash-Max is released under the MIT license, which permits commercial use. The model has 284.0B parameters.

License: MIT; Commercial use allowed
Parameters: 284.0B

DeepSeek-V4-Flash-Max resources

Official sources for DeepSeek-V4-Flash-Max: API documentation, official playground, source repository, and model weights.

DeepSeek-V4-Flash-Max vs other models

The most-compared alternatives to DeepSeek-V4-Flash-Max are Gemini 3 Pro, Gemini 3 Flash, and GPT-5 Medium. Open any pair side-by-side for benchmarks, pricing, context, and latency.

Models like DeepSeek-V4-Flash-Max

Models ranked just above and below DeepSeek-V4-Flash-Max by LLM Stats score.

Gemini 3 Pro (score pending)

Gemini 3 Flash (score pending)

GPT-5 Medium (score pending)

MiMo-V2.5-Pro (score pending)

GPT-5.2 (score pending)

GPT-5.1 Thinking (score pending)

FAQ

Common questions about DeepSeek-V4-Flash-Max.

When was DeepSeek-V4-Flash-Max released?

DeepSeek-V4-Flash-Max was released on April 23, 2026 by DeepSeek. This is the official DeepSeek-V4-Flash-Max release date tracked on LLM Stats.

How much does DeepSeek-V4-Flash-Max cost?

DeepSeek-V4-Flash-Max costs $0.10 per million input tokens and $0.20 per million output tokens through the LLM Stats API, which works with any OpenAI-compatible SDK. Across tracked providers, the lowest price is $0.10 per million input tokens via DeepInfra.
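As a worked example of per-request billing at these rates, the sketch below converts token counts into a dollar cost. The token counts are made up; in practice they could come from the `usage.prompt_tokens` and `usage.completion_tokens` fields of an OpenAI SDK response, assuming the gateway populates them.

```python
def request_cost(prompt_tokens, completion_tokens,
                 input_price=0.10, output_price=0.20):
    """Cost in USD of one request, given prices per 1M tokens."""
    return (prompt_tokens * input_price
            + completion_tokens * output_price) / 1_000_000


# Hypothetical request: 12,000 prompt tokens and 3,000 completion tokens.
cost = request_cost(12_000, 3_000)
print(f"${cost:.4f}")
```

At the $0.14 / $0.28 rates listed for the other providers, the same request can be priced by passing `input_price=0.14, output_price=0.28`.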

Is DeepSeek-V4-Flash-Max available via API?

Yes. DeepSeek-V4-Flash-Max is available through the LLM Stats API and works with any OpenAI-compatible SDK: point your client at the gateway base URL and pass the model name. It is served by 4 providers tracked on LLM Stats.

How big is DeepSeek-V4-Flash-Max?

DeepSeek-V4-Flash-Max has 284 billion parameters. It was trained on 32.0 trillion tokens. It ships as an open-weight model, so you can download and run it on your own hardware.

Who created DeepSeek-V4-Flash-Max?

DeepSeek-V4-Flash-Max was created by DeepSeek.

What is the license for DeepSeek-V4-Flash-Max?

DeepSeek-V4-Flash-Max is released under the MIT license. This is an open-source / open-weight license that permits self-hosting.

What is DeepSeek-V4-Flash-Max latency?

DeepSeek-V4-Flash-Max p95 time to first token is 2.67 seconds via DeepSeek over the trailing 7 days. Lower time to first token means the model begins responding sooner for chat, agents and API workloads.

Where can I use DeepSeek-V4-Flash-Max?

DeepSeek-V4-Flash-Max is available through 4 providers: DeepInfra, DeepSeek, Fireworks, and Novita.

What models should I compare DeepSeek-V4-Flash-Max against?

Common DeepSeek-V4-Flash-Max comparisons include DeepSeek-V4-Flash-Max vs Gemini 3 Pro, DeepSeek-V4-Flash-Max vs Gemini 3 Flash, and DeepSeek-V4-Flash-Max vs GPT-5 Medium. Compare them side by side for benchmark scores, pricing, context window, latency, and API availability.