At a glance

Fireworkspricing, performance & catalog

The citable facts about Fireworks's 7 models — sourced from provider APIs and refreshed continuously.

Lowest price
GPT OSS 120B High at $0.150 per 1M input tokens
Highest throughput
Qwen3 30B A3B at 122 tokens/s
Lowest latency
Qwen3 30B A3B at 0.66s
Largest context
Kimi K2.6 at 262K tokens
Catalog
7 active models from 8 organizations

FAQ

Common questions about Fireworks.

What is Fireworks?

Fireworks is an API provider that hosts large language models. Active models: 7; From (input): $0.15 / 1M tok; Avg throughput: 122 tok/s; Avg latency: 0.66 s; Max context: 262K.

How many models does Fireworks offer?

Fireworks currently serves 7 active models out of 32 historical offerings on LLM Stats.

What is Fireworks's API pricing?

Fireworks input pricing starts from $0.15 per 1M tokens, with the most expensive offering at $0.95 per 1M tokens. See the Pricing tab above for the full per-model breakdown.

How fast is Fireworks?

Fireworks averages 122 output tokens per second across its catalog, with average latency of 0.66s. Per-model performance is shown in the Performance tab.

Is Fireworks OpenAI compatible?

Most providers expose an OpenAI-compatible /v1/chat/completions endpoint so you can switch from OpenAI to Fireworks by changing only the base URL and API key. Check https://fireworks.ai/ for the exact endpoint format and any provider-specific parameters.

Does Fireworks support multimodal models?

Yes. Fireworks's catalog includes 1 vision-capable models. See the Models and Capabilities tabs for the full per-model breakdown.

Whose models does Fireworks host?

Fireworks hosts models from Fireworks AI, MiniMax, Moonshot AI, OpenAI, Alibaba Cloud / Qwen Team, and DeepSeek, plus 2 more. See the Models tab for the full catalog grouped by creator.

How do I start using Fireworks?

Sign up at https://fireworks.ai/ to get an API key, then call Fireworks's API directly from your application. Most clients work out of the box by pointing the OpenAI SDK at Fireworks's base URL with your key. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.