FAQ
Common questions about Fireworks.
What is Fireworks?
Fireworks is an API provider that hosts large language models. Active models: 14; From (input): $0.10 / 1M tok; Avg throughput: 122 tok/s; Avg latency: 0.66 s; Max context: 262K.
How many models does Fireworks offer?
Fireworks currently serves 14 active models out of 31 historical offerings on LLM Stats.
What is Fireworks's API pricing?
Fireworks input pricing starts from $0.10 per 1M tokens, with the most expensive offering at $0.89 per 1M tokens. See the Pricing tab above for the full per-model breakdown.
How fast is Fireworks?
Fireworks averages 122 output tokens per second across its catalog, with average latency of 0.66s. Per-model performance is shown in the Performance tab.
Does Fireworks support multimodal models?
Yes. Fireworks's catalog includes 1 vision-capable models. See the Models and Capabilities tabs for the full per-model breakdown.
Whose models does Fireworks host?
Fireworks hosts models from DeepSeek, Fireworks AI, MiniMax, Moonshot AI, OpenAI, and Alibaba Cloud / Qwen Team, plus 2 more. See the Models tab for the full catalog grouped by creator.
How do I start using Fireworks?
Sign up at https://fireworks.ai/ to get an API key, then call Fireworks's API directly from your application. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.