FAQ
Common questions about DeepInfra.
What is DeepInfra?
DeepInfra is an API provider that hosts large language models. Across its catalog it currently lists 16 active models, input pricing from $0.06 per 1M tokens, average throughput of 77 tokens/s, average latency of 0.82 s, and a maximum context window of 1.0M tokens.
How many models does DeepInfra offer?
DeepInfra currently serves 16 active models out of 45 historical offerings on LLM Stats.
What is DeepInfra's API pricing?
DeepInfra input pricing starts from $0.06 per 1M tokens, with the most expensive offering at $1.74 per 1M tokens. See the Pricing tab above for the full per-model breakdown.
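Because pricing is quoted per million tokens, estimating the cost of a request is simple arithmetic. A minimal sketch, using the cheapest listed input rate ($0.06/1M) and a hypothetical output rate for illustration; check each model's actual rates in the Pricing tab:

```python
# Sketch: estimate request cost from per-million-token rates.

def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in USD given token counts and $-per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate_per_m \
         + (output_tokens / 1_000_000) * output_rate_per_m

# 10k input + 2k output tokens at $0.06/1M input and a
# hypothetical $0.24/1M output rate.
cost = estimate_cost_usd(10_000, 2_000, 0.06, 0.24)
print(f"${cost:.5f}")  # -> $0.00108
```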
How fast is DeepInfra?
DeepInfra averages 77 output tokens per second across its catalog, with average latency of 0.82s. Per-model performance is shown in the Performance tab.
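Those two averages give a back-of-envelope estimate of total response time: roughly the latency (time to first token) plus the number of output tokens divided by throughput. A sketch using the catalog-wide averages; per-model figures will differ:

```python
# Sketch: rough time to receive a full response, assuming a steady
# decode rate after the first token. 0.82 s and 77 tok/s are the
# catalog averages quoted above, not guarantees for any one model.

def estimate_response_seconds(output_tokens: int,
                              latency_s: float = 0.82,
                              throughput_tps: float = 77.0) -> float:
    """Time to first token plus generation time at a constant rate."""
    return latency_s + output_tokens / throughput_tps

print(round(estimate_response_seconds(500), 2))  # -> 7.31
```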
Does DeepInfra support multimodal models?
Yes. DeepInfra's catalog includes 9 vision-capable models. See the Models and Capabilities tabs for the full per-model breakdown.
Whose models does DeepInfra host?
DeepInfra hosts models from DeepSeek, NVIDIA, OpenAI, Alibaba Cloud / Qwen Team, Zhipu AI, and Google, plus 5 more. See the Models tab for the full catalog grouped by creator.
How do I start using DeepInfra?
Sign up at https://deepinfra.com/ to get an API key, then call DeepInfra's API directly from your application. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.
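As a starting point, a minimal sketch of an OpenAI-style chat-completions request. The endpoint path and model name below are assumptions for illustration; consult DeepInfra's own API documentation for the exact URL, model identifiers, and parameters:

```python
# Sketch: build an OpenAI-style chat-completions request to DeepInfra.
# API_URL and the model name are assumptions -- verify against the docs.
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"  # assumed path

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct (but do not send) a chat-completion HTTP request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("YOUR_API_KEY", "meta-llama/Llama-3.3-70B-Instruct", "Hello!")
# Sending requires a valid key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Keeping the request-building step separate from the send makes it easy to swap in any HTTP client or an OpenAI-compatible SDK later.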