FAQ

Common questions about DeepInfra.

What is DeepInfra?

DeepInfra is an API provider that hosts large language models. It currently serves 16 active models, with input pricing from $0.06 per 1M tokens, average throughput of 77 tokens/s, average latency of 0.82 s, and context windows of up to 1.0M tokens.

How many models does DeepInfra offer?

DeepInfra currently serves 16 active models out of 45 historical offerings on LLM Stats.

What is DeepInfra's API pricing?

DeepInfra input pricing starts from $0.06 per 1M tokens, with the most expensive offering at $1.74 per 1M tokens. See the Pricing tab above for the full per-model breakdown.
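Token-based pricing translates to request cost with simple arithmetic. As an illustration, the sketch below estimates input cost using the cheapest rate quoted above ($0.06 per 1M tokens); the helper name and default rate are just examples, and real rates vary per model.

```python
def input_cost_usd(tokens: int, rate_per_million: float = 0.06) -> float:
    """Estimate input cost in USD for a given prompt size.

    rate_per_million defaults to DeepInfra's cheapest listed input
    price ($0.06 / 1M tokens); pass the rate of your actual model.
    """
    return tokens / 1_000_000 * rate_per_million

# A 10k-token prompt at the cheapest rate costs well under a cent.
print(f"${input_cost_usd(10_000):.4f}")
```

The same formula applies to output tokens, which are typically billed at a higher per-token rate.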

How fast is DeepInfra?

DeepInfra averages 77 output tokens per second across its catalog, with average latency of 0.82s. Per-model performance is shown in the Performance tab.
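These two numbers combine into a rough end-to-end time estimate: time to first token (latency) plus output length divided by throughput. A minimal sketch, using the catalog-wide averages above as default values (per-model figures will differ):

```python
def est_completion_seconds(output_tokens: int,
                           throughput_tps: float = 77.0,
                           latency_s: float = 0.82) -> float:
    """Rough end-to-end time: first-token latency plus streaming time.

    Defaults are DeepInfra's catalog-wide averages; substitute the
    figures for your chosen model from the Performance tab.
    """
    return latency_s + output_tokens / throughput_tps

# e.g. a 500-token completion at the average throughput and latency
print(round(est_completion_seconds(500), 2))  # → 7.31
```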

Does DeepInfra support multimodal models?

Yes. DeepInfra's catalog includes 9 vision-capable models. See the Models and Capabilities tabs for the full per-model breakdown.

Whose models does DeepInfra host?

DeepInfra hosts models from DeepSeek, NVIDIA, OpenAI, Alibaba Cloud / Qwen Team, Zhipu AI, and Google, plus 5 more. See the Models tab for the full catalog grouped by creator.

How do I start using DeepInfra?

Sign up at https://deepinfra.com/ to get an API key, then call DeepInfra's API directly from your application. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.
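As a minimal sketch of the flow above: the code below builds a chat-completion request against DeepInfra's OpenAI-compatible HTTP endpoint. The endpoint path and model id are assumptions taken from DeepInfra's public docs; check them before use, and set your own API key in the `DEEPINFRA_API_KEY` environment variable.

```python
import json
import os

# Assumed OpenAI-compatible endpoint; verify against DeepInfra's docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> tuple[dict, dict]:
    """Return (headers, payload) for a single-turn chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

if __name__ == "__main__":
    import requests  # third-party: pip install requests

    headers, payload = build_request(
        "deepseek-ai/DeepSeek-V3",        # example model id only
        "Say hello in one sentence.",
        os.environ["DEEPINFRA_API_KEY"],  # your key from deepinfra.com
    )
    resp = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    print(resp.json()["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official OpenAI SDKs should also work by pointing their base URL at DeepInfra instead of hand-rolling HTTP requests.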