FAQ

Common questions about DeepInfra.

What is DeepInfra?

DeepInfra is an API provider that hosts large language models. It currently serves 16 active models, with input pricing from $0.06 per 1M tokens, average throughput of 77 tokens/s, average latency of 0.82 s, and context windows of up to 1.0M tokens.

How many models does DeepInfra offer?

DeepInfra currently serves 16 active models out of 45 historical offerings on LLM Stats.

What is DeepInfra's API pricing?

DeepInfra input pricing starts from $0.06 per 1M tokens, with the most expensive offering at $1.74 per 1M tokens. See the Pricing tab above for the full per-model breakdown.
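Token-based pricing translates to request cost with simple arithmetic. As an illustration, the sketch below estimates input cost using the cheapest rate quoted above ($0.06 per 1M tokens); the helper name and default rate are just examples, and real rates vary per model.

```python
def input_cost_usd(tokens: int, rate_per_million: float = 0.06) -> float:
    """Estimate input cost in USD for a given prompt size.

    rate_per_million defaults to DeepInfra's cheapest listed input
    price ($0.06 / 1M tokens); pass the rate of your actual model.
    """
    return tokens / 1_000_000 * rate_per_million

# A 10k-token prompt at the cheapest rate costs well under a cent.
print(f"${input_cost_usd(10_000):.4f}")
```

The same formula applies to output tokens, which are typically billed at a higher per-token rate.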

How fast is DeepInfra?

DeepInfra averages 77 output tokens per second across its catalog, with average latency of 0.82s. Per-model performance is shown in the Performance tab.
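These two numbers combine into a rough end-to-end time estimate: time to first token (latency) plus output length divided by throughput. A minimal sketch, using the catalog-wide averages above as default values (per-model figures will differ):

```python
def est_completion_seconds(output_tokens: int,
                           throughput_tps: float = 77.0,
                           latency_s: float = 0.82) -> float:
    """Rough end-to-end time: first-token latency plus streaming time.

    Defaults are DeepInfra's catalog-wide averages; substitute the
    figures for your chosen model from the Performance tab.
    """
    return latency_s + output_tokens / throughput_tps

# e.g. a 500-token completion at the average throughput and latency
print(round(est_completion_seconds(500), 2))  # → 7.31
```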

Does DeepInfra support multimodal models?

Yes. DeepInfra's catalog includes 9 vision-capable models. See the Models and Capabilities tabs for the full per-model breakdown.

Whose models does DeepInfra host?

DeepInfra hosts models from DeepSeek, NVIDIA, OpenAI, Alibaba Cloud / Qwen Team, Zhipu AI, and Google, plus 5 more. See the Models tab for the full catalog grouped by creator.

How do I start using DeepInfra?

Sign up at https://deepinfra.com/ to get an API key, then call DeepInfra's API directly from your application. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.
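As a minimal sketch of the flow above: the code below builds a chat-completion request against DeepInfra's OpenAI-compatible HTTP endpoint. The endpoint path and model id are assumptions taken from DeepInfra's public docs; check them before use, and set your own API key in the `DEEPINFRA_API_KEY` environment variable.

```python
import json
import os

# Assumed OpenAI-compatible endpoint; verify against DeepInfra's docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> tuple[dict, dict]:
    """Return (headers, payload) for a single-turn chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

if __name__ == "__main__":
    import requests  # third-party: pip install requests

    headers, payload = build_request(
        "deepseek-ai/DeepSeek-V3",        # example model id only
        "Say hello in one sentence.",
        os.environ["DEEPINFRA_API_KEY"],  # your key from deepinfra.com
    )
    resp = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    print(resp.json()["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the official OpenAI SDKs should also work by pointing their base URL at DeepInfra instead of hand-rolling HTTP requests.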