At a glance

Sambanovapricing, performance & catalog

The citable facts about Sambanova's 1 model — sourced from provider APIs and refreshed continuously.

Lowest price
Qwen3 32B at $0.400 per 1M input tokens
Highest throughput
Qwen3 32B at 328 tokens/s
Lowest latency
Qwen3 32B at 1.08s
Largest context
Qwen3 32B at 128K tokens
Catalog
1 active models from 2 organizations

Most affordable

  1. 1Qwen3 32B$0.400/M

Largest context

  1. 1Qwen3 32B128K

FAQ

Common questions about Sambanova.

What is Sambanova?

Sambanova is an API provider that hosts large language models. Active models: 1; From (input): $0.40 / 1M tok; Avg throughput: 328 tok/s; Avg latency: 1.08 s; Max context: 128K.

How many models does Sambanova offer?

Sambanova currently serves 1 active models out of 6 historical offerings on LLM Stats.

What is Sambanova's API pricing?

Sambanova input pricing starts from $0.40 per 1M tokens, with the most expensive offering at $0.4 per 1M tokens. See the Pricing tab above for the full per-model breakdown.

How fast is Sambanova?

Sambanova averages 328 output tokens per second across its catalog, with average latency of 1.08s. Per-model performance is shown in the Performance tab.

Is Sambanova OpenAI compatible?

Most providers expose an OpenAI-compatible /v1/chat/completions endpoint so you can switch from OpenAI to Sambanova by changing only the base URL and API key. Check https://sambanova.ai/ for the exact endpoint format and any provider-specific parameters.

Whose models does Sambanova host?

Sambanova hosts models from Alibaba Cloud / Qwen Team and Meta. See the Models tab for the full catalog grouped by creator.

How do I start using Sambanova?

Sign up at https://sambanova.ai/ to get an API key, then call Sambanova's API directly from your application. Most clients work out of the box by pointing the OpenAI SDK at Sambanova's base URL with your key. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.