Cartesia · Released on Dec 1, 2024

Ink-Whisper: Benchmarks, Pricing & Context Window

Name: Ink-Whisper
Author: Cartesia

Ink-Whisper is a speech-to-text model from Cartesia, released in December 2024.

It supports both streaming and batch transcription.

Input

Audio

Output

Text

Speed

—

Cost

$2.31 / 1M tokens (blended at 8:1 in:out)

$2.60 in · $0.00 out (per 1M tokens)
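The $2.31 figure is consistent with a weighted average of the input and output prices at the stated 8:1 input-to-output ratio. The sketch below reproduces that arithmetic under this interpretation; the function name `blended_price` is illustrative and not part of any API.

```python
def blended_price(input_price: float, output_price: float,
                  in_parts: int = 8, out_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given in:out ratio."""
    total_parts = in_parts + out_parts
    return (input_price * in_parts + output_price * out_parts) / total_parts


# Listed prices: $2.60 per 1M input tokens, $0.00 per 1M output tokens.
price = blended_price(2.60, 0.00)
print(f"${price:.2f} per 1M tokens")
```

Because the output price is listed as $0.00, the blended price falls below the input price as the assumed share of output tokens grows.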

Ink-Whisper pricing

Providers

Ink-Whisper starts at $2.60 per million input tokens via Cartesia.

Provider	Input $/M	Output $/M	Max Input	Max Output	Latency (s)	Throughput	Quant	Input Modality	Output Modality
Cartesia	$2.60	—	—	—	—	—	—	Audio	Text

Ink-Whisper API

POST /v1/stt/transcribe

Model: ink-whisper

API key (required)

Audio file (required): any audio format up to 25 MB.

Language (ISO 639-1, optional)
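The request form above requires an API key and an audio file, limits the audio to 25 MB, and accepts an ISO 639-1 language code. A client could check these constraints locally before uploading. The following is a minimal sketch, not the gateway's own validation: the helper name `preflight_errors` is hypothetical, "25 MB" is assumed to mean 25 × 1024 × 1024 bytes, and the language check only verifies the two-letter shape rather than looking the code up in the ISO 639-1 registry.

```python
import os
import re

# Assumption: the "25 MB" limit is interpreted as 25 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def preflight_errors(path, api_key, language=None):
    """Return a list of problems found before sending a transcription request."""
    errors = []
    if not api_key:
        errors.append("Missing API key")
    if not path or not os.path.isfile(path):
        errors.append("Missing Audio file")
    elif os.path.getsize(path) > MAX_AUDIO_BYTES:
        errors.append("Audio file exceeds 25 MB")
    # ISO 639-1 codes are two lowercase letters, e.g. "en" or "fr".
    if language is not None and not re.fullmatch(r"[a-z]{2}", language):
        errors.append("Language must be a two-letter ISO 639-1 code")
    return errors


# An empty list means the request passes these local checks.
print(preflight_errors("audio.mp3", "YOUR_API_KEY", "en"))
```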

Use it in your code

OpenAI-compatible endpoint through the LLM Stats gateway.

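import requests

# Upload the audio as multipart form data; "model_id" selects the model.
with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://gateway.llm-stats.com/v1/stt/transcribe",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"model_id": "ink-whisper"},
        timeout=60,  # avoid hanging indefinitely on a stalled upload
    )

# Raise on HTTP errors (4xx/5xx) before reading the body.
response.raise_for_status()
print(response.json()["text"])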
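The snippet above reads the transcript from the `text` field of the JSON response. When a request fails or the body has an unexpected shape, indexing that field directly raises a bare `KeyError`. The sketch below shows one way to surface a clearer error, assuming only what the snippet shows (a JSON object with a `text` field); the helper name `extract_transcript` is hypothetical.

```python
import json


def extract_transcript(body: str) -> str:
    """Parse a response body and return its "text" field, or raise ValueError."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("response body is not valid JSON") from exc
    if not isinstance(payload, dict) or "text" not in payload:
        # Include a short prefix of the body to help with debugging.
        raise ValueError(f"no 'text' field in response: {body[:200]!r}")
    return payload["text"]


print(extract_transcript('{"text": "hello world"}'))  # prints: hello world
```

With `requests`, this helper would be called as `extract_transcript(response.text)` after `response.raise_for_status()`.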

Need an API key? Create one above in the playground, or read the API documentation.

Ink-Whisper license

Ink-Whisper is released under a proprietary license that restricts commercial use.

License: Proprietary; Non-commercial

Proprietary license - usage restrictions apply

FAQ

Common questions about Ink-Whisper.

Who created Ink-Whisper?

Ink-Whisper was created by Cartesia.

What is the license for Ink-Whisper?

Ink-Whisper is released under a proprietary, non-commercial license.

Is Ink-Whisper multimodal?

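Ink-Whisper is a speech-to-text model: it takes audio as input and returns text as output. The listing above shows audio as its only input modality, so it does not process text or images as input.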