Cartesia · Released on Sep 1, 2024

Sonic English: Benchmarks, Pricing & Context Window

Sonic English is a text-to-speech model from Cartesia, released in September 2024.

Input: Text
Output: Audio

Sonic English pricing

Providers

Sonic English starts at $10.00 per million input tokens via Cartesia.

Columns: Provider, Input $/M, Output $/M, Max Input, Max Output, Latency (s), Throughput, Quant, Input, Output
Cartesia: $10.00, 2.5K
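
As a rough illustration of the listed rate, the arithmetic can be sketched as below. The function name and the idea of passing in a pre-counted number of input tokens are assumptions made for this example. The page does not say how text is converted into billable tokens, so treat the result as an estimate only.

```python
# Listed Cartesia input price from the pricing section, in USD per million input tokens.
PRICE_PER_MILLION_USD = 10.00


def estimate_cost_usd(input_tokens: int,
                      price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Estimate the cost of a request from its input token count.

    Hypothetical helper: how the provider counts tokens is not stated
    on this page, so the caller supplies the count.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # One million input tokens at the listed rate.
    print(f"${estimate_cost_usd(1_000_000):.2f}")
```

Keeping the price as a parameter makes it easy to re-run the estimate if the listed rate changes.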

Sonic English API

POST /v1/tts/synthesize

Use it in your code

OpenAI-compatible endpoint through the LLM Stats gateway.

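import requests

# Send the text to the gateway's TTS endpoint.
response = requests.post(
    "https://gateway.llm-stats.com/v1/tts/synthesize",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model_id": "sonic-english",
        "text": "Hello, this is a test.",
        "format": "mp3",
        "sample_rate": 24000,
    },
    timeout=30,
)
# Stop on an HTTP error so an error body is not saved as audio.
response.raise_for_status()

# Write the response body to disk as bytes.
with open("output.mp3", "wb") as f:
    f.write(response.content)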
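
For repeated calls, the request above could be wrapped in a small helper. The following is a minimal sketch, not an official client: the function names, the 30-second timeout, and the injectable `post` argument are assumptions made for this example. The URL and the `model_id`, `text`, `format`, and `sample_rate` fields are taken from the snippet above.

```python
GATEWAY_URL = "https://gateway.llm-stats.com/v1/tts/synthesize"


def build_tts_request(text, api_key, audio_format="mp3", sample_rate=24000):
    """Build keyword arguments for requests.post (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {
            "model_id": "sonic-english",
            "text": text,
            "format": audio_format,
            "sample_rate": sample_rate,
        },
        "timeout": 30,  # assumed value; tune for your network
    }


def synthesize_to_file(text, api_key, path, post=None):
    """Request audio and write it to path; return the number of bytes written.

    `post` defaults to requests.post and can be replaced with a stub in tests.
    """
    if post is None:
        import requests  # imported lazily so the helper above has no dependency
        post = requests.post
    response = post(GATEWAY_URL, **build_tts_request(text, api_key))
    response.raise_for_status()  # fail loudly instead of saving an error body
    with open(path, "wb") as f:
        f.write(response.content)
    return len(response.content)


if __name__ == "__main__":
    synthesize_to_file("Hello, this is a test.", "YOUR_API_KEY", "output.mp3")
```

Separating payload construction from the network call keeps the request shape easy to inspect and test without contacting the gateway.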

Sonic English license

Sonic English is released under a proprietary license that restricts commercial use.

License: Proprietary (non-commercial)

Proprietary license - usage restrictions apply

FAQ

Common questions about Sonic English.

Who created Sonic English?

Sonic English was created by Cartesia.

What is the license for Sonic English?

Sonic English is released under a proprietary license that restricts commercial use.

Is Sonic English multimodal?

Sonic English is a text-to-speech model: it takes text as input and produces audio as output. It is not listed as accepting image input.