ElevenLabs TTS

ElevenLabs TTS is a text-to-speech platform developed by ElevenLabs that converts written text into natural-sounding audio across 70+ languages. The platform includes multiple speech models (Eleven v3, Eleven Multilingual v2, and Eleven Flash v2.5), each designed for different use cases, from expressive long-form narration to ultra-low-latency real-time applications. It also supports voice cloning, allowing users to create digital replicas of voices that retain their characteristics across all supported languages.

ElevenLabs TTS is well-suited for media companies, audiobook producers, game developers, publishers, and content creators who need scalable multilingual audio output. The platform's conversational AI component supports sub-100ms latency and can integrate with CRMs, payment systems, and telephony platforms, making it applicable to customer-facing voice agent deployments. Each request accepts up to 10,000 tokens of input, and the voice and model are selected through API configuration parameters.

Released: January 2023 | Context: 10,000 tokens | Output: N/A
Capabilities: Text to Speech, Voice Cloning, Multilingual Support, Conversational AI Agents, Real-Time Speech, Speech to Text

Model Overview

High-signal model metadata.

| Field | Description | Value |
| --- | --- | --- |
| Provider | The entity that provides this model. | ElevenLabs |
| Input Context Window | The number of tokens supported by the input context window. | 10,000 tokens |
| Maximum Output Tokens | The number of tokens that can be generated by the model in a single request. | N/A |
| Open Source | Whether the model's code is available for public use. | No |
| Release Date | When the model was first released. | January 2023 |
| Knowledge Cut-off Date | When the model's knowledge was last updated. | January 2023 |
| API Providers | The providers that offer this model. This is not an exhaustive list. | ElevenLabs |
| Modalities | Types of data this model can process. | Text, Audio |

Capabilities

What ElevenLabs TTS supports

Text to Speech

Converts text into emotionally expressive audio across 70+ languages, with support for multi-speaker dialogue and long-form content up to 10,000 tokens.

Voice Cloning

Creates digital voice replicas from audio samples in both instant and professional-grade modes, preserving voice characteristics across all supported languages.

Multilingual Support

Generates speech in 70+ languages using a single model, enabling consistent voice identity across different language outputs.

Conversational AI Agents

Deploys voice and chat agents with sub-100ms latency, with integration support for CRMs, telephony platforms, and payment systems.

Real-Time Speech

Eleven Flash v2.5 is optimized for low-latency streaming applications, making it suitable for live conversational and interactive use cases.

Speech to Text

The Scribe v2 model transcribes audio in 90+ languages with speaker diarization, word-level timestamps, and real-time transcription support.

Music Generation

Generates studio-grade music from natural language prompts with control over genre, style, vocals, and song structure.

API Integration

Accessible via REST API with voice selection and configuration inputs, supporting programmatic audio generation at scale.

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

ElevenLabs

Configuration & Parameters

The configurable options currently documented for this model.

Voice

Default: 6z4qitu552uH4K9c5vrj
Options: Aria, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, Bill

Model

Default: eleven_v3
Options: Eleven v3, Eleven Multilingual v2, Eleven Turbo v2.5, Eleven Turbo v2, Eleven English v1, Eleven Multilingual v1

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Voice, Model
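The two parameters listed here map naturally onto a REST call. The sketch below builds (but does not send) such a request using only the Python standard library. The default voice ID and model ID are the ones shown on this page. The endpoint path, the `xi-api-key` header, and the `text` / `model_id` body fields are assumptions based on ElevenLabs' public API and should be checked against the official API reference. `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Defaults taken from the configuration listed on this page.
DEFAULT_VOICE_ID = "6z4qitu552uH4K9c5vrj"
DEFAULT_MODEL_ID = "eleven_v3"

# Assumption: base URL and path follow ElevenLabs' public REST API.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, api_key, voice_id=DEFAULT_VOICE_ID, model_id=DEFAULT_MODEL_ID):
    """Build a POST request for speech synthesis without sending it."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumption: the body carries the text and the model identifier.
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed name of the auth header
            "Content-Type": "application/json",
        },
    )


req = build_tts_request("Hello from the catalog page.", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would perform a real network call and consume quota, so it is left out; the response body would be expected to contain the audio bytes.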

Community discussion

What people think about ElevenLabs TTS

ElevenLabs TTS discussions are most active in r/homeassistant, r/TextToSpeech, and r/ElevenLabs. The top Reddit threads are mostly benchmarks and model comparisons.

The strongest match in this snapshot has 190 upvotes and 42 comments.

r/SillyTavernAI | 40 upvotes | 22 comments | April 17, 2026
My personal setup for sillytavern (Openrouter + Elevenlabs TTS + Comfyui).

Hi everyone, I've been using st for a couple years, and think i've finally reached a point in my RP that i'm pretty pleased with the results (for now lol), and would like to share my setup.

**LLM - Claude Sonnet 4.6 / GLM 4.7 Flash (Openrouter)**

* For the model I use it really depends on how long the RP is (If its super long then my wallet can NOT afford sonnet), if I like the responses a model is giving me, and if it adheres to the image and tts formatting I use. I change my main model A LOT, so I just listed two of my most used ones.
* Also for image captioning I use a separate model, usually just grok4.1-fast.

**IMAGE GEN - ComfyUI + ComfyInject**

* ComfyInject is a plugin that is a GODSEND to those wanting images for every message, consistent image prompting, specific povs based on context, consistent clothes and accessories in images, etc. Totally customizable too, huge shoutout to u/momentobru who originally posted about it here in the subreddit. Github link: [https://github.com/Spadic21/ComfyInject](https://github.com/Spadic21/ComfyInject) . I will say that originally I had issues with the plugin communicating with the comfyui server after a few images, but this on the git page fixed it for me: [https://github.com/Spadic21/ComfyInject/issues/7](https://github.com/Spadic21/ComfyInject/issues/7) .
* I like to use divingIllustriousFlat\_v60VAE.safetensors, because it gives a really good anime-looking style which imo beats base hassakuxl or illustrious. I have a 5060ti and it usually takes about 12 seconds to generate an image with 30 steps and (most of the time) 832px x 1216px.

**TTS - Elevenlabs V3**

* I feel like this part is pretty self-explanatory, it's simply just an amazing model, and I went ahead and got the membership so I usually clone the voices of fictional characters (mainly anime characters lol) to use, and it ends up really well.
* A feature I absolutely love is the emotion / sfx generation potential that's included with the V3 model in elevenlabs. When something in brackets "\[\]" is sent to the server to generate audio, it uses some recognition feature to either use the words inside the brackets to change the tone of the sentence afterwards, do almost any sound effect, or add / affect timing and rhythm within the audio generated.
* To utilize this I just add a couple sentences to the prompt explaining how to make use of this, like this: "FOR ALL DIALOGUE, (Text inside quotes), follow the following rules without exception no matter what: Constantly add tags in brackets "\[\]" to enhance the dialogue which is processed through TTS. Tags such as actions "\[falling against wooden floor\]", "\[stuttering\]", and pretty much any sound effect. Tags such as emotions "\[Seducingly\]", "\[Angrily\]", "\[Sad\]". Tags such as pacing / rhythm "\[pauses\]", "\[stammers\]", "\[rushed\]". Tags such as tone "\[yelling\]", "\[british accent\]", "\[shouts\]", "\[whispers\]". UTILIZE THOSE TAGS TO MAKE AN IMMERSIVE AND REALISTIC TEXT TO SPEECH EXPERIENCE."
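The bracket-tag convention described in this post can also be handled on the client side, for example to show clean text in the chat window while the tagged text goes to TTS. A minimal sketch, assuming a tag is any non-nested `[...]` span; `extract_audio_tags` and `strip_audio_tags` are hypothetical helper names, not part of SillyTavern or the ElevenLabs API.

```python
import re

# Assumption: an audio tag is any non-nested span in square brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_audio_tags(text):
    """Return the tag contents in order of appearance."""
    return [tag.strip() for tag in TAG_PATTERN.findall(text)]


def strip_audio_tags(text):
    """Remove tags and collapse leftover whitespace for on-screen display."""
    without_tags = TAG_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


line = "[whispers] I think someone is here. [pauses] Stay quiet."
print(extract_audio_tags(line))  # ['whispers', 'pauses']
print(strip_audio_tags(line))    # I think someone is here. Stay quiet.
```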

Any suggestions or comments are appreciated❤.


Hey all,

I’m Praney, a solo dev. I’m partially dyslexic, so text-to-speech is not just a “nice to have” for me. I use it to read, write, review, and turn long scripts into audio.

I got tired of Elevenlabs TTS tools charging by usage and sending my scripts to someone else’s servers, so I built Vois.so: a local voice AI studio for desktop.

The basic idea is simple:

Write a script → assign voices → generate speech locally → arrange it on a timeline → master/export the final audio.

It started as my personal local ElevenLabs-style alternative, but it has turned into a full production workflow.

What it does:

\- Runs locally on desktop
\- Generates voice audio without uploading scripts to a cloud TTS API
\- Has multiple voice engines for fast, expressive, multilingual, and Omni-style generation
\- Includes a voice library with narrator, host, character, announcer, storyteller, and game-style voices
\- Supports voice cloning from a short sample
\- Lets you build multi-speaker scripts
\- Has a multi-track timeline with crossfades and arrangement tools
\- Includes mastering presets for things like audiobooks, podcasts, YouTube, and general audio
\- Exports finished audio files

The part that may be more relevant to this subreddit:

Vois also has a CLI, so Claude Code, Codex, Cursor, Gemini, etc. can control the app directly.

That means an agent can help with things like:

\- Drafting a podcast script
\- Splitting it into speakers
\- Assigning voices
\- Generating the narration
\- Exporting a finished audio file
\- Building audiobook chapters from longer text

I’m currently using Claude + Vois to build audiobooks and podcasts. Claude helps me structure and edit the scripts, then Vois turns them into finished audio locally.

The animated GIF shows the app in action.

It’s free for personal use to download and use on desktop. I’m not posting pricing here because that’s not really the point of this post.

I’m mainly curious:

If you had a local voice studio that Claude/Codex could control, what would you automate with it?

Audiobooks? Podcast drafts? Game dialogue? Voiceovers for docs/tutorials? Something else?

Full disclosure: I built this myself, so I’m happy to answer questions about the product, the agent workflow, or the local TTS side.

r/openclaw | 2 upvotes | 8 comments | April 13, 2026
ElevenLabs TTS voice messages not coming through on Telegram anymore

**Fix:** The ACPX plugin was disabled. ACPX provides the `reply_dispatch` hook that routes agent responses through the ACP dispatch path — which is where TTS processing actually happens. Without it, responses go through the non-ACP fallback path and TTS tags get silently ignored.

In `openclaw.json`, add `"acpx"` to `plugins.allow` and add `"acpx": { "enabled": true }` to `plugins.entries`, then restart the gateway.

Also heads up: there's a 10-character minimum on TTS text content, so short phrases like "Hi!" will silently skip synthesis even with the fix in place.
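The config change described in the fix can be scripted. The sketch below patches an in-memory dictionary shaped like `openclaw.json`. Only the `plugins.allow` and `plugins.entries` keys come from the post; the rest of the file's schema, and the example starting config, are assumptions. `enable_acpx` and `meets_tts_minimum` are hypothetical helper names, and the second mirrors the 10-character minimum mentioned above (whether whitespace counts toward it is not stated, so raw length is used).

```python
import copy
import json

MIN_TTS_CHARS = 10  # minimum reported in the post


def enable_acpx(config):
    """Return a copy of the config with the acpx plugin allowed and enabled."""
    patched = copy.deepcopy(config)
    plugins = patched.setdefault("plugins", {})
    allow = plugins.setdefault("allow", [])
    if "acpx" not in allow:
        allow.append("acpx")
    plugins.setdefault("entries", {}).setdefault("acpx", {})["enabled"] = True
    return patched


def meets_tts_minimum(text):
    """True if the text is long enough that synthesis should not be skipped."""
    return len(text) >= MIN_TTS_CHARS


# Hypothetical starting config; a real openclaw.json will contain more keys.
example = {"plugins": {"allow": [], "entries": {}}}
print(json.dumps(enable_acpx(example), indent=2))
print(meets_tts_minimum("Hi!"))
```

After writing the patched config back to disk, the gateway still needs a restart, as the post notes.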

\---

Since the update before the latest (was on 2026.4.8, now on 2026.4.11), I get "No response generated. Please try again." whenever my Telegram OpenClaw bot tries to send a voice message via ElevenLabs TTS.

Gateway logs show only sendMessage ok, no sendVoice ever appears. The TTS config looks correct (messages.tts.provider = elevenlabs, auto = tagged, API key present). Claude Code has been investigating for hours, current theory is the ElevenLabs capability plugin isn't loading correctly in the gateway's runtime context despite appearing to work in isolation.

Anyone hit this after a recent update?



Tired of generic weather data? 🥱 I wanted my smart home to give me actually useful weather insights, inspired by how the Samsung Weather app tells you specific things like "rain stopping in 2 hours" or "snow likely to continue" ❄️

Instead of just showing "it's snowing, -2°C," my setup now:
- 🔍 Scrapes detailed weather insights from Weather.com
- 🗣️ Announces changes through Sonos speakers using Elevenlabs TTS for natural voice
- 📟 Displays current insight on my Awtrix Matrix
- ⏰ Only announces between 6:30 AM - 8:00 PM when motion is detected
- ⏱️ Has a 5-minute cooldown between announcements

"Snow likely for the next several hours" notification + Awtrix display

Configuration in comments! 👇 Let me know if you'd like me to share the multiscrape config or automation yaml.
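The gating rules listed in this post (time window, motion, cooldown) reduce to a small predicate. The sketch below is plain Python rather than the author's Home Assistant YAML, which is not shown in the post; `should_announce` and its arguments are hypothetical.

```python
from datetime import datetime, time, timedelta

WINDOW_START = time(6, 30)       # 6:30 AM
WINDOW_END = time(20, 0)         # 8:00 PM
COOLDOWN = timedelta(minutes=5)  # gap required between announcements


def should_announce(now, motion_detected, last_announced=None):
    """Apply the post's three rules: time window, motion, and cooldown."""
    in_window = WINDOW_START <= now.time() <= WINDOW_END
    cooled_down = last_announced is None or now - last_announced >= COOLDOWN
    return in_window and motion_detected and cooled_down


noon = datetime(2025, 1, 15, 12, 0)
print(should_announce(noon, True))                               # True
print(should_announce(noon, True, noon - timedelta(minutes=2)))  # False
print(should_announce(datetime(2025, 1, 15, 22, 0), True))       # False
```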

FAQ

Common questions about ElevenLabs TTS

What is the context window for ElevenLabs TTS?

ElevenLabs TTS supports a context window of up to 10,000 tokens per request.
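Text longer than this limit has to be split across several requests. A minimal sketch, assuming whitespace-separated words as a rough stand-in for tokens (the exact unit and tokenization used by the service are not specified on this page); `chunk_text` is a hypothetical helper that keeps sentences whole where it can.

```python
import re

MAX_TOKENS = 10_000  # per-request limit listed on this page


def chunk_text(text, max_tokens=MAX_TOKENS):
    """Greedily pack whole sentences into chunks of at most max_tokens words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = sentence.split()
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(words) > max_tokens:
            if current:
                chunks.append(" ".join(current))
                current, count = [], 0
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        # Start a new chunk if this sentence would overflow the current one.
        if current and count + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


print(chunk_text("One two three. Four five. Six seven eight nine.", max_tokens=5))
# ['One two three. Four five.', 'Six seven eight nine.']
```

Each chunk can then be sent as its own request and the resulting audio segments concatenated in order.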

How many languages does ElevenLabs TTS support?

The text-to-speech models support 70+ languages. The Scribe v2 speech-to-text model extends transcription support to 90+ languages.

What speech models are available through ElevenLabs?

ElevenLabs offers several models including Eleven v3 for expressive storytelling, Eleven Multilingual v2 for broad language coverage, and Eleven Flash v2.5 for ultra-low-latency real-time applications.

Where can I find pricing information for the ElevenLabs API?

API pricing details are available on the ElevenLabs API pricing page at elevenlabs.io/pricing/api.

What is the training data cutoff for ElevenLabs TTS?

According to the available metadata, the knowledge cut-off date for ElevenLabs TTS is listed as January 2023.

Does ElevenLabs TTS support voice cloning?

Yes. The platform supports both instant and professional-grade voice cloning, which maintains the cloned voice's characteristics across all supported languages.
