OpenAI

TTS

November 2024 N/A context N/A output

Low-Latency Speech, Natural Voice Output, Multiple Audio Formats, Text Input Processing, API Integration, Speed Control

Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

OpenAI

Input Context Window

The number of tokens supported by the input context window.

N/A tokens

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

N/A tokens

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

November 2024

Knowledge Cut-off Date

When the model's knowledge was last updated.

November 2024

API Providers

The providers that offer this model. This is not an exhaustive list.

OpenAI API

Modalities

Types of data this model can process.

Text, Audio

What is TTS

A fuller summary of positioning, capabilities, and source-specific details for TTS.

TTS (tts-1) is OpenAI's text-to-speech model designed for speed and responsiveness. It converts written text into natural-sounding audio and is optimized to minimize the delay between text input and audio output. The model accepts up to 4096 characters of input per request and is accessible through the OpenAI API, making it straightforward to integrate into existing applications and workflows.

TTS is well-suited for use cases where timely audio delivery matters, such as interactive voice assistants, customer service systems, educational tools, and entertainment applications. OpenAI also offers a sibling model, tts-1-hd, which prioritizes audio fidelity over speed. Developers who need the fastest possible voice response times will find tts-1 the appropriate choice, while those who can tolerate slightly higher latency in exchange for higher audio quality may opt for tts-1-hd.

Capabilities

What TTS supports

Low-Latency Speech

Generates audio from text with minimal delay, making it suitable for near real-time voice applications like interactive assistants.

Natural Voice Output

Produces fluid, human-like speech from written text across a range of supported voices including alloy, echo, fable, onyx, nova, and shimmer.

Multiple Audio Formats

Outputs audio in several formats including MP3, Opus, AAC, and FLAC, allowing developers to choose the format that fits their delivery requirements.
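As a rough illustration of choosing among these formats, the sketch below maps a format name to a file extension before saving the audio. The helper name `speech_filename` and the extension table are assumptions made for this example. `response_format` is the request field OpenAI's speech endpoint uses for this choice, and the MP3 default follows the FAQ on this page.

```python
# Formats named on this page, mapped to conventional file extensions (assumed).
FORMAT_EXTENSIONS = {"mp3": ".mp3", "opus": ".opus", "aac": ".aac", "flac": ".flac"}
DEFAULT_FORMAT = "mp3"  # the FAQ on this page names MP3 as the default


def speech_filename(stem: str, response_format: str = DEFAULT_FORMAT) -> str:
    """Return a file name for generated audio, rejecting formats not listed above."""
    fmt = response_format.lower()
    if fmt not in FORMAT_EXTENSIONS:
        raise ValueError(f"unsupported format: {response_format!r}")
    return stem + FORMAT_EXTENSIONS[fmt]
```

A caller would pass the same format string in the request and to this helper, so the saved file's extension matches the bytes returned.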

Text Input Processing

Accepts plain text input of up to 4096 characters per request and converts it to spoken audio in a single API call.
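Longer texts have to be split across several requests. The following is a minimal sketch of one way to do that, assuming the 4096 limit is counted in characters (as OpenAI's API reference states for the speech endpoint). The function name `split_for_tts` is hypothetical, and splitting on spaces is a simplification; a production splitter would probably prefer sentence boundaries.

```python
def split_for_tts(text: str, limit: int = 4096) -> list:
    """Split text into pieces of at most `limit` characters, breaking on spaces when possible."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Find the last space at or before the limit so words stay intact.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space available: fall back to a hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each returned chunk can then be sent as its own request, and the resulting audio segments concatenated in order.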

API Integration

Available via the OpenAI REST API, enabling scalable voice output that can be embedded into products, pipelines, and third-party platforms.
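To make the REST integration concrete, here is a hedged sketch that builds (but does not send) a request to OpenAI's speech endpoint using only the Python standard library. The helper name `build_speech_request` is an assumption for illustration. The URL and the `model`, `input`, and `voice` fields follow OpenAI's public API reference, which should be checked for the current details.

```python
import json
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(api_key: str, text: str, voice: str = "alloy",
                         model: str = "tts-1") -> urllib.request.Request:
    """Build a POST request for the speech endpoint without sending it."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it requires a real API key and network access, for example:
# with urllib.request.urlopen(build_speech_request(key, "Hello")) as resp:
#     audio_bytes = resp.read()
```

Keeping request construction separate from sending makes the payload easy to inspect and test without spending API credits.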

Speed Control

Supports a configurable speech speed parameter ranging from 0.25x to 4.0x, giving developers control over the pacing of generated audio.
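The sketch below shows one way to keep a user-supplied speed inside the documented 0.25x to 4.0x range before sending it. The helper names are hypothetical, and the duration estimate assumes playback length scales inversely with the speed multiplier, which is an approximation rather than documented behavior.

```python
MIN_SPEED, MAX_SPEED = 0.25, 4.0  # range stated on this page


def clamp_speed(speed: float) -> float:
    """Clamp a requested speed multiplier into the supported range."""
    return max(MIN_SPEED, min(MAX_SPEED, float(speed)))


def estimated_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length at a given speed (assumes inverse scaling)."""
    return base_seconds / clamp_speed(speed)
```

Clamping is a design choice here; an application could instead reject out-of-range values so the caller notices the mistake.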

Pricing for TTS

Primary API pricing shown in the same “quick compare” spirit as the reference page.

Input text $15.00 per million characters

Output tokens N/A Per million tokens
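For budgeting, a quick estimate can be computed from the input length. This sketch assumes the $15.00 rate applies per million input characters, in line with the FAQ's note that tts-1 is billed by character count. The function name is hypothetical, and the rate should be checked against OpenAI's current pricing page.

```python
from decimal import Decimal

RATE_PER_MILLION_CHARS = Decimal("15.00")  # USD, figure taken from this page


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate the cost of synthesizing `text`, billed per input character."""
    return Decimal(len(text)) * RATE_PER_MILLION_CHARS / Decimal(1_000_000)
```

Under these assumptions, a single maximum-length request of 4096 characters costs about six cents.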

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

OpenAI API

Configuration & Parameters

The configurable options currently documented for this model.

Voice (select)

Voice to use in TTS

Default: alloy

Options: alloy, echo, fable, onyx, nova, shimmer
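A small validation step can catch a misspelled voice before a request is made. The following sketch is an assumption-laden example: `resolve_voice` is a hypothetical helper, and the voice list is limited to the six options shown on this page.

```python
from typing import Optional

VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
DEFAULT_VOICE = "alloy"  # default listed on this page


def resolve_voice(name: Optional[str] = None) -> str:
    """Return a valid lowercase voice name, falling back to the default when none is given."""
    if name is None or not name.strip():
        return DEFAULT_VOICE
    voice = name.strip().lower()
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {name!r}")
    return voice
```

For example, `resolve_voice("Nova")` normalizes the display name to the lowercase identifier used in API requests.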

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Voice

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Official Model Listing (Documentation)

Audio & Speech Guide (Documentation)

OpenAI API Platform (Other)

OpenAI TTS Pricing (Documentation)

OpenAI API Reference – Audio (Documentation)

Official Website

Usage Policies

Enterprise privacy at OpenAI

OpenAI Status Page

Community discussion

What people think about TTS

TTS discussions are most active in r/Grimdank, r/LocalLLaMA, and r/Rainbow6. The top Reddit threads are mostly benchmark and model-comparison discussions. The strongest match in this snapshot has 19,242 upvotes and 442 comments.

r/LocalLLaMA 21 upvotes 37 comments May 2, 2026

What is The best and expressive AI TTS (running locally?) for voice acting?

I am only doing this for private hobby projects.But I haven’t been up to date with the best TTS? Which one is it?

The ones that can show all types of emotions including grunts, etc, anger, screams, sadness.

r/LocalLLaMA 589 upvotes 118 comments April 22, 2026

Qwen3 TTS is seriously underrated - I got it running locally in real-time and it's one of the most expressive open TTS models I've tried

Heya guys and gals,

Around a year ago I released and posted about Persona Engine as a fun side project, trying to get the whole ASR -> LLM -> TTS pipeline going fully locally while having a realtime avatar that is lip-synced (think VTuber). I was able to achieve this and was super happy with the result, but the TTS for me was definitely lacking, since I was using Sesame at the time as reference. After that I took a long break.

A week or two ago, I thought to give the project a refresh, and also wanted to see how far we have come with local models, and boy was I pleasantly surprised with Qwen3 TTS. During my initial tests it was lacking, especially the version published by the Qwen team themselves, but after digging around and experimenting a lot I was able to:

1. Make streaming with the model work reliably. The architecture of the model is perfect for this, since the decoder uses a sliding window, which means if you stream the LLM response, that's completely fine and the TTS will keep coherent prosody, pitch, and intonation.
2. Get the model working with llama.cpp, because I am using C# and speed is important, so also quantized it.
3. The model was lacking word-level timings and phonemes which Kokoro (the previous, more robotic sounding TTS) had. So I had to implement CTC word-level alignment to be able to know when certain words are spoken (important for subtitles + getting phonemes to have the lips move correctly).

Once this was all done, I also decided to finetune my own Qwen3-TTS voice. The cloning capabilities are really cool, but very lacking in contextual understanding and struggles with pronouncing. Additionally, the custom trained voices provided by the Qwen team didn't have any female native speakers, and I didn't want to create a new Live2D model.

In the end, the finetune blew me away and will probably continue improving it.

GitHub is here: [https://github.com/fagenorn/handcrafted-persona-engine](https://github.com/fagenorn/handcrafted-persona-engine)

Check it out, have fun, and let me know whatever crazy stuff you decide to do with it.

r/LocalLLaMA 6 upvotes 12 comments April 7, 2026

Whats the best open source/free TTS

Hey, Im trying to see how much does synthetic data help with training ASR model. What is the best TTS? Im looking for something that sounds natural and not robotic. It would be really nice if the TTS could mimic english accents (american, british, french etc.). Thanks for the help.

r/SillyTavernAI 7 upvotes 34 comments February 28, 2026

What do you use for TTS?

I've tried several ways but not feeling satisfied:

1- chatterbox: too slow

2- Alltalk: never worked

3- system: bad quality

4- Kokoro: currently using but not impressed

- what TTS way do you recommend?

- If you mention elevenLab, is the price worth it? i did the calculation and it's 30 min per 5 dollar.

- Edge, I know it's a privacy nightmare but is it worth it? I use openrouter anyway

- I heard about Kitten TTS, and GPT-SoVITS v3 but nobody showed tutorial on how to use them on sillytavern

- should I just wait for open router to give reasonable priced TTS API?

r/LocalLLaMA 3 upvotes 9 comments April 25, 2026

Just for person who is in search for a best tts model to run . (Allowed for commercial use)

If you have low vram - qwen 3 tts is good

If you need something unique go for - tada 3b but it need 28gb vram

If you want best tts rn + have the commercial use allowed then go for - moss tts 8b its literally the best model out there

Literally voice clone is sooooooo powerful 😍

(Dont go for fish audio its not for commercial use but for fun its veryyyy good)

Edit: i found longcat DiT 3.5b its totally mind boggling. It is even better than MOSS 8b. And best at cloning voices

FAQ

Common questions about TTS

What is the maximum input length for tts-1?

The model accepts up to 4096 characters of input per request, which is the maximum amount of text that can be converted to speech in a single API call. Longer texts must be split across multiple requests.

How is tts-1 priced?

OpenAI prices tts-1 based on the number of characters in the input text. Current pricing details are available on the OpenAI pricing page at platform.openai.com/pricing.

What voices are available with tts-1?

tts-1 supports six built-in voices: alloy, echo, fable, onyx, nova, and shimmer. Each voice has a distinct tone and style, but no custom voice cloning is supported natively through this model.

What audio formats does tts-1 output?

The model can output audio in MP3, Opus, AAC, and FLAC formats. MP3 is the default format returned by the API.

What is the difference between tts-1 and tts-1-hd?

tts-1 is optimized for low latency and faster audio delivery, while tts-1-hd trades some speed for higher audio quality. Both models share the same voices and input format.

What is the training data cutoff for tts-1?

According to the provided metadata, the model's training date is listed as November 2024.
