Text to Speech
Converts written text into spoken audio output. Accepts up to 4096 characters of input text per request.
TTS HD (model ID: tts-1-hd) is a text-to-speech model developed by OpenAI that converts written text into natural-sounding spoken audio. It accepts up to 4096 characters of input text per request and produces audio in a choice of built-in voices. It is the quality-optimized variant in OpenAI's TTS model family, designed to produce higher-fidelity audio than the standard TTS-1 model. The model suits applications that need clear, natural-sounding voice output, such as voice assistants, audiobook narration, accessibility tools, and content creation workflows. It can output audio in formats including MP3, Opus, AAC, and FLAC. Developers access the model through OpenAI's API, and it is also available on MindStudio without separate API key management.
Converts written text into spoken audio output. Accepts up to 4096 characters of input text per request.
Supports a selection of built-in voices (e.g., alloy, echo, fable, onyx, nova, shimmer) to vary the tone and style of generated speech.
Outputs audio in multiple formats including MP3, Opus, AAC, and FLAC to suit different playback and storage requirements.
The HD variant applies additional processing to produce higher-fidelity audio compared to the standard TTS-1 model, reducing artifacts in the output.
Accessible via OpenAI's REST API, allowing developers to integrate speech synthesis directly into applications and pipelines.
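The REST access described above can be sketched with only the Python standard library. This is a minimal illustration, not an official client: the helper names `build_speech_request` and `synthesize` are hypothetical, the defaults are arbitrary choices, and the request fields (`model`, `input`, `voice`, `response_format`) are assumed to match OpenAI's speech endpoint as publicly documented.

```python
import json
import urllib.request

# Endpoint from OpenAI's public API; the helpers below are illustrative only.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, voice="alloy", response_format="mp3", model="tts-1-hd"):
    """Return the JSON body for one speech request (hypothetical helper)."""
    if not text:
        raise ValueError("input text must not be empty")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }


def synthesize(text, api_key, **options):
    """POST the request and return the raw audio bytes (needs network access)."""
    body = json.dumps(build_speech_request(text, **options)).encode("utf-8")
    request = urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Under these assumptions, the bytes returned by `synthesize("Hello", api_key, voice="nova")` could be written straight to a file such as `hello.mp3`. Keeping the payload builder separate from the network call makes the request easy to inspect without spending API credits.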
The configurable options currently documented for this model:
voice: the voice to use for speech synthesis.
TTS HD discussions are most active in r/ChatGPT, r/ChatGPTPro, and r/OpenWebUI. The top Reddit threads are mostly benchmarks and model comparisons. The strongest match in this snapshot has 79 upvotes and 56 comments.
I generated the same text using both, and I can't tell the difference. Can you?
I wish I could attach my files in here :( I guess I can upload a YouTube video but I'm too lazy for that... Perhaps if I get a comment asking for it, I'll take the time to do it.
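from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
text = "The same passage, synthesized once per model."  # sample input
response = client.audio.speech.create(
    model="tts-1",  # swap in "tts-1-hd" to compare
    voice="nova",
    input=text,
)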
I do really like OpenAI’s text to speech hd model, it sounds great in many languages I tried.
However, I need to customize the voice for my project. Are there any good options?
TTS HD accepts up to 4096 characters of input text per request, which is the maximum amount of text that can be converted to speech in a single API call. OpenAI's API reference states this limit in characters rather than tokens.
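Text longer than the per-request limit has to be split across several calls. The sketch below shows one possible approach, assuming the limit is counted in characters as OpenAI's API reference states. The function name `split_for_tts` is hypothetical, and it breaks on whitespace rather than sentence boundaries, so pauses at chunk joins may sound abrupt.

```python
from typing import List


def split_for_tts(text: str, limit: int = 4096) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking on whitespace."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit cannot fit anywhere: hard-split it.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow, so close the chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio segments played or concatenated in order. Note that runs of whitespace are collapsed to single spaces, which does not affect the spoken result.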
TTS-1-HD is the quality-optimized variant of OpenAI's text-to-speech model family. It is designed to produce higher-fidelity audio output, while TTS-1 is optimized for lower latency at the cost of some audio quality.
TTS HD can output audio in MP3, Opus, AAC, and FLAC formats, as documented in OpenAI's text-to-speech guide.
OpenAI provides six built-in voices for TTS HD: alloy, echo, fable, onyx, nova, and shimmer. Each voice has a distinct tone and character.
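To judge voices and model variants by ear, it can help to render the same text with every combination. The sketch below only prepares the request payloads and output filenames. It sends nothing, the name `comparison_jobs` is hypothetical, and it assumes the standard TTS-1 model accepts the same six voices.

```python
from itertools import product

# Model IDs and voice names as listed on this page.
MODELS = ("tts-1", "tts-1-hd")
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def comparison_jobs(text, models=MODELS, voices=VOICES, response_format="mp3"):
    """Return one job per (model, voice) pair for side-by-side listening."""
    jobs = []
    for model, voice in product(models, voices):
        jobs.append({
            # The filename encodes the combination so the clips are easy to tell apart.
            "filename": f"{model}_{voice}.{response_format}",
            "payload": {
                "model": model,
                "input": text,
                "voice": voice,
                "response_format": response_format,
            },
        })
    return jobs
```

With the defaults this yields twelve jobs, from `tts-1_alloy.mp3` through `tts-1-hd_shimmer.mp3`, each of which could be passed to whatever client sends the actual request.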
TTS HD is a speech synthesis model and does not rely on a training knowledge cutoff in the same way language models do. The metadata lists the training date as not applicable.
Pricing for TTS HD is set by OpenAI and is based on the number of characters processed. Refer to OpenAI's official pricing page for current rates, as pricing may change over time.
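Because billing is per character, a rough cost estimate is simple arithmetic. The sketch below deliberately takes the rate as a caller-supplied argument. The function name `estimate_cost_usd` and the per-million-characters unit are assumptions for illustration, and the rate used in any call should come from OpenAI's pricing page rather than from this example.

```python
def estimate_cost_usd(text: str, usd_per_million_chars: float) -> float:
    """Estimate the cost of synthesizing `text` at a caller-supplied rate."""
    if usd_per_million_chars < 0:
        raise ValueError("rate must not be negative")
    # Every character in the input counts, including spaces and punctuation.
    return len(text) * usd_per_million_chars / 1_000_000
```

For example, `estimate_cost_usd(script, rate)` with `rate` set to the current published price gives a quick budget figure before a long narration job is submitted.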