LLM Model Directory

Explore frontier AI models by provider, pricing, and context

Browse the synced model catalog by provider, release, pricing, and core capabilities.

Model types: Text (132), Vision (13), Image (55), Video (32), Transcription (5), Text to Speech (6), Music (2), Lip Sync (5), 3D (5)

Showing 5 models from 2 providers. Provider filter: All providers. Type filter: Transcription.

OpenAI

3 models


Transcription

GPT-4o mini Transcribe

Release date unavailable

GPT-4o mini Transcribe is a speech-to-text model from OpenAI that uses the GPT-4o mini architecture to convert spoken audio into written text. It is designed to deliver lower word error rates and more accurate language recognition than the original Whisper-based transcription models. The model is part of OpenAI's transcription API offerings and became available in 2025. It suits applications that need accurate transcripts from audio input, such as meeting notes, voice interfaces, and content captioning, and it handles a range of languages. Developers looking for a cost-efficient transcription option within the OpenAI ecosystem can use it via the API.

Context: 16,000 Output: 2,000

Input: $1.25 Output: $5.00



Transcription

GPT-4o Transcribe

Release date unavailable

GPT-4o Transcribe is a speech-to-text model developed by OpenAI that uses the GPT-4o architecture to convert spoken audio into written text. It is part of OpenAI's audio model lineup and was introduced as an improvement over the original Whisper-based transcription models, offering a lower word error rate and more accurate language recognition across a broader range of languages. The model is designed for use cases where transcription accuracy is a priority, such as meeting notes, voice interfaces, medical dictation, and multilingual content. Because it builds on GPT-4o rather than the earlier Whisper architecture, it brings stronger language understanding to the transcription task, which can help with noisy audio, accented speech, and domain-specific vocabulary.

Context: 16,000 Output: 2,000

Input: $2.50 Output: $10.00



Transcription

Whisper

Release date unavailable

Whisper is a general-purpose speech recognition model developed by OpenAI and made available via the OpenAI API under the model ID whisper-1. It was trained on a large dataset of diverse audio, enabling it to handle a wide range of accents, background noise conditions, and technical vocabulary. What distinguishes Whisper is its multitask design: it can perform not only speech-to-text transcription but also speech translation into English and automatic language identification within a single model. Whisper is well suited for developers building transcription pipelines, subtitle generation tools, voice interfaces, or any application that requires converting spoken audio into structured text. It supports multilingual input, making it useful for global applications where audio may arrive in different languages. The model accepts common audio formats and returns transcriptions or translations as plain text or with optional timestamps.

Context: N/A Output: N/A

Input: $0.01 Output: N/A
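
The Whisper description above mentions the model ID whisper-1 and a choice between plain text and timestamped output. A minimal sketch of how such a request might be assembled with the official openai Python package follows. The helper names build_transcription_request and transcribe are hypothetical, introduced only for illustration, and the segment-level timestamp choice is an assumption rather than something the catalog specifies.

```python
from pathlib import Path
from typing import Any


def build_transcription_request(
    model: str = "whisper-1", with_timestamps: bool = False
) -> dict[str, Any]:
    """Assemble keyword arguments for a transcription call (hypothetical helper)."""
    params: dict[str, Any] = {"model": model}
    if with_timestamps:
        # Timestamps are returned only with the verbose_json response format.
        params["response_format"] = "verbose_json"
        params["timestamp_granularities"] = ["segment"]  # assumed choice
    else:
        params["response_format"] = "text"
    return params


def transcribe(audio_path: str, with_timestamps: bool = False):
    """Send one audio file for transcription; needs network access and an API key."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with Path(audio_path).open("rb") as audio_file:
        return client.audio.transcriptions.create(
            file=audio_file,
            **build_transcription_request(with_timestamps=with_timestamps),
        )
```

Keeping the parameter assembly separate from the network call makes it possible to inspect or log a request without sending any audio.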


ElevenLabs

2 models


Transcription

Scribe v1

Release date unavailable

Scribe v1 is ElevenLabs' original speech-to-text model, designed to convert spoken audio into written transcripts. Built as the foundation of ElevenLabs' transcription offering, it enables developers and creators to automatically transcribe audio and video content through the ElevenLabs API. The model supports transcription across multiple languages, making it usable in multilingual workflows and automation pipelines. Scribe v1 has been deployed in use cases ranging from voice note capture to content production tooling. It has since been succeeded by Scribe v2, which adds features such as support for 90+ languages, speaker diarization for up to 32 speakers, word-level timestamps, and entity detection. Developers starting new projects are directed by ElevenLabs to use Scribe v2, while Scribe v1 remains available for existing integrations.

Context: N/A Output: N/A

Input: N/A Output: N/A



Transcription

Scribe v2

Release date unavailable

Scribe v2 is ElevenLabs' flagship speech-to-text model, built to transcribe audio accurately across more than 90 languages with automatic language detection. It supports speaker diarization for up to 32 speakers, word-level timestamps, and entity detection across 56 named entity types, making it one of the more feature-rich transcription models available through an API. Developers can also supply up to 100 custom keyterms to improve recognition of domain-specific vocabulary, names, or technical jargon. Scribe v2 is well suited for applications where transcription accuracy and rich metadata matter, such as meeting summarization, podcast indexing, media subtitling, and legal or medical documentation workflows. Its dynamic audio tagging feature automatically labels non-speech events, which adds context beyond spoken words. The combination of precise timing data and speaker attribution makes it a practical choice for any pipeline where knowing who said what and when is a requirement.

Context: N/A Output: N/A

Input: N/A Output: N/A
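
To illustrate the "who said what and when" idea from the Scribe v2 description, here is a small sketch that merges word-level records into speaker turns. The Word and Turn shapes and the speaker label strings are hypothetical and chosen for illustration. They are not the actual ElevenLabs response format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record: one transcribed word with timing and a speaker label.
    text: str
    start: float  # seconds
    end: float  # seconds
    speaker: str


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_into_turns(words: list[Word]) -> list[Turn]:
    """Merge consecutive words from the same speaker into a single turn."""
    turns: list[Turn] = []
    for word in words:
        if turns and turns[-1].speaker == word.speaker:
            # Same speaker keeps talking: extend the current turn.
            last = turns[-1]
            last.end = word.end
            last.text = f"{last.text} {word.text}"
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(Turn(word.speaker, word.start, word.end, word.text))
    return turns


sample = [
    Word("Hello", 0.0, 0.4, "speaker_0"),
    Word("there", 0.4, 0.8, "speaker_0"),
    Word("Hi", 1.0, 1.2, "speaker_1"),
    Word("again", 1.5, 1.9, "speaker_0"),
]

for turn in group_into_turns(sample):
    print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```

A grouping like this is a common first step before subtitling or meeting summarization, since downstream tools usually work on turns rather than individual words.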
