Audio Transcription
Converts spoken audio from audio and video files into written text transcripts. Accessible via the ElevenLabs API for use in automated pipelines.
Scribe v1 is ElevenLabs' original speech-to-text model, designed to convert spoken audio into written transcripts. Built as the foundation of ElevenLabs' transcription offering, it enables developers and creators to automatically transcribe audio and video content through the ElevenLabs API. The model supports transcription across multiple languages, making it usable in multilingual workflows and automation pipelines. Scribe v1 has been deployed in use cases ranging from voice note capture to content production tooling. It has since been succeeded by Scribe v2, which adds features such as support for 90+ languages, speaker diarization for up to 32 speakers, word-level timestamps, and entity detection. Developers starting new projects are directed by ElevenLabs to use Scribe v2, while Scribe v1 remains available for existing integrations.
Transcribes speech across a range of languages, enabling use in multilingual content workflows.
Available through the ElevenLabs API, allowing integration into developer workflows, automation pipelines, and third-party applications.
Returns transcription results as structured text output suitable for downstream processing, storage, or display.
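As a rough illustration of the API access and text output described above, the following minimal sketch describes a transcription request without sending it. The endpoint URL, the `xi-api-key` header, the `model_id` form field, and the `text` response key are assumptions that do not appear on this page and should be checked against the official ElevenLabs API reference. The helper names are hypothetical.

```python
from dataclasses import dataclass

# Assumed values, not taken from this page; verify against the official API reference.
STT_ENDPOINT = "https://api.elevenlabs.io/v1/speech-to-text"
MODEL_ID = "scribe_v1"


@dataclass
class TranscriptionRequest:
    """Plain description of an HTTP request; nothing is sent over the network."""
    url: str
    headers: dict
    form_fields: dict


def build_transcription_request(api_key: str, model_id: str = MODEL_ID) -> TranscriptionRequest:
    """Assemble the URL, auth header, and form fields for a transcription call."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return TranscriptionRequest(
        url=STT_ENDPOINT,
        headers={"xi-api-key": api_key},      # assumed header name
        form_fields={"model_id": model_id},   # assumed form field name
    )


def extract_text(response_json: dict) -> str:
    """Return the transcript from a decoded JSON response (assumed 'text' key)."""
    return response_json.get("text", "").strip()
```

With an HTTP client such as the third-party `requests` package, the audio itself would typically be attached as a multipart upload, for example `requests.post(req.url, headers=req.headers, data=req.form_fields, files={"file": audio_file})`, where the `file` field name is likewise an assumption.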
The configurable options currently documented for this model.
Choose whether to include timing and speaker information in the transcription.
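One way such an option might map onto request fields is sketched below. The field names `timestamps_granularity` and `diarize`, and their values, are assumptions not confirmed by this page. The function name is hypothetical.

```python
def transcription_options(include_timing: bool, include_speakers: bool) -> dict:
    """Map two on/off choices onto assumed request form fields."""
    return {
        # Assumed field: "word" asks for per-word start/end times, "none" omits them.
        "timestamps_granularity": "word" if include_timing else "none",
        # Assumed field: "true" asks for speaker labels on the transcript.
        "diarize": "true" if include_speakers else "false",
    }
```

Values are kept as strings because form fields are sent as text. The resulting dictionary would be merged into the other form fields of the request.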
Scribe v1 discussions are most active in r/ElevenLabs, r/LocalLLaMA, and r/macapps. The top Reddit threads are mostly benchmark and model-comparison discussions.
The strongest match in this snapshot has 79 upvotes and 25 comments.
I shipped **Rokid-Scribe v1.1.1**, a voice note workflow for Rokid glasses.
The main reason I built it is simple: I didn’t really like the AI transcription flow in the **Hi Rokid** app. In my testing, it felt a bit slow on long recordings (10+ minutes), the transcripts were sometimes not that accurate, and the summary function didn't work properly. So instead of being locked into one built-in transcription path, I went for a **custom multi-provider approach**.
With Rokid-Scribe, the flow is:
* Record on the glasses
* Import to the phone over local transport (Wi-Fi or Bluetooth)
* Transcribe on the phone with the provider you want
* Keep everything local on the phone
* Export to `.txt` or `.pdf`, or copy it to your clipboard
Current supported providers:
* **ElevenLabs**
* **AssemblyAI**
* **Speechmatics**
* **Deepgram**
* **Groq**
A nice side effect of the multi-provider setup is that you can pick what matters most to you:
* accuracy
* speed
* language support
* diarization / multi-speaker support
* free tier / cost
# Free tier snapshot
Checked on **April 13, 2026**. These can change, so always verify the official pricing pages.
**ElevenLabs**: free plan with **10k credits/month**, and STT is included. Their pricing page currently shows that this is roughly **13h38 of Scribe v2 transcription** on the free plan.
[https://elevenlabs.io/pricing/api?price.section=speech_to_text](https://elevenlabs.io/pricing/api?price.section=speech_to_text)
**AssemblyAI**: free tier currently advertised as **up to 333 hours** of transcription.
[https://www.assemblyai.com/pricing](https://www.assemblyai.com/pricing)
**Speechmatics**: free tier currently includes **480 minutes/month** of speech-to-text.
[https://www.speechmatics.com/pricing](https://www.speechmatics.com/pricing)
**Deepgram**: free signup currently includes **$200 in credits**, no credit card required.
[https://deepgram.com/pricing](https://deepgram.com/pricing)
**Groq**: free plan exists, but it’s more **rate-limit based** than a simple monthly credit bucket. For STT, the docs currently mention **25 MB max upload on free tier**.
[https://console.groq.com/docs/speech-to-text](https://console.groq.com/docs/speech-to-text) [https://console.groq.com/docs/rate-limits](https://console.groq.com/docs/rate-limits)
If you want something more flexible than the default transcription flow, this app is made for you.
I’d love feedback on the app. If you'd like more providers added, let me know which ones work best for you!
Repo: [https://github.com/Anezium/Rokid-Scribe](https://github.com/Anezium/Rokid-Scribe)
Release: [https://github.com/Anezium/Rokid-Scribe/releases/tag/v1.1.1](https://github.com/Anezium/Rokid-Scribe/releases/tag/v1.1.1)
Hey everyone — big day for SayScribe.
The Mac app is officially live on the Mac App Store. Same app you know from iPhone and iPad, now native on macOS with full iCloud sync between all your devices.
Universal Purchase — buy once, use everywhere on your Apple devices.
* Mac App Store/iPhone/iPad: [https://apps.apple.com/app/id6759438198](https://apps.apple.com/app/id6759438198)
Huge thanks to everyone who tested the Mac build. Feedback always welcome here: post bugs, feature requests, anything.
Is it just me, or were the logprobs and probability fields never fixed? I get 0 for all logprobs and None for all probabilities.
Has anyone found a fix? Am I looking in the wrong place?
I've used the same test audio segment a bunch of times with only minor transcription errors, and it's suddenly returning total gibberish!! Anyone else having issues? Language is Urdu to English.
Hello
When I run transcriptions using Scribe v1, it seems like each token's logprob defaults to 0.0. I never get any other value, even for hallucinated transcriptions of low-quality audio. My aim is to use these logprobs to compute some kind of confidence level.
Are logprobs not available for Scribe v1 or am I doing something wrong?
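For readers with the same goal, a minimal sketch of turning per-token logprobs into a confidence score is shown below. It assumes the response exposes a list of word entries that each carry a `logprob` field. That shape is an assumption, and the function name is hypothetical. Because the threads above report logprobs that are always 0.0, the sketch treats an all-zero list as "no usable signal" rather than as perfect confidence.

```python
import math
from typing import Optional


def transcript_confidence(words: list[dict]) -> Optional[float]:
    """Geometric-mean token probability in (0, 1], or None if logprobs are unusable."""
    # Keep only numeric logprobs; missing or None values are skipped.
    logprobs = [w["logprob"] for w in words if isinstance(w.get("logprob"), (int, float))]
    if not logprobs:
        return None
    # All-zero logprobs would mean every token has probability 1.0, which the
    # reports above suggest is a placeholder rather than a real score.
    if all(lp == 0.0 for lp in logprobs):
        return None
    # exp(mean(logprob)) is the geometric mean of the token probabilities.
    return math.exp(sum(logprobs) / len(logprobs))
```

The geometric mean is only one possible aggregate. Taking the minimum token probability instead would be more sensitive to a single badly recognized word.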
Scribe v1 is used to transcribe spoken audio from audio and video files into written text. It has been used in workflows such as voice note capture, content production, and automated transcription pipelines via the ElevenLabs API.
Yes, Scribe v1 supports transcription across multiple languages, making it suitable for multilingual workflows. However, its successor Scribe v2 expands this to 90+ languages.
No context window size is specified in the available metadata for Scribe v1, as it is a speech-to-text transcription model rather than a language model.
Yes. ElevenLabs has released Scribe v2, which adds speaker diarization for up to 32 speakers, support for 90+ languages, word-level timestamps, keyterm prompting, and entity detection. ElevenLabs recommends Scribe v2 for new applications.
Scribe v1 is accessible via the ElevenLabs API. It can be integrated into developer workflows and automation pipelines for audio and video transcription tasks.