ElevenLabs

Scribe v2





Model Overview

High-signal model metadata in a structured two-column overview table.

Provider

The entity that provides this model.

ElevenLabs

Input Context Window

The number of tokens supported by the input context window.

N/A

Maximum Output Tokens

The number of tokens that can be generated by the model in a single request.

N/A

Open Source

Whether the model's code is available for public use.

Release Date

When the model was first released.

Unknown

Knowledge Cut-off Date

When the model's knowledge was last updated.

Unknown

API Providers

The providers that offer this model. This is not an exhaustive list.

ElevenLabs

Modalities

Types of data this model can process.

Text, Audio

What is Scribe v2

A fuller summary of positioning, capabilities, and source-specific details for Scribe v2.

Scribe v2 is ElevenLabs' flagship speech-to-text model, built to transcribe audio accurately across more than 90 languages with automatic language detection. It supports speaker diarization for up to 32 speakers, word-level timestamps, and entity detection across 56 named entity types, making it one of the more feature-rich transcription models available through an API. Developers can also supply up to 100 custom keyterms to improve recognition of domain-specific vocabulary, names, or technical jargon.

Scribe v2 is well suited for applications where transcription accuracy and rich metadata matter — such as meeting summarization, podcast indexing, media subtitling, and legal or medical documentation workflows. Its dynamic audio tagging feature automatically labels non-speech events, which adds context beyond spoken words. The combination of precise timing data and speaker attribution makes it a practical choice for any pipeline where knowing who said what and when is a requirement.
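The "who said what and when" requirement can be sketched as a small post-processing step. The example below is an illustration only and does not call the ElevenLabs API: it assumes each transcribed word arrives as a dictionary with hypothetical `text`, `start`, `end`, and `speaker_id` keys, and merges consecutive words from the same speaker into turns.

```python
from typing import Dict, List


def group_speaker_turns(words: List[Dict]) -> List[Dict]:
    """Merge consecutive words from the same speaker into turns.

    Assumes (hypothetically) that each word is a dict with the keys
    "text", "start", "end" (seconds), and "speaker_id".
    """
    turns: List[Dict] = []
    for word in words:
        if turns and turns[-1]["speaker_id"] == word["speaker_id"]:
            # Same speaker is still talking: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end"] = word["end"]
        else:
            # Speaker changed (or this is the first word): start a new turn.
            turns.append({
                "speaker_id": word["speaker_id"],
                "start": word["start"],
                "end": word["end"],
                "text": word["text"],
            })
    return turns


# Made-up sample data in the assumed shape.
sample_words = [
    {"text": "Hello", "start": 0.0, "end": 0.4, "speaker_id": "speaker_0"},
    {"text": "there.", "start": 0.5, "end": 0.9, "speaker_id": "speaker_0"},
    {"text": "Hi!", "start": 1.2, "end": 1.5, "speaker_id": "speaker_1"},
]

for turn in group_speaker_turns(sample_words):
    print(f'[{turn["start"]:.1f}-{turn["end"]:.1f}] {turn["speaker_id"]}: {turn["text"]}')
```

Each turn keeps the start time of its first word and the end time of its last word, which is usually enough for meeting notes or a searchable podcast index.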

Capabilities

What Scribe v2 supports

Multilingual Transcription

Transcribes spoken audio in over 90 languages with automatic language detection, requiring no manual language configuration.

Speaker Diarization

Identifies and separates individual speakers within a single audio file, supporting up to 32 distinct speakers.

Word-Level Timestamps

Provides precise timing for every transcribed word, enabling accurate alignment with audio or video content.

Entity Detection

Automatically identifies and labels named entities within transcriptions, covering up to 56 entity types.

Keyterm Prompting

Accepts up to 100 custom keyterms to guide the model toward accurate recognition of domain-specific vocabulary or proper nouns.


Audio Event Tagging

Detects and labels non-speech audio events dynamically, adding contextual metadata beyond spoken words.
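As a rough sketch of how tagged non-speech events might be handled downstream, the example below separates spoken words from audio events. The item shape is an assumption made for illustration: each item is a dictionary with hypothetical `text`, `start`, and `type` keys, where `type` is either `"word"` or `"audio_event"`.

```python
from typing import Dict, List, Tuple


def split_speech_and_events(items: List[Dict]) -> Tuple[str, List[Dict]]:
    """Separate spoken words from tagged non-speech events.

    Assumes (hypothetically) each item has "text", "start", and a "type"
    that is either "word" or "audio_event".
    """
    spoken = [item["text"] for item in items if item["type"] == "word"]
    events = [item for item in items if item["type"] == "audio_event"]
    return " ".join(spoken), events


# Made-up sample data in the assumed shape.
sample_items = [
    {"text": "Thanks", "start": 0.0, "type": "word"},
    {"text": "everyone.", "start": 0.4, "type": "word"},
    {"text": "(applause)", "start": 1.0, "type": "audio_event"},
    {"text": "Next", "start": 3.2, "type": "word"},
]

transcript, events = split_speech_and_events(sample_items)
print(transcript)
print([(event["text"], event["start"]) for event in events])
```

Keeping events in a separate list lets a subtitle pipeline decide whether to show them, while a summarization pipeline can work from the clean spoken text.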

Pricing for Scribe v2

Primary API pricing for quick comparison. No token-based pricing is listed for this model.

Input tokens: N/A (per million tokens)

Output tokens: N/A (per million tokens)

API Access & Providers

Places where this model is available, based on the synced detail-page metadata.

ElevenLabs

Configuration & Parameters

The configurable options currently documented for this model.

Include Speakers

Type: Select

Choose whether to include timing and speaker information in the transcription.

Default: No

Options: Yes, No

Supported Request Parameters

Parameters currently listed by OpenRouter or the local catalog for this model.

Include Speakers

Resources & Documentation

Official model cards, release notes, docs, and other references synced from the source page.

Official Documentation (Documentation)

Speech to Text Capability Guide (Documentation)

ElevenLabs API Reference (Documentation)

Scribe v2 Launch Announcement (Announcements)

Community discussion

What people think about Scribe v2

Threads about Scribe v2 in this snapshot come from r/ElevenLabs, r/LanguageTechnology, and r/spokenly. They cover the launch announcement, a request for Scribe v2 Realtime support, and an informal accuracy comparison.

The strongest match in this snapshot has 165 upvotes and 60 comments.


r/spokenly 1 upvote 9 comments March 11, 2026

Scribe V2 Realtime?

Any chance we can get support for this API? I feel like it was made for dictation, and there shouldn't need to be any caching of the audio file before we see words appearing in the text field. This could transform the experience with the app.


r/LanguageTechnology 2 upvotes 6 comments March 12, 2026

Scribe v2 seems the best STT model so far

I tested it against the Norwegian word "avslutt", which means "exit", and so far it's the only model that somewhat consistently understands what I say.

https://preview.redd.it/e4ur915gyjog1.png?width=971&format=png&auto=webp&s=6a3025a04418c9a2200e76f6afb0d0e0e0a15a9f



r/ElevenLabs 21 upvotes 9 comments January 9, 2026

Introducing Scribe v2

We’ve just launched **Scribe v2**, our most accurate transcription model to date, built for **batch transcription, subtitles, and captioning at scale**.

While Scribe v2 Realtime is optimized for ultra-low latency and agent use cases, **Scribe v2** focuses on high-accuracy transcription workflows across large audio and video libraries.

# What’s New in Scribe v2

* Lowest word error rate based on industry benchmarks
* Stable handling of pauses, tone changes, and long silences
* Accurate transcription across 90+ languages

Scribe v2 is now available in **ElevenLabs Studio** for subtitles, captions, and transcriptions used in marketing, media, research, training, and compliance workflows.

Additional features include:

* Keyterm prompting using contextual understanding
* Entity detection for PII, health data, and payment details with exact timestamps
* Automatic multi-language detection
* Speaker diarization, word-level timestamps, and dynamic audio tagging
* Enterprise-ready compliance, including SOC 2, ISO27001, PCI DSS L1, HIPAA, GDPR, EU and India data residency, and zero-retention mode

Try Scribe v2 here: [https://elevenlabs.io/app/speech-to-text](https://elevenlabs.io/app/speech-to-text)
Read the docs: [https://elevenlabs.io/docs/capabilities/speech-to-text](https://elevenlabs.io/docs/capabilities/speech-to-text)

# Get 1,000 Free ElevenLabs Credits

To celebrate the launch, we’re rewarding community members with **1,000 free ElevenLabs credits** for helping share the announcement on X.

# How it works:

1. Join Discord: [https://discord.gg/tfBp8h3pHT](https://discord.gg/tfBp8h3pHT)
2. Connect your X account in #connect-X
3. Retweet the Scribe v2 announcement in #giveaway
4. Receive a unique 1,000-credit link
5. Click the link to apply the credits to your account

This is part of a new system to micro-reward creators who help amplify product launches.

Happy transcribing — and enjoy the credits!


FAQ

Common questions about Scribe v2

How many languages does Scribe v2 support?

Scribe v2 supports transcription in over 90 languages and can automatically detect the spoken language without requiring manual configuration.

Does Scribe v2 have a context window limit?

No context window is specified in the available metadata for Scribe v2, as it is a speech-to-text model rather than a text-based language model. Limits, if any, would apply to audio file length or size as defined by the ElevenLabs API.

How many speakers can Scribe v2 distinguish in a single file?

Scribe v2's speaker diarization feature can identify and separate up to 32 individual speakers within a single audio file.
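One common use of diarized output is measuring how long each speaker talked. The sketch below is a minimal example under stated assumptions: the word dictionaries and their `speaker_id`, `start`, and `end` keys are hypothetical, not a documented response format.

```python
from collections import defaultdict
from typing import Dict, List


def talk_time_by_speaker(words: List[Dict]) -> Dict[str, float]:
    """Sum spoken seconds per speaker from word-level timings.

    Word dicts with "speaker_id", "start", and "end" (seconds) are an
    assumed shape used only for illustration.
    """
    totals: Dict[str, float] = defaultdict(float)
    for word in words:
        totals[word["speaker_id"]] += word["end"] - word["start"]
    return dict(totals)


# Made-up sample data in the assumed shape.
diarized_words = [
    {"speaker_id": "speaker_0", "start": 0.0, "end": 0.5},
    {"speaker_id": "speaker_0", "start": 0.5, "end": 1.0},
    {"speaker_id": "speaker_1", "start": 1.5, "end": 2.5},
]

for speaker, seconds in talk_time_by_speaker(diarized_words).items():
    print(f"{speaker}: {seconds:.1f}s")
```

Because only word durations are summed, pauses between words are not counted toward any speaker.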

Can I improve recognition of specialized terminology?

Yes. Scribe v2 supports keyterm prompting, which allows you to supply up to 100 custom terms — such as product names, technical jargon, or proper nouns — to guide the model toward more accurate recognition.
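Before sending keyterms, it can help to clean the list and check it against the 100-term limit. The helper below is a hypothetical client-side sketch; the trimming and case-insensitive de-duplication are design choices made here, not documented API behavior.

```python
from typing import Iterable, List

MAX_KEYTERMS = 100  # limit stated for Scribe v2 keyterm prompting


def prepare_keyterms(terms: Iterable[str], limit: int = MAX_KEYTERMS) -> List[str]:
    """Trim, de-duplicate (case-insensitively), and cap a keyterm list."""
    seen = set()
    cleaned: List[str] = []
    for term in terms:
        term = term.strip()
        key = term.casefold()
        if not term or key in seen:
            # Skip empty strings and repeats of an earlier term.
            continue
        seen.add(key)
        cleaned.append(term)
    if len(cleaned) > limit:
        raise ValueError(f"{len(cleaned)} keyterms supplied; the limit is {limit}")
    return cleaned


keyterms = prepare_keyterms(["ElevenLabs", " Scribe v2 ", "elevenlabs", "", "diarization"])
print(keyterms)
```

Raising an error instead of silently truncating makes it obvious when a domain glossary has outgrown the limit and needs to be prioritized by hand.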

What types of named entities can Scribe v2 detect?

Scribe v2 can automatically identify and label up to 56 types of named entities within a transcription, such as people, organizations, and locations.
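Detected entities are often used to redact sensitive text. The sketch below assumes a hypothetical entity shape with `entity_type`, `start_char`, and `end_char` fields that index into the transcript text; the field names and the type labels in the sample are illustrative, not taken from the ElevenLabs documentation.

```python
from typing import Dict, List


def redact_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected entity span with a bracketed type label.

    Assumes (hypothetically) entities carry "entity_type", "start_char",
    and "end_char" offsets into `text`, and that spans do not overlap.
    """
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start_char"], reverse=True):
        label = f'[{ent["entity_type"].upper()}]'
        text = text[:ent["start_char"]] + label + text[ent["end_char"]:]
    return text


# Made-up sample data in the assumed shape.
sample_text = "Call Dana Smith at 555-0100."
sample_entities = [
    {"entity_type": "name", "start_char": 5, "end_char": 15},
    {"entity_type": "phone_number", "start_char": 19, "end_char": 27},
]

print(redact_entities(sample_text, sample_entities))
```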
