The core functionality of the Gladia API is its Speech Recognition model, which converts spoken language into written text. Additional capabilities such as diarization, summarization, translation, custom prompts, and more can be enabled by adding parameters to your request.
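
As a rough illustration of "adding parameters to your request", the sketch below builds a JSON request body with one boolean flag per enabled feature. It makes no network call. The field names (`audio_url`, `diarization`, `subtitles`, `custom_vocabulary`, `sentences`) and the helper `build_request_body` are assumptions made for this example, not confirmed parameter names; check the API reference for the exact fields before using them.

```python
import json

# Hypothetical mapping from a feature label to a request-body flag.
# These flag names are assumptions for illustration only.
FEATURE_FLAGS = {
    "speaker_diarization": "diarization",
    "subtitles": "subtitles",
    "custom_vocabulary": "custom_vocabulary",
    "sentences": "sentences",
}


def build_request_body(audio_url, features=()):
    """Return a JSON-serializable body with one True flag per requested feature."""
    unknown = [name for name in features if name not in FEATURE_FLAGS]
    if unknown:
        raise ValueError(f"unknown features: {unknown}")

    body = {"audio_url": audio_url}  # assumed field name for the audio source
    for name in features:
        body[FEATURE_FLAGS[name]] = True
    return body


if __name__ == "__main__":
    body = build_request_body(
        "https://example.com/meeting.wav",
        ["speaker_diarization", "subtitles"],
    )
    print(json.dumps(body, indent=2))
```

Keeping the label-to-flag mapping in one table means a misspelled feature fails early with a `ValueError` instead of being silently sent as an unrecognized field.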

Speaker Diarization

Detect speakers and understand who said what, and when.

PII Redaction

Automatically redact names, emails, vehicle IDs, and other PII in pre-recorded transcripts.

Export subtitles (SRT/VTT)

Generate ready-to-use subtitle files in SRT or VTT formats.

Custom vocabulary

Boost recognition accuracy for brand, product, and domain terms.

Custom spelling

Normalize how specific words, brands, and names are spelled.

Enhanced punctuation

Improved punctuation and casing for cleaner, easier-to-read transcripts.

Sentences

Group words into sentences with timing for better readability and parsing.

Name consistency

Enforce consistent rendering of speaker and entity names.

Dual or multiple channels

Transcribe stereo or multi-channel audio with channel-aware processing.

Custom metadata

Attach and propagate metadata to organize and trace your jobs.

Want to know more about the audio intelligence features? Check out our Audio Intelligence chapter.