Because speech-to-text models are trained on general vocabulary, under-represented words such as brand names, proper nouns, or domain-specific terms are often transcribed incorrectly. Custom vocabulary is a post-processing operation that compares phonemes between the transcript and the pronunciations entries you provide. When the phonetic match is close enough, the transcribed text is replaced with your term.
If you already know which text variants the model produces and only need to
normalize spelling, use Custom
spelling instead. Custom spelling relies on literal text matching rather than phonemes.
How it works
Custom vocabulary operates at the text level and is based on phoneme similarity. Once the transcription is generated, Gladia converts both the transcribed words and your vocabulary entries into phonemes, then compares them. The intensity controls how aggressively replacements are applied: a higher intensity replaces words more readily (wider phoneme matching), while a lower intensity requires a closer phoneme match before a replacement is made.
The pronunciations field lets you provide plain-text alternative spellings that reflect how the word actually sounds in speech. These are not phonetic notation. Just write the word the way someone might naively spell it based on how it sounds. Gladia converts these strings to phonemes internally. For example, if your term is “Nietzsche”, you might add ["Niche", "Neechee"] as pronunciations. This widens the phoneme net without having to raise the intensity (which would increase false positives across the board).
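The matching idea above can be sketched in a few lines. This is a toy illustration, not Gladia's implementation: it uses character-level similarity from Python's difflib as a stand-in for real phoneme comparison, and the mapping from intensity to a required similarity (1 - intensity) is an assumption made for the example. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def sounds_like(a: str, b: str) -> float:
    """Crude stand-in for phoneme similarity: overlap of lowercased characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def apply_vocabulary(words, value, pronunciations, intensity):
    """Replace each word that is close enough to the term or one of its pronunciations."""
    # Assumption for this sketch: a higher intensity lowers the similarity required.
    required = 1.0 - intensity
    candidates = [value, *pronunciations]
    replaced = []
    for word in words:
        best = max(sounds_like(word, candidate) for candidate in candidates)
        replaced.append(value if best >= required else word)
    return replaced


words = ["neechy", "wrote", "books"]
pronunciations = ["Niche", "Neechee"]

# A moderate intensity accepts the near match "neechy".
print(apply_vocabulary(words, "Nietzsche", pronunciations, intensity=0.4))

# A low intensity demands a much closer match, so "neechy" is left alone.
print(apply_vocabulary(words, "Nietzsche", pronunciations, intensity=0.1))
```

In this sketch the pronunciations list does the work: "neechy" is close to "Neechee" but not to "Nietzsche" itself, so the entry matches at a moderate intensity without loosening the threshold for every other word.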
When to use custom vocabulary vs. custom spelling
Use Custom spelling when the model outputs a recognizable but wrong form. It applies literal string matching on variants you list (e.g. “data-science” → “Data Science”). List every close variant the model might output.

Use Custom vocabulary when the model outputs garbled or sound-alike text. It applies phoneme-based matching on entries you define (e.g. “le vin” / “levine” → “Levain”). Add pronunciations for each spelling the model might produce.

| | Custom spelling | Custom vocabulary |
|---|---|---|
| Matches on | Exact text in the transcript | How words sound |
| Best for | Wrong spelling, punctuation, formatting | Phonetically similar mis-transcriptions |
| You provide | All the words that the model outputs wrongly | value, pronunciations, intensity |
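For contrast, literal matching of the kind custom spelling performs can be approximated as below. This is a minimal sketch under assumptions: the helper name and the shape of the dictionary (correct form mapped to a list of wrong variants) are hypothetical and chosen for illustration only.

```python
import re


def apply_custom_spelling(text: str, spelling: dict[str, list[str]]) -> str:
    """Replace each listed variant with its correct form, matching exact text only."""
    for correct, variants in spelling.items():
        for variant in variants:
            # Word boundaries keep the match from firing inside longer words.
            pattern = rf"\b{re.escape(variant)}\b"
            text = re.sub(pattern, lambda _match: correct, text)
    return text


spelling = {"Data Science": ["data-science"]}
print(apply_custom_spelling("we hired a data-science lead", spelling))
print(apply_custom_spelling("we met at levine bakery", spelling))
```

The second call leaves the sentence unchanged: a variant that is not listed is never replaced, which is why sound-alike output such as "levine" belongs in custom vocabulary instead.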
Example configuration
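A configuration along these lines could look as follows. The entry fields (value, pronunciations, intensity) and default_intensity are the ones named on this page; the wrapper keys custom_vocabulary, custom_vocabulary_config, and vocabulary are assumptions about the request body, so check them against the API reference before use.

```python
import json

# Entry fields and default_intensity come from this page.
# The wrapper keys ("custom_vocabulary", "custom_vocabulary_config", "vocabulary")
# are assumptions about the request body and may differ in the real API.
request_options = {
    "custom_vocabulary": True,
    "custom_vocabulary_config": {
        # Global intensity, applied to entries that do not set their own.
        "default_intensity": 0.4,
        "vocabulary": [
            # Inherits default_intensity.
            {"value": "Nietzsche", "pronunciations": ["Niche", "Neechee"]},
            # Per-entry override for a term that is still being missed.
            {
                "value": "Levain",
                "pronunciations": ["le vin", "levine"],
                "intensity": 0.5,
            },
        ],
    },
}

print(json.dumps(request_options, indent=2))
```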
Parameter reference
default_intensity: Global intensity for entries. We suggest 0.4–0.6; raise it if terms are missed, lower it if unrelated words get replaced.
Tuning tips
- Start at default_intensity 0.4 and adjust per entry only when needed.
- Add pronunciations before raising intensity: targeted variants extend what can match for that one term without loosening every comparison.
- Keep lists focused: every transcribed word is compared against every entry, so long lists increase false positives.
- Move stable misspellings to custom spelling when the model already outputs a recognizable (but wrong) form.
Recommended workflow
- Transcribe without custom vocabulary and note mis-transcribed terms.
- Route each term: garbled or phonetically wrong output → custom vocabulary; recognizable but misspelled → custom spelling.
- Add entries with pronunciations and default_intensity around 0.4–0.6.
- Transcribe again, confirm targets appear, and scan for false positives.
- Refine: lower intensity, tighten pronunciations, or move stubborn terms to custom spelling.
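The "transcribe again" check in the workflow above can be partly automated by diffing the baseline transcript against the new one. The helper below is a hypothetical sketch that works on plain transcript strings; it does not call any API, and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def review_run(baseline: str, updated: str, targets: list[str]) -> dict:
    """List word-level changes between two transcripts and targets that never appear."""
    before, after = baseline.split(), updated.split()
    changes = []
    matcher = SequenceMatcher(None, before, after)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record what the baseline said and what replaced it.
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    missing = [target for target in targets if target not in updated]
    return {"changes": changes, "missing": missing}


report = review_run(
    baseline="we read neechee at the niche bakery",
    updated="we read Nietzsche at the Nietzsche bakery",
    targets=["Nietzsche", "Levain"],
)
print(report["changes"])
print(report["missing"])
```

Each entry in changes is worth a manual look: here the second change ("niche" replaced in "niche bakery") is a false positive, which would suggest lowering the intensity or tightening the pronunciations for that term.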