Custom vocabulary

As speech-to-text models are trained on general vocabulary, under-represented words such as brand names, proper nouns, or domain-specific terms are often transcribed incorrectly. Custom vocabulary is a post-processing operation that compares phonemes between the transcript and your vocabulary entries, including any pronunciations you supply. When the phonetic match is close enough, the transcribed text is replaced with your term.

If you already know which text variants the model produces and only need to normalize spelling, use Custom spelling instead. Custom spelling relies on literal text matching rather than phonemes.

How it works

Custom vocabulary operates at the text level and is based on phoneme similarity. Once the transcription is generated, Gladia converts both the transcribed words and your vocabulary entries into phonemes, then compares them.

The intensity controls how aggressively replacements are applied: a higher intensity replaces words more readily (wider phoneme matching), while a lower intensity requires a closer phoneme match before a replacement is made.

The pronunciations field lets you provide plain-text alternative spellings that reflect how the word actually sounds in speech. These are not phonetic notation: just write the word the way someone might naively spell it based on how it sounds, and Gladia converts these strings to phonemes internally. For example, if your term is “Nietzsche”, you might add ["Niche", "Neechee"] as pronunciations. This widens the phoneme net for that term without having to raise the intensity (which would increase false positives across the board).
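The interaction between intensity and pronunciations can be illustrated with a toy model. This is only a sketch and not Gladia's implementation: `toy_phonemes` is a hypothetical stand-in that merely lowercases and strips non-letters, string similarity from Python's `difflib` stands in for real phoneme comparison, and the `1 - intensity` threshold is an assumption chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import List


def toy_phonemes(text: str) -> str:
    """Hypothetical stand-in for a phoneme converter: lowercase letters only."""
    return re.sub(r"[^a-z]", "", text.lower())


def similarity(heard: str, candidate: str) -> float:
    """Similarity in [0, 1] between two strings after the toy conversion."""
    return SequenceMatcher(None, toy_phonemes(heard), toy_phonemes(candidate)).ratio()


def should_replace(heard: str, value: str, pronunciations: List[str], intensity: float) -> bool:
    """Decide whether `heard` would be replaced by `value`.

    Assumption for this sketch: a match is accepted when the best similarity
    against the value or any pronunciation reaches 1 - intensity, so a higher
    intensity accepts looser matches.
    """
    candidates = [value, *pronunciations]
    best = max(similarity(heard, c) for c in candidates)
    return best >= 1.0 - intensity
```

In this toy model, `should_replace("neechee", "Nietzsche", [], 0.3)` is rejected, while adding `["Niche", "Neechee"]` as pronunciations makes the same call succeed at the same intensity. Raising the intensity to 0.7 instead would also start accepting an unrelated word such as "nature", which is the false-positive risk described above.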

When to use custom vocabulary vs. custom spelling

Use Custom spelling when the model outputs a recognizable but wrong form. It applies literal string matching on variants you list (e.g. “data-science” → “Data Science”). List every close variant the model might output.

Use Custom vocabulary when the model outputs garbled or sound-alike text. It applies phoneme-based matching on entries you define (e.g. “le vin” / “levine” → “Levain”). Add pronunciations for each spelling the model might produce.

	Custom spelling	Custom vocabulary
Matches on	Exact text in the transcript	How words sound
Best for	Wrong spelling, punctuation, formatting	Phonetically similar mis-transcriptions
You provide	All the words that the model outputs wrongly	`value`, `pronunciations`, `intensity`

Rule of thumb: start with a transcription run without any custom vocabulary. Look at what the output actually says. If the word appears but is just misspelled, custom spelling is the simpler and safer fix. If the word is completely garbled, that’s when custom vocabulary is the right tool.
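The routing step can be recorded in a small helper once you have reviewed a baseline transcript. The sketch below is illustrative only: `Observation`, `route`, and the `garbled` flag are hypothetical names, the garbled-or-not judgment is made by you rather than computed, and the spelling output is a neutral mapping, not the request shape of the Custom spelling feature (which this page does not show).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Observation:
    target: str          # the term you want to see in the transcript
    observed: List[str]  # what the baseline transcript actually contained
    garbled: bool        # your judgment: True for garbled or sound-alike output


def route(observations: List[Observation]) -> Tuple[Dict[str, List[str]], List[dict]]:
    """Split observations into literal spelling variants and vocabulary entries."""
    spelling_variants: Dict[str, List[str]] = {}
    vocabulary: List[dict] = []
    for obs in observations:
        if obs.garbled:
            # Sound-alike output: phoneme matching, observed forms become pronunciations.
            vocabulary.append({"value": obs.target, "pronunciations": list(obs.observed)})
        else:
            # Recognizable but wrong form: literal matching on the listed variants.
            spelling_variants[obs.target] = list(obs.observed)
    return spelling_variants, vocabulary
```

The `vocabulary` list uses the `value` and `pronunciations` fields described in the parameter reference, so it can be placed under `custom_vocabulary_config`.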

Example configuration

{
  "audio_url": "YOUR_AUDIO_URL",
  "custom_vocabulary": true,
  "custom_vocabulary_config": {
    "vocabulary": [
      "Gladia",
      {"value": "Solaria"},
      {
        "value": "Salesforce",
        "pronunciations": ["sell force", "sale forces"],
        "intensity": 0.5,
        "language": "en"
      },
    ]
    "default_intensity": 0.4
  }
}
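As a rough sketch, the same configuration could be sent from Python using only the standard library. The endpoint URL and the `x-gladia-key` header name are assumptions not stated on this page, and `build_request` is a hypothetical helper; check the API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint; not confirmed by this page.
GLADIA_URL = "https://api.gladia.io/v2/pre-recorded"


def build_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a transcription request with custom vocabulary."""
    payload = {
        "audio_url": audio_url,
        "custom_vocabulary": True,
        "custom_vocabulary_config": {
            "vocabulary": [
                "Gladia",              # plain string entry
                {"value": "Solaria"},  # object entry using default_intensity
                {
                    "value": "Salesforce",
                    "pronunciations": ["sell force", "sale forces"],
                    "intensity": 0.5,
                    "language": "en",
                },
            ],
            "default_intensity": 0.4,
        },
    }
    return urllib.request.Request(
        GLADIA_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Header name for the API key is an assumption.
        headers={"Content-Type": "application/json", "x-gladia-key": api_key},
        method="POST",
    )
```

To actually submit the job, pass the returned object to `urllib.request.urlopen` and read the JSON response.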

Parameter reference

vocabulary

(object | string)[]

value

string

required

The correct form of the word as it should appear in the transcript.

pronunciations

string[]

Alternative plain-text spellings that reflect how the word might be mis-spelled or mis-transcribed.

intensity

number

Per-entry intensity. We suggest a value between 0.4 and 0.6. Inherits default_intensity when omitted.

language

string

Language used for phoneme comparison (defaults to the transcription language). Set this when a term is pronounced in a different language than the rest of the audio.

default_intensity

number

Global intensity for entries that do not set their own. We suggest 0.4–0.6: raise it if terms are missed, lower it if unrelated words get replaced.
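The inheritance rules in this reference can be made explicit with a small client-side helper. This is a sketch for checking your own configuration before sending it, not part of the API: `resolve_entries` is a hypothetical name, and a `language` of `None` simply marks the documented fallback to the transcription language.

```python
from typing import Any, Dict, List


def resolve_entries(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand every vocabulary entry to an explicit object.

    Plain strings become {"value": ...}; a missing intensity inherits
    default_intensity, as described in the parameter reference.
    """
    default_intensity = config.get("default_intensity")
    resolved = []
    for entry in config["vocabulary"]:
        if isinstance(entry, str):
            entry = {"value": entry}
        if "value" not in entry:
            raise ValueError("each vocabulary entry requires a 'value'")
        resolved.append({
            "value": entry["value"],
            "pronunciations": list(entry.get("pronunciations", [])),
            "intensity": entry.get("intensity", default_intensity),
            # None means: fall back to the transcription language.
            "language": entry.get("language"),
        })
    return resolved
```

Printing the resolved list is a quick way to confirm which entries rely on the global intensity and which override it.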

Tuning tips

Start at default_intensity 0.4 and adjust per entry only when needed.
Add pronunciations before raising intensity: variants target the specific mis-transcriptions you have observed without loosening every comparison.
Keep lists focused — every transcribed word is compared against every entry; long lists increase false positives.
Move stable misspellings to custom spelling when the model already outputs a recognizable (but wrong) form.

Recommended workflow

Transcribe without custom vocabulary and note mis-transcribed terms.
Route each term: garbled or phonetically wrong output → custom vocabulary; recognizable but misspelled → custom spelling.
Add entries with pronunciations and default_intensity around 0.4–0.6.
Transcribe again — confirm targets appear and scan for false positives.
Refine: lower intensity, tighten pronunciations, or move stubborn terms to custom spelling.
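The "transcribe again" step is easier to review with a word-level diff between the baseline transcript and the new one. The helper below is a minimal sketch using Python's `difflib`; `changed_spans` is a hypothetical name, the transcripts are invented sample strings, and deciding whether a change is a false positive remains a manual judgment.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def changed_spans(baseline: str, rerun: str) -> List[Tuple[str, str]]:
    """Return (before, after) pairs for every span that differs between runs."""
    a, b = baseline.split(), rerun.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented sample transcripts for illustration.
baseline = "we use sell force for the sales team"
rerun = "we use Salesforce for the Salesforce team"
for before, after in changed_spans(baseline, rerun):
    print(f"{before!r} -> {after!r}")
```

Here the first change is the intended fix, while the second ("sales" replaced by "Salesforce") is the kind of false positive that suggests lowering the intensity or tightening the pronunciations.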
