
# Custom vocabulary

> Improve recognition of domain-specific words and phrases

<Badge color="blue" size="lg" icon="file-audio">
  Pre-recorded
</Badge>

<Badge color="green" icon="tower-broadcast" size="lg">
  Live
</Badge>

Because speech-to-text models are trained on general vocabulary, under-represented words such as brand names, proper nouns, and domain-specific terms are often transcribed incorrectly.

Custom vocabulary is a post-processing operation that compares **phonemes** between the transcript and your vocabulary entries, including any `pronunciations` you list for them. When the phonetic match is close enough, the transcribed text is replaced with your term.

<Note>
  If you already know which *text* variants the model produces and only need to
  normalize spelling, use **[Custom
  spelling](/chapters/audio-intelligence/custom-spelling)** instead. Custom spelling relies on literal matching, not on phonemes.
</Note>

## How it works

Custom vocabulary operates at a **text level** and is based on **phoneme similarity**.

Once the transcription is generated, Gladia converts both the transcribed words and your vocabulary entries into phonemes, then compares them. The `intensity` parameter controls how aggressively replacements are applied: a higher intensity accepts looser phoneme matches, so words are replaced more readily, while a lower intensity requires a closer phoneme match before a replacement is made.

The `pronunciations` field lets you provide **plain-text alternative spellings that reflect how the word actually sounds** in speech. These are *not* phonetic notation. Just write the word the way someone might naively spell it based on how it sounds. Gladia converts these strings to phonemes internally. For example, if your term is "Nietzsche", you might add `["Niche", "Neechee"]` as pronunciations. This widens the phoneme net without having to raise the intensity (which would increase false positives across the board).
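
The interaction between `pronunciations` and `intensity` can be sketched with a toy model. This is not Gladia's algorithm: plain string similarity from Python's `difflib` stands in for phoneme comparison, and the rule "replace when similarity is at least `1 - intensity`" is an assumption made purely for illustration. The names `sound_key`, `best_similarity`, and `would_replace` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Sequence


def sound_key(text: str) -> str:
    """Crude stand-in for phoneme conversion: keep lowercase letters only."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def best_similarity(heard: str, value: str, pronunciations: Sequence[str] = ()) -> float:
    """Highest similarity between the heard word and any spelling of the entry."""
    candidates = (value, *pronunciations)
    return max(
        SequenceMatcher(None, sound_key(heard), sound_key(candidate)).ratio()
        for candidate in candidates
    )


def would_replace(heard: str, value: str,
                  pronunciations: Sequence[str] = (), intensity: float = 0.4) -> bool:
    """Assumed rule for this toy: a higher intensity lowers the similarity bar."""
    return best_similarity(heard, value, pronunciations) >= 1.0 - intensity


# Same heard word, three configurations of the same entry.
no_hints = would_replace("neechy", "Nietzsche")
with_hints = would_replace("neechy", "Nietzsche", ("Niche", "Neechee"))
raised = would_replace("neechy", "Nietzsche", intensity=0.5)
print(no_hints, with_hints, raised)
```

In this toy, "neechy" is too far from "Nietzsche" to match at intensity 0.4, but it matches once "Neechee" is listed as a pronunciation. Raising the intensity to 0.5 also produces the match, at the cost of a lower bar for every other word compared against that entry.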

## When to use custom vocabulary vs. custom spelling

Use **[Custom spelling](/chapters/audio-intelligence/custom-spelling)** when the model outputs a recognizable but wrong form. It applies **literal string matching** on variants you list (e.g. **"data-science"** → **"Data Science"**). **List every close variant** the model might output.

Use **[Custom vocabulary](/chapters/audio-intelligence/custom-vocabulary)** when the model outputs garbled or sound-alike text. It applies **phoneme-based matching** on entries you define (e.g. **"le vin"** / **"levine"** → **"Levain"**). **Add pronunciations** for each spelling the model might produce.

|                 | Custom spelling                              | Custom vocabulary                       |
| --------------- | -------------------------------------------- | --------------------------------------- |
| **Matches on**  | Exact text in the transcript                 | How words sound                         |
| **Best for**    | Wrong spelling, punctuation, formatting      | Phonetically similar mis-transcriptions |
| **You provide** | Every incorrect variant the model outputs    | `value`, `pronunciations`, `intensity`  |

**Rule of thumb:** start with a transcription run *without* any custom vocabulary. Look at what the output actually says. If the word appears but is just misspelled, custom spelling is the simpler and safer fix. If the word is completely garbled, that's when custom vocabulary is the right tool.

## Example configuration

<CodeGroup>
  ```json Pre-recorded theme={"system"}
  {
    "audio_url": "YOUR_AUDIO_URL",
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": [
        "Gladia",
        {"value": "Solaria"},
        {
          "value": "Salesforce",
          "pronunciations": ["sell force", "sale forces"],
          "intensity": 0.5,
          "language": "en"
        }
      ],
      "default_intensity": 0.4
    }
  }
  ```

  ```json Live theme={"system"}
  {
    "realtime_processing": {
      "custom_vocabulary": true,
      "custom_vocabulary_config": {
        "vocabulary": [
          "Gladia",
          {"value": "Solaria"},
          {
            "value": "Salesforce",
            "pronunciations": ["sell force", "sale forces"],
            "intensity": 0.5,
            "language": "en"
          }
        ],
        "default_intensity": 0.4
      }
    }
  }
  ```
</CodeGroup>
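
If you assemble these bodies in code, building them as native data structures and serializing them avoids hand-written JSON mistakes such as trailing commas. The following is a minimal sketch that reproduces the two bodies above; `build_vocabulary_config` is a hypothetical helper name, and no request is sent because the endpoint and authentication are outside the scope of this page.

```python
import json


def build_vocabulary_config(entries, default_intensity=0.4):
    """Return the custom vocabulary fields from the examples above as a dict."""
    return {
        "custom_vocabulary": True,
        "custom_vocabulary_config": {
            "vocabulary": list(entries),
            "default_intensity": default_intensity,
        },
    }


entries = [
    "Gladia",                      # bare string entry
    {"value": "Solaria"},          # object entry with only `value`
    {
        "value": "Salesforce",
        "pronunciations": ["sell force", "sale forces"],
        "intensity": 0.5,
        "language": "en",
    },
]

vocab = build_vocabulary_config(entries)

# Pre-recorded: the vocabulary fields sit at the top level of the body.
pre_recorded_body = {"audio_url": "YOUR_AUDIO_URL", **vocab}

# Live: the same fields are nested under `realtime_processing`.
live_body = {"realtime_processing": vocab}

print(json.dumps(pre_recorded_body, indent=2))
```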

## Parameter reference

<ParamField body="vocabulary" type="object | string[]">
  <Expandable title="properties">
    <ParamField body="value" type="string" required>
      The correct term as it should appear in the transcript.
    </ParamField>

    <ParamField body="pronunciations" type="string[]">
      Alternative plain-text spellings that reflect how the term sounds or how it tends to be mis-transcribed.
    </ParamField>

    <ParamField body="intensity" type="number">
      Per-entry intensity. We suggest a value of **0.4–0.6**.
      Inherits `default_intensity` when omitted.
    </ParamField>

    <ParamField body="language" type="string">
      Language used for phoneme comparison (defaults to the transcription language). Set this when a term is pronounced in a different language than the rest of the audio.
    </ParamField>
  </Expandable>

  <ParamField body="default_intensity" type="number">
    Global intensity for entries. We suggest **0.4–0.6**: raise it if terms are missed, lower it if unrelated words get replaced.
  </ParamField>
</ParamField>
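
The inheritance rule for `intensity` can be made explicit on the client side, which helps when reviewing a long list. This is a sketch under two assumptions: that a bare string entry behaves like an object that only sets `value`, and that nothing is known here about the server-side default when `default_intensity` is omitted (the sketch reports `None` in that case). `resolve_entries` is a hypothetical helper name.

```python
def resolve_entries(config: dict) -> list[dict]:
    """Expand each vocabulary entry into a dict with an explicit intensity."""
    default = config.get("default_intensity")  # None if no global default is set
    resolved = []
    for entry in config["vocabulary"]:
        # Assumption: a bare string is shorthand for an entry with only `value`.
        item = {"value": entry} if isinstance(entry, str) else dict(entry)
        if not item.get("value"):
            raise ValueError(f"vocabulary entry is missing `value`: {entry!r}")
        # A per-entry intensity wins; otherwise fall back to the global default.
        item.setdefault("intensity", default)
        resolved.append(item)
    return resolved


config = {
    "vocabulary": [
        "Gladia",
        {"value": "Solaria"},
        {"value": "Salesforce", "pronunciations": ["sell force"], "intensity": 0.5},
    ],
    "default_intensity": 0.4,
}

for item in resolve_entries(config):
    print(item["value"], item["intensity"])
```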

## Tuning tips

* **Start at `default_intensity` 0.4** and adjust per entry only when needed.
* **Add `pronunciations` before raising `intensity`**: variants widen what can match for that one entry without loosening every comparison.
* **Keep lists focused**: every transcribed word is compared against every entry; long lists increase false positives.
* **Move stable misspellings to [custom spelling](/chapters/audio-intelligence/custom-spelling)** when the model already outputs a recognizable (but wrong) form.

## Recommended workflow

1. **Transcribe without custom vocabulary** and note mis-transcribed terms.
2. **Route each term:** garbled or phonetically wrong output → custom vocabulary; recognizable but misspelled → custom spelling.
3. **Add entries** with `pronunciations` and `default_intensity` around **0.4–0.6**.
4. **Transcribe again** — confirm targets appear and scan for false positives.
5. **Refine:** lower `intensity`, tighten `pronunciations`, or move stubborn terms to custom spelling.
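
Step 4 is easier with a word-level diff between the baseline transcript and the rerun. The sketch below uses Python's `difflib` to list every span that changed, so each replacement can be checked by eye. The two transcripts are made up for illustration, and `replaced_spans` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def replaced_spans(baseline: str, rerun: str) -> list[tuple[str, str]]:
    """Return (before, after) word spans that differ between two transcripts."""
    old, new = baseline.split(), rerun.split()
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (" ".join(old[i1:i2]), " ".join(new[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Made-up transcripts: one run without custom vocabulary, one with it.
baseline = "we moved our leads into sell force and sent the sales forecast"
rerun = "we moved our leads into Salesforce and sent the Salesforce"

for before, after in replaced_spans(baseline, rerun):
    print(f"{before!r} -> {after!r}")
```

Here the first change is the intended fix, while the second ("sales forecast" replaced by "Salesforce") is the kind of false positive that calls for a lower `intensity` or tighter `pronunciations` on that entry.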