# Custom vocabulary

> Improve recognition of domain-specific words and phrases

<Badge color="blue" size="lg" icon="file-audio">
  Pre-recorded
</Badge>

<Badge color="green" icon="tower-broadcast" size="lg">
  Live
</Badge>

Custom vocabulary helps the transcription engine **recognize words it would otherwise get wrong**: unusual brand names, internal project codenames, medical or legal jargon, or acronyms that sound like common words.

It works by comparing the *sounds* (phonemes) of the transcribed words against the sounds of the words you provide. When the match is close enough, the transcribed word is replaced with your term. This is **probabilistic**: it increases the odds of a correct transcription, but it does not guarantee it.

<Note>
  If you already know exactly which *text* variants the model produces and you
  just want to normalize the spelling, use **[Custom
  spelling](/chapters/audio-intelligence/custom-spelling)** instead. Custom
  spelling is a deterministic find-and-replace on the transcript text, with no
  phoneme matching and no false positives.
</Note>

## How it works

Custom vocabulary operates on the **transcribed text**, not on the audio itself, and is based on **phoneme similarity**.

Once the transcription is generated, Gladia converts both the transcribed words and your vocabulary entries into phonemes, then compares them. The `intensity` controls how aggressively the model applies replacements: a higher intensity means the model will replace words more readily (wider phoneme matching), while a lower intensity requires a closer phoneme match before a replacement is made.

The `pronunciations` field lets you provide **plain-text alternative spellings that reflect how the word actually sounds** in speech. These are *not* phonetic notation. Just write the word the way someone might naively spell it based on how it sounds. Gladia converts these strings to phonemes internally. For example, if your term is "Nietzsche", you might add `["Niche", "Neechee"]` as pronunciations. This widens the phoneme net without having to raise the intensity (which would increase false positives across the board).
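To build intuition for how `intensity` and `pronunciations` interact, here is a toy sketch in Python. It is not Gladia's algorithm: plain character overlap from `difflib` stands in for phoneme similarity, and the `1 - intensity` threshold and the `toy_*` function names are assumptions made up for this illustration.

```python
from difflib import SequenceMatcher


def toy_similarity(a: str, b: str) -> float:
    """Crude stand-in for phoneme similarity: character overlap in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def toy_replace(words, entry, intensity):
    """Replace every word that is 'close enough' to the entry.

    Illustrative assumption: a higher intensity lowers the similarity
    required for a replacement (required = 1 - intensity).
    """
    required = 1.0 - intensity
    # The term itself plus any plain-text pronunciations are all match targets.
    candidates = [entry["value"], *entry.get("pronunciations", [])]
    replaced = []
    for word in words:
        best = max(toy_similarity(word, candidate) for candidate in candidates)
        replaced.append(entry["value"] if best >= required else word)
    return replaced


entry = {"value": "Nietzsche", "pronunciations": ["Niche", "Neechee"]}
words = ["reading", "neechy", "tonight"]

strict = toy_replace(words, entry, intensity=0.1)  # needs a near-exact match
loose = toy_replace(words, entry, intensity=0.9)   # accepts weak matches too
print(strict)
print(loose)
```

In this toy model, adding a pronunciation gives the entry another target to match, so a spoken variant can be caught without loosening the threshold for every other word.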

## When to use custom vocabulary vs. custom spelling

Use **[custom spelling](/chapters/audio-intelligence/custom-spelling)** when the transcription already *recognizes* the word but writes it differently than you want. Common cases:

* A person's name comes through as "Gaurish" or "Gaureish" but you need "Gorish".
* The model writes "data-science" and you want "Data Science".
* You want to replace filler words or normalize punctuation (e.g. "period" → ".").

Use **[custom vocabulary](/chapters/audio-intelligence/custom-vocabulary)** when the word comes out completely garbled or replaced by something phonetically similar. The transcription engine has never seen it and can't get close on its own. Custom vocabulary uses phoneme matching to catch these cases, but it's probabilistic and can produce false positives.

|                  | Custom vocabulary                                                                                  | Custom spelling                                                                                        |
| ---------------- | -------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| **What it does** | Listens to how a word *sounds* and replaces phonetically similar words in the transcript           | Finds exact text strings in the transcript and replaces them with your preferred spelling              |
| **Mechanism**    | Phoneme-based similarity matching (probabilistic)                                                  | Text-based find-and-replace (deterministic)                                                            |
| **Best for**     | Words that are consistently mis-transcribed: unusual proper nouns, new product names, niche jargon | Words that are recognizable but misspelled, e.g. "Gaurish" → "Gorish", "data-science" → "Data Science" |
| **Risk**         | Can produce false positives. Unrelated words that happen to sound similar may get replaced         | No false positives, but it won't help if the word isn't recognized at all                              |
| **Tuning**       | Adjust `intensity` and `default_intensity` to control aggressiveness                               | None needed. It either matches the text or it doesn't                                                  |

**Rule of thumb:** start with a transcription run *without* any custom vocabulary. Look at what the output actually says. If the word appears but is just misspelled, custom spelling is the simpler and safer fix. If the word is completely garbled, that's when custom vocabulary is the right tool.

<Tip>
  If you've been using [custom
  vocabulary](/chapters/audio-intelligence/custom-vocabulary) and keep running
  into false positives for certain terms, try moving those terms to [custom
  spelling](/chapters/audio-intelligence/custom-spelling) instead. As long as
  the transcription produces something close enough for you to list as a
  variant, custom spelling will handle the rest, deterministically and without
  side effects. This is a common and recommended migration path.
</Tip>

## Example configuration

<CodeGroup>
  ```json Pre-recorded theme={"system"}
  {
    "audio_url": "YOUR_AUDIO_URL",
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": [
        "Gladia",
        {"value": "Solaria"},
        {
          "value": "Salesforce",
          "pronunciations": ["sell force", "sale forces"],
          "intensity": 0.5,
          "language": "en"
        }
      ],
      "default_intensity": 0.4
    }
  }
  ```

  ```json Live theme={"system"}
  {
    "realtime_processing": {
      "custom_vocabulary": true,
      "custom_vocabulary_config": {
        "vocabulary": [
          "Gladia",
          {"value": "Solaria"},
          {
            "value": "Salesforce",
            "pronunciations": ["sell force", "sale forces"],
            "intensity": 0.5,
            "language": "en"
          }
        ],
        "default_intensity": 0.4
      }
    }
  }
  ```
</CodeGroup>
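If you assemble the request body in code, building it from native data structures and serializing it with a JSON library avoids hand-written syntax mistakes. The following is a minimal sketch: the helper names (`build_vocabulary_config`, `pre_recorded_payload`, `live_payload`) are hypothetical, and only the field names come from the examples above. Sending the request is left out.

```python
import json


def build_vocabulary_config(vocabulary, default_intensity=0.4):
    """Return the custom vocabulary fields shared by both request shapes."""
    return {
        "custom_vocabulary": True,
        "custom_vocabulary_config": {
            "vocabulary": list(vocabulary),
            "default_intensity": default_intensity,
        },
    }


def pre_recorded_payload(audio_url, vocabulary, default_intensity=0.4):
    """Pre-recorded: the vocabulary fields sit at the top level of the body."""
    return {
        "audio_url": audio_url,
        **build_vocabulary_config(vocabulary, default_intensity),
    }


def live_payload(vocabulary, default_intensity=0.4):
    """Live: the same fields are nested under `realtime_processing`."""
    return {
        "realtime_processing": build_vocabulary_config(vocabulary, default_intensity)
    }


VOCABULARY = [
    "Gladia",               # plain string entry
    {"value": "Solaria"},   # object entry without overrides
    {
        "value": "Salesforce",
        "pronunciations": ["sell force", "sale forces"],
        "intensity": 0.5,
        "language": "en",
    },
]

body = pre_recorded_payload("YOUR_AUDIO_URL", VOCABULARY)
print(json.dumps(body, indent=2))
```

Both helpers reuse the same vocabulary list, so the pre-recorded and live configurations stay in sync.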

## Parameter reference

<ParamField body="default_intensity" type="number">
  The global intensity applied to every vocabulary entry that doesn't have its own `intensity` override (minimum 0, maximum 1, default 0.5).

  A higher value means the model will apply replacements more aggressively: more replacements, but more risk of unwanted swaps. A lower value requires a closer phoneme match before replacing: fewer replacements, fewer false positives.
</ParamField>

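<ParamField body="vocabulary" type="(object | string)[]">
  The list of terms to recognize. Each item is either a plain string or an object with the properties below.
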
  <Expandable title="properties">
    <ParamField body="value" type="string" required>
      The text that will be inserted into the transcription when a phoneme match is found.
    </ParamField>

    <ParamField body="pronunciations" type="string[]">
      Plain-text alternative spellings that reflect how the word sounds in speech. Write them the way someone would naively spell the word based on pronunciation. Gladia converts these to phonemes internally. This is *not* phonetic notation.
    </ParamField>

    <ParamField body="intensity" type="number">
      The intensity for this specific entry (minimum 0, maximum 1, default: inherits from `default_intensity`). Use this to make individual entries more or less aggressive than the global default. For example, set a lower intensity on a short word that keeps producing false positives.
    </ParamField>

    <ParamField body="language" type="string">
      The language in which this word will be pronounced during phoneme comparison. Defaults to the transcription language. This matters when a word from one language appears in a conversation in another language, for example an English brand name like "Salesforce" spoken in a French meeting. Setting `language: "en"` ensures the phoneme comparison uses English pronunciation rules, not French ones.
    </ParamField>
  </Expandable>
</ParamField>
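A quick client-side check can catch malformed entries before a request is sent. This is a sketch based only on the constraints listed above (required `value`, `intensity` between 0 and 1, `pronunciations` as a list of strings). The function name `validate_entry` is hypothetical, and the API may apply additional rules not covered here.

```python
def validate_entry(entry):
    """Return a list of problems found in one vocabulary entry (empty if none)."""
    # A plain string is a valid entry as long as it is not blank.
    if isinstance(entry, str):
        return [] if entry.strip() else ["string entries must not be empty"]
    if not isinstance(entry, dict):
        return [f"unsupported entry type: {type(entry).__name__}"]

    problems = []

    value = entry.get("value")
    if not isinstance(value, str) or not value.strip():
        problems.append("'value' is required and must be a non-empty string")

    intensity = entry.get("intensity")
    if intensity is not None:
        is_number = isinstance(intensity, (int, float)) and not isinstance(intensity, bool)
        if not is_number or not 0 <= intensity <= 1:
            problems.append("'intensity' must be a number between 0 and 1")

    pronunciations = entry.get("pronunciations", [])
    if not isinstance(pronunciations, list) or not all(
        isinstance(p, str) for p in pronunciations
    ):
        problems.append("'pronunciations' must be a list of plain-text strings")

    language = entry.get("language")
    if language is not None and not isinstance(language, str):
        problems.append("'language' must be a string such as 'en'")

    return problems


for candidate in ["Gladia", {"value": "Salesforce", "intensity": 0.5}, {"intensity": 1.5}]:
    print(candidate, "->", validate_entry(candidate) or "ok")
```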

## Tuning intensity

The default intensity is **0.5**, which works well for short lists of very distinctive words. In practice, especially with longer lists or shorter words, 0.5 is often too aggressive and produces false positives.

**We recommend starting at 0.4.** Raise it only if your terms are still not being picked up, and lower it if you see too many false positives.

### `default_intensity` vs. per-entry `intensity`

* `default_intensity` sets the baseline for every entry in your vocabulary list.
* The per-entry `intensity` field overrides the global default for that specific word.

You can mix both. A common pattern: set `default_intensity` to 0.4, then lower individual short or common-sounding words (such as the brand names "Target" or "Zoom") to between 0.2 and 0.3 so they don't match too many unrelated words.
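The override rule can be sketched in a few lines of Python. The `effective_intensity` helper is hypothetical and simply mirrors the behavior described above: a per-entry `intensity` wins, otherwise the entry inherits `default_intensity`, which itself falls back to the documented default of 0.5.

```python
API_DEFAULT_INTENSITY = 0.5  # documented default when `default_intensity` is omitted


def effective_intensity(entry, default_intensity=API_DEFAULT_INTENSITY):
    """Return the intensity that applies to one vocabulary entry."""
    if isinstance(entry, dict) and entry.get("intensity") is not None:
        return entry["intensity"]  # per-entry override
    return default_intensity       # plain strings and objects without an override


config = {
    "vocabulary": [
        "Gladia",
        {"value": "Solaria"},
        {"value": "Zoom", "intensity": 0.2},    # short, common-sounding word
        {"value": "Target", "intensity": 0.3},
    ],
    "default_intensity": 0.4,
}

resolved = {}
for entry in config["vocabulary"]:
    name = entry if isinstance(entry, str) else entry["value"]
    resolved[name] = effective_intensity(entry, config["default_intensity"])

for name, intensity in resolved.items():
    print(f"{name}: {intensity}")
```

Printing the resolved values before a run makes it easy to see which entries are stricter than the baseline.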

### Watch your list size

As your vocabulary list grows, the chance of false positives increases. Every transcribed word is compared against every entry, so a list of 50+ terms will naturally produce more unintended replacements than a list of 5.

If you find yourself fighting false positives on a large list, consider:

1. Lowering the intensity for the entries that cause problems.
2. Adding specific `pronunciations` for the variants you actually hear, so those entries keep matching at a lower intensity.
3. Moving entries that the model already recognizes (just with wrong spelling) to **[custom spelling](/chapters/audio-intelligence/custom-spelling)** instead. This eliminates false positives entirely for those terms.

## Recommended workflow

If you're setting up custom vocabulary for the first time, here's a step-by-step approach that will save you time:

1. **Run a transcription without any custom vocabulary.** Look at the raw output and identify which words are being mis-transcribed.
2. **Separate the problems into two buckets:**
   * Words that are completely wrong or garbled → these are candidates for **custom vocabulary**.
   * Words that are recognizable but misspelled (e.g. "Gaurish" instead of "Gorish") → use **[custom spelling](/chapters/audio-intelligence/custom-spelling)** for these.
3. **Add your custom vocabulary entries** with `default_intensity` set to **0.4**.
4. **Run the transcription again** and compare. Check that your terms are now appearing correctly.
5. **Look for false positives:** words that were correct before but are now being wrongly replaced. If you spot any:
   * Lower the `intensity` on the entry causing the problem.
   * Add `pronunciations` for the variants you want matched, so the entry still works at the lower intensity.
   * If false positives persist, move that entry to custom spelling instead.
6. **Iterate.** Tuning is normal. Don't expect to get it perfect on the first pass.
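Comparing the two runs by eye gets tedious on long transcripts. The sketch below diffs a baseline transcript against a tuned one, word by word, using Python's standard `difflib`. It assumes you have already extracted both transcripts as plain strings. The `changed_words` helper and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def changed_words(baseline: str, tuned: str):
    """Return (before, after) pairs for every word span that differs."""
    before_words, after_words = baseline.split(), tuned.split()
    matcher = SequenceMatcher(None, before_words, after_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (" ".join(before_words[i1:i2]), " ".join(after_words[j1:j2]))
            )
    return changes


baseline = "we moved our data to sell force last week"
tuned = "we moved our data to Salesforce last week"

for before, after in changed_words(baseline, tuned):
    print(f"{before!r} -> {after!r}")
```

Each reported pair is either an intended fix or a candidate false positive. Review the pairs and adjust the entries that caused unwanted replacements.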
