# Recommended Parameters by Use Case

> Best parameter configurations for realtime transcription depending on your application (Voice Agents, Meeting Recorders, Call Centers, Subtitles).

The right parameter configuration can make a significant difference in transcription quality and latency for realtime use cases. This guide covers recommended starting points for common scenarios and highlights pitfalls that frequently trip up new integrations.

<Info>
  These recommendations apply to the **[Realtime
  API](/chapters/live-stt/quickstart)** and are passed during session
  initialization. They are starting points — tune them to match your
  specific needs.
</Info>

***

## Language Configuration

One of the most common configuration mistakes is misunderstanding how `language_config` works. Choosing the right setup avoids unnecessary detection overhead and improves accuracy.

**When to set an explicit language:**

* You **know** the language of the audio ahead of time.
* The audio is **monolingual** (single language throughout).
* You want the **fastest, most accurate** results.

```json theme={"system"}
{
  "language_config": {
    "languages": ["en"],
    "code_switching": false
  }
}
```

**When to use auto-detection:**

* You process audio in **many different languages** and don't know which one beforehand.
* You want Gladia to pick the language automatically.

```json theme={"system"}
{
  "language_config": {
    "languages": [],
    "code_switching": false
  }
}
```

<Warning>
  When `code_switching` is `false` and no language is set, the language is
  detected on the **first utterance** and reused for the rest of the session or
  file. If the beginning of your audio contains silence, music, or a different
  language than the main content, this can lead to incorrect detection for the
  whole transcription.
</Warning>

<Tip>
  Even when using auto-detection, pass a **small list of likely languages** in
  `languages` to constrain the search. This improves both accuracy and
  processing time.
</Tip>
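
As an illustration of the tip above, a constrained auto-detection setup might look like the following. The three language codes are placeholders for whatever you expect in your own audio, not a recommendation.

```json theme={"system"}
{
  "language_config": {
    "languages": ["en", "fr", "de"],
    "code_switching": false
  }
}
```

With `code_switching` left at `false`, the expectation is that detection still runs once at the start of the session, but only among the listed candidates rather than every supported language.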

***

## Code Switching

Code switching (`language_config.code_switching: true`) lets Gladia detect and transcribe **multiple languages** within the same audio, re-evaluating the language on each utterance.

**When to enable it:**

* Speakers **switch languages** mid-conversation (e.g. bilingual meetings, multilingual customer support).
* You need the detected `language` returned **per utterance**.

**When NOT to enable it:**

* The audio is in a **single language** — code switching adds unnecessary processing and can introduce misdetections.
* You've set **exactly one language** in `languages` — in that case `code_switching` is ignored anyway.

```json theme={"system"}
{
  "language_config": {
    "languages": ["en", "fr", "es"],
    "code_switching": true
  }
}
```

<Warning>
  **Do not enable `code_switching` with an empty `languages` list.** When no
  languages are specified, the language detector evaluates every utterance
  against 100+ supported languages, which leads to frequent misdetections —
  especially between similar-sounding languages. Always provide a short list of
  languages you **actually expect** in the audio.
</Warning>

***

## Custom Vocabulary

[Custom vocabulary](/chapters/audio-intelligence/custom-vocabulary) is a post-transcription replacement based on **phoneme similarity**. It's essential for domain-specific terms that speech models frequently mis-transcribe.

**Best practices:**

* **Always provide both** the `custom_vocabulary` flag and a `custom_vocabulary_config`.
* **Add pronunciations** to cover the close spelling variants of a term. An automatic phonemic transcriber (IPA) can help you check that all the different spellings are covered.
* **Keep `intensity` moderate** (0.4-0.6). High values increase false positives where unrelated words get replaced.
* **Set `language`** on individual vocabulary entries when your audio is multilingual and a term is pronounced differently depending on the language.

<CodeGroup>
  ```json Pre-recorded theme={"system"}
  {
    "audio_url": "YOUR_AUDIO_URL",
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": [
        "Kubernetes",
        {
          "value": "Gladia",
          "pronunciations": ["Glad", "Gladio"],
          "intensity": 0.5
        },
        {
          "value": "PostgreSQL",
          "pronunciations": ["Postgres Q L", "Post gress"],
          "intensity": 0.4
        }
      ],
      "default_intensity": 0.5
    }
  }
  ```

  ```json Live theme={"system"}
  {
    "realtime_processing": {
      "custom_vocabulary": true,
      "custom_vocabulary_config": {
        "vocabulary": [
          "Kubernetes",
          {
            "value": "Gladia",
            "pronunciations": ["Glad", "Gladio"],
            "intensity": 0.5
          },
          {
            "value": "PostgreSQL",
            "pronunciations": ["Postgres Q L", "Post gress"],
            "intensity": 0.4
          }
        ],
        "default_intensity": 0.5
      }
    }
  }
  ```
</CodeGroup>
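
The examples above do not show the per-entry `language` setting mentioned in the best practices. A minimal sketch for the Live payload follows, assuming `language` sits alongside `value` on a vocabulary entry and accepts the same codes as `language_config.languages`. The language list, term, and pronunciation are placeholders.

```json theme={"system"}
{
  "language_config": {
    "languages": ["en", "fr"],
    "code_switching": true
  },
  "realtime_processing": {
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": [
        {
          "value": "Gladia",
          "pronunciations": ["Gladio"],
          "language": "fr",
          "intensity": 0.5
        }
      ],
      "default_intensity": 0.5
    }
  }
}
```

Scoping an entry to one language this way is meant to keep its pronunciations from being matched against utterances detected in the other language.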

***

## Voice Agents

For callbots, customer-service assistants, or voice-driven chatbots, the top priority is **low latency**. The agent must react quickly to user speech, even if sentence boundaries are not perfectly formed.

| Parameter                                     | Recommended value | Why                                                                                                                                                                                                                   |
| --------------------------------------------- | ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `endpointing`                                 | `0.05` - `0.1`    | Closes utterances fast, keeping turn-taking snappy. See [Endpointing](/chapters/live-stt/features/endpointing).                                                                                                       |
| `maximum_duration_without_endpointing`        | `15`              | Caps how long an utterance can stay open when the speaker never pauses, without cutting turns off too early.                                                                                                          |
| `messages_config.receive_partial_transcripts` | `true`            | Enables interim results so the agent can start processing early. Use the `speech_stop` event to know when the user has finished speaking. See [Partial transcripts](/chapters/live-stt/features/partial-transcripts). |
| `realtime_processing.custom_vocabulary`       | `true`            | Add product names and action keywords so the agent can react accurately.                                                                                                                                              |

<Tip>
  This setup is optimized for **fast turn-taking**. If utterances get cut off
  mid-sentence, raise `endpointing` slightly.
</Tip>
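
Put together, the table above could translate into a session initialization payload along these lines. This is a sketch, not a complete request: it assumes `endpointing` and `maximum_duration_without_endpointing` sit at the top level of the payload (as their unprefixed names in the table suggest), omits other session settings such as audio format, and uses placeholder vocabulary entries.

```json theme={"system"}
{
  "endpointing": 0.05,
  "maximum_duration_without_endpointing": 15,
  "messages_config": {
    "receive_partial_transcripts": true
  },
  "realtime_processing": {
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": ["YOUR_PRODUCT_NAME", "YOUR_ACTION_KEYWORD"],
      "default_intensity": 0.5
    }
  }
}
```

Starting at the low end of the `endpointing` range and raising it only if users get cut off keeps the tuning loop short.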

***

## Meeting Recorders

For apps that record and transcribe meetings in real time — team stand-ups, board sessions, 1-on-1s — the goal is to produce a **structured, speaker-attributed live transcript** that can feed downstream features like summarization or live note-taking.

| Parameter                                     | Recommended value | Why                                                                                                                                        |
| --------------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `endpointing`                                 | `0.3` - `0.5`     | Lets speakers finish their sentences before closing an utterance. See [Endpointing](/chapters/live-stt/features/endpointing).              |
| `maximum_duration_without_endpointing`        | `15`              | Prevents very long utterances in case a speaker doesn't pause.                                                                             |
| `messages_config.receive_partial_transcripts` | `true`            | Feeds live captions to the UI while waiting for final results. See [Partial transcripts](/chapters/live-stt/features/partial-transcripts). |
| `language_config.languages`                   | Set explicitly    | Meeting language is almost always known in advance — setting it avoids detection overhead.                                                 |
| `realtime_processing.custom_vocabulary`       | `true`            | Add company-specific terms, project names, and participant names for better accuracy.                                                      |

<Info>
  **Diarization vs. multi-channel:** if each speaker is on a **separate audio channel**, use the `channel` field on each utterance to identify who is speaking; diarization is not needed. See [Multiple channels](/chapters/limits-and-specifications/multiple-channels).

  If all speakers share a **single audio channel**, enable `diarization` to separate the speakers. See [Speaker diarization](/chapters/audio-intelligence/speaker-diarization).
</Info>
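
As a sketch, the meeting-recorder recommendations could be combined into a payload like the one below. It assumes the two endpointing parameters are top-level fields, leaves out audio format and speaker-separation settings, and uses `"en"` and the vocabulary entries purely as placeholders.

```json theme={"system"}
{
  "endpointing": 0.4,
  "maximum_duration_without_endpointing": 15,
  "language_config": {
    "languages": ["en"],
    "code_switching": false
  },
  "messages_config": {
    "receive_partial_transcripts": true
  },
  "realtime_processing": {
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": ["YOUR_COMPANY_TERM", "YOUR_PROJECT_NAME"],
      "default_intensity": 0.5
    }
  }
}
```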

***

## Call Centers

For live phone calls, the priorities are **speaker identification** and **fast, accurate transcription** despite variable audio quality (telephony codecs, background noise, cross-talk).

| Parameter                               | Recommended value              | Why                                                                                                                        |
| --------------------------------------- | ------------------------------ | -------------------------------------------------------------------------------------------------------------------------- |
| `endpointing`                           | `0.2` - `0.4`                  | Keeps turn-taking responsive without cutting off mid-sentence. See [Endpointing](/chapters/live-stt/features/endpointing). |
| `maximum_duration_without_endpointing`  | `15`                           | Prevents very long utterances in monologue-style segments.                                                                 |
| `language_config.languages`             | Set explicitly (e.g. `["en"]`) | Call center audio typically has a known language. Setting it avoids detection errors on noisy recordings.                  |
| `realtime_processing.custom_vocabulary` | `true`                         | Add product names, plan names, and internal terminology.                                                                   |

<Info>
  **Diarization vs. multi-channel:** if each speaker is on a **separate audio channel**, use the `channel` field on each utterance to identify who is speaking; diarization is not needed. See [Multiple channels](/chapters/limits-and-specifications/multiple-channels).

  If all speakers share a **single audio channel**, enable `diarization` to separate the speakers. See [Speaker diarization](/chapters/audio-intelligence/speaker-diarization).
</Info>

<Tip>
  For calls with more than two participants (e.g. conference bridges), use
  `diarization_config.min_speakers` / `max_speakers` instead of `number_of_speakers` to give the
  model a flexible range.
</Tip>
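
A call-center payload built from the table above might look like this. As with any sketch, treat the details as assumptions: the endpointing parameters are placed at the top level, audio format and speaker-separation settings are omitted, and the language and vocabulary values are placeholders.

```json theme={"system"}
{
  "endpointing": 0.3,
  "maximum_duration_without_endpointing": 15,
  "language_config": {
    "languages": ["en"],
    "code_switching": false
  },
  "realtime_processing": {
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": ["YOUR_PRODUCT_NAME", "YOUR_PLAN_NAME"],
      "default_intensity": 0.5
    }
  }
}
```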

***

## Subtitles & Captioning

When providing live subtitles, the goal is to **sync text with the speaker in real time**. The right balance between speed and segment quality depends on whether captions are displayed live or post-produced.

| Parameter                                     | Recommended value                      | Why                                                                                             |
| --------------------------------------------- | -------------------------------------- | ----------------------------------------------------------------------------------------------- |
| `endpointing`                                 | `0.3` (live) / `0.8` (post-production) | Lower values keep captions close to the speaker; higher values produce cleaner subtitle blocks. |
| `maximum_duration_without_endpointing`        | `5`                                    | Prevents excessively long subtitle segments that are hard to read on screen.                    |
| `messages_config.receive_partial_transcripts` | `true`                                 | Shows words as they are spoken, then refines them when the final result arrives.                |
| `language_config.languages`                   | Set explicitly                         | Avoids detection lag when the broadcast language is known.                                      |

<Tip>
  For post-production subtitles generated from a recording, consider using the
  [Pre-recorded API](/chapters/pre-recorded-stt/quickstart) with the dedicated
  [subtitles feature](/chapters/audio-intelligence/subtitles) instead — it
  produces SRT/VTT files with fine-grained timing controls.
</Tip>
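
For live captions, the table above could be expressed roughly as follows. The sketch assumes the endpointing parameters are top-level fields, omits audio format settings, and uses `"en"` as a placeholder broadcast language.

```json theme={"system"}
{
  "endpointing": 0.3,
  "maximum_duration_without_endpointing": 5,
  "language_config": {
    "languages": ["en"],
    "code_switching": false
  },
  "messages_config": {
    "receive_partial_transcripts": true
  }
}
```

If captions are cleaned up after the fact rather than shown live, the same payload with `endpointing` raised to `0.8` matches the post-production value in the table.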