Recommended Parameters by Use Case

The right parameter configuration can significantly impact transcription quality for pre-recorded audio. This guide covers recommended starting points for common scenarios and highlights pitfalls that frequently trip up new integrations.

These recommendations apply to the Pre-recorded API and are passed in the POST /v2/pre-recorded request body. They are starting points — tune them to match your specific needs.

Language Configuration

One of the most common configuration mistakes is misunderstanding how language_config works. Choosing the right setup avoids unnecessary detection overhead and improves accuracy.

When to set an explicit language:

You know the language of the audio ahead of time.
The audio is monolingual (single language throughout).
You want the fastest, most accurate results.

{
  "language_config": {
    "languages": ["en"],
    "code_switching": false
  }
}

When to use auto-detection:

You process audio in many different languages and don’t know which one beforehand.
You want Gladia to pick the language automatically.

{
  "language_config": {
    "languages": [],
    "code_switching": false
  }
}

When code_switching is false and no language is set, the language is detected on the first utterance and reused for the rest of the session or file. If the beginning of your audio contains silence, music, or a different language than the main content, this can lead to incorrect detection for the whole transcription.

Even when using auto-detection, pass a small list of likely languages in languages to constrain the search. This improves both accuracy and processing time.
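The three monolingual setups described in this section (explicit language, constrained auto-detection, unconstrained auto-detection) can be sketched as one small helper. This is a minimal illustration: `build_language_config` is a hypothetical name, not part of any Gladia SDK, and only the `languages` and `code_switching` fields come from this guide.

```python
def build_language_config(expected_languages=None):
    """Return a language_config dict for audio that stays in one language.

    One code      -> explicit language (fastest, most accurate).
    Several codes -> auto-detection constrained to those candidates.
    No codes      -> unconstrained auto-detection.
    """
    # Drop duplicate codes while keeping the caller's order.
    languages = list(dict.fromkeys(expected_languages or []))
    return {"languages": languages, "code_switching": False}


explicit = build_language_config(["en"])
constrained = build_language_config(["en", "fr", "en"])
unconstrained = build_language_config()
```

The returned dict would be placed under the `language_config` key of the request body.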

Code Switching

Code switching (language_config.code_switching: true) lets Gladia detect and transcribe multiple languages within the same audio, re-evaluating the language on each utterance.

When to enable it:

Speakers switch languages mid-conversation (e.g. bilingual meetings, multilingual customer support).
You need the detected language returned per utterance.

When NOT to enable it:

The audio is in a single language — code switching adds unnecessary processing and can introduce misdetections.
You’ve set exactly one language in languages — in that case code_switching is ignored anyway.

{
  "language_config": {
    "languages": ["en", "fr", "es"],
    "code_switching": true
  }
}

Do not enable code_switching with an empty languages list. When no languages are specified, the language detector evaluates every utterance against 100+ supported languages, which leads to frequent misdetections — especially between similar-sounding languages. Always provide a short list of languages you actually expect in the audio.
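A pre-flight check can catch the two code switching pitfalls named in this section before a request is sent. The sketch below is an assumption about how you might lint your own configuration locally; `lint_language_config` is a hypothetical helper, and the API itself may or may not reject these combinations.

```python
def lint_language_config(config):
    """Return a list of warnings for a language_config dict (empty if none)."""
    languages = config.get("languages", [])
    code_switching = config.get("code_switching", False)
    warnings = []
    if code_switching and not languages:
        warnings.append(
            "code_switching is enabled with an empty languages list; "
            "provide a short list of expected languages"
        )
    if code_switching and len(languages) == 1:
        warnings.append(
            "code_switching is ignored when exactly one language is set"
        )
    return warnings


risky = lint_language_config({"languages": [], "code_switching": True})
redundant = lint_language_config({"languages": ["en"], "code_switching": True})
fine = lint_language_config({"languages": ["en", "fr", "es"], "code_switching": True})
```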

Custom Vocabulary

Custom vocabulary is a post-transcription replacement based on phoneme similarity. It’s essential for domain-specific terms that speech models frequently mis-transcribe.

Best practices:

Always provide both the custom_vocabulary flag and a custom_vocabulary_config.
Add pronunciations that cover the close spelling variants of each term. You can use the Automatic Phonemic Transcriber (IPA) to check that all the different spellings are covered.
Keep intensity moderate (0.4-0.6). High values increase false positives where unrelated words get replaced.
Set language on individual vocabulary entries when your audio is multilingual and a term is pronounced differently depending on the language.

{
  "audio_url": "YOUR_AUDIO_URL",
  "custom_vocabulary": true,
  "custom_vocabulary_config": {
    "vocabulary": [
      "Kubernetes",
      {
        "value": "Gladia",
        "pronunciations": ["Glad", "Gladio"],
        "intensity": 0.5
      },
      {
        "value": "PostgreSQL",
        "pronunciations": ["Postgres Q L", "Post gress"],
        "intensity": 0.4
      }
    ],
    "default_intensity": 0.5
  }
}

{
  "realtime_processing": {
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": [
        "Kubernetes",
        {
          "value": "Gladia",
          "pronunciations": ["Glad", "Gladio"],
          "intensity": 0.5
        },
        {
          "value": "PostgreSQL",
          "pronunciations": ["Postgres Q L", "Post gress"],
          "intensity": 0.4
        }
      ],
      "default_intensity": 0.5
    }
  }
}
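Two of the best practices above lend themselves to a small helper: always sending the flag together with its config, and keeping per-entry intensity in the moderate 0.4-0.6 band. The sketch below assumes nothing beyond the field names shown in this section; both function names are hypothetical, and the 0.4-0.6 band is this guide's recommendation rather than a limit enforced by the API.

```python
RECOMMENDED_INTENSITY = (0.4, 0.6)  # moderate band suggested in this guide


def custom_vocabulary_params(vocabulary, default_intensity=0.5):
    """Return the flag and its config together so neither is forgotten."""
    return {
        "custom_vocabulary": True,
        "custom_vocabulary_config": {
            "vocabulary": list(vocabulary),
            "default_intensity": default_intensity,
        },
    }


def entries_outside_recommended_intensity(vocabulary):
    """List the values of entries whose intensity falls outside the band."""
    low, high = RECOMMENDED_INTENSITY
    flagged = []
    for entry in vocabulary:
        # Plain strings carry no intensity and use default_intensity instead.
        if isinstance(entry, dict) and "intensity" in entry:
            if not low <= entry["intensity"] <= high:
                flagged.append(entry["value"])
    return flagged


vocabulary = [
    "Kubernetes",
    {"value": "Gladia", "pronunciations": ["Glad", "Gladio"], "intensity": 0.5},
    {"value": "PostgreSQL", "pronunciations": ["Post gress"], "intensity": 0.9},
]
params = custom_vocabulary_params(vocabulary)
flagged = entries_outside_recommended_intensity(vocabulary)
```

Here `flagged` contains only "PostgreSQL", whose intensity of 0.9 is high enough to risk replacing unrelated words.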

Meeting Recorders

For apps that record and process meetings — team stand-ups, board sessions, 1-on-1s — the goal is to produce structured, actionable meeting notes with clear speaker attribution. Meetings typically have a known set of participants and benefit heavily from post-processing features like summarization.

Parameter	Recommended value	Why
`diarization`	`true`	Attributes speech to each participant. See Speaker diarization.
`diarization_config.min_speakers` / `max_speakers`	Set a range (e.g. `2`-`10`)	Meeting size varies — a range lets the model adapt without over- or under-splitting speakers.
`summarization`	`true`	Generates a summary for quick review. Use `bullet_points` type for action-item style output. See Summarization.
`named_entity_recognition`	`true`	Surfaces people, organizations, dates, and other key entities mentioned during the meeting. See NER.
`sentences`	`true`	Produces well-segmented, readable output suitable for meeting minutes. See Sentences.
`language_config.languages`	Set explicitly	Meeting language is almost always known in advance — setting it avoids detection overhead.
`custom_vocabulary`	`true`	Add company-specific terms, project names, and participant names for better accuracy.

Diarization vs. multi-channel: if each speaker is on a separate audio channel, use the channel field on each utterance to identify who is speaking; diarization is not needed. See Multiple channels. If all speakers share a single audio channel, enable diarization to separate the speakers. See Speaker diarization.
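The choice between diarization and per-channel attribution can be expressed as a small branch when assembling a meeting request. This is a sketch under stated assumptions: `speaker_attribution_params` is a hypothetical helper, the speaker range defaults mirror the example values in the table, and the optional summarization type is left out.

```python
def speaker_attribution_params(separate_channels, min_speakers=2, max_speakers=10):
    """Pick diarization settings based on how the recording was captured."""
    if separate_channels:
        # Each utterance already carries a channel field, so diarization is off.
        return {"diarization": False}
    if min_speakers > max_speakers:
        raise ValueError("min_speakers must not exceed max_speakers")
    return {
        "diarization": True,
        "diarization_config": {
            "min_speakers": min_speakers,
            "max_speakers": max_speakers,
        },
    }


meeting_request = {
    "audio_url": "YOUR_AUDIO_URL",
    **speaker_attribution_params(separate_channels=False),
    "summarization": True,
    "named_entity_recognition": True,
    "sentences": True,
    "language_config": {"languages": ["en"], "code_switching": False},
}
```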

Call Centers

For recorded phone calls, the priorities are speaker identification and accurate transcription despite variable audio quality (telephony codecs, background noise, cross-talk).

Parameter	Recommended value	Why
`language_config.languages`	Set explicitly (e.g. `["en"]`)	Call center audio typically has a known language. Setting it avoids detection errors on noisy recordings.
`diarization`	`true`	Separates agent and customer speech. See Speaker diarization.
`diarization_config.number_of_speakers`	`2`	Most calls have exactly two participants — giving this hint improves speaker assignment accuracy.
`custom_vocabulary`	`true`	Add product names, plan names, and internal terminology.
`summarization`	`true`	Automatically generates a summary for agent wrap-up notes. See Summarization.
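Put together, the table above might translate into a request body like the one built below. This is a minimal sketch: `call_center_request` and the sample vocabulary terms are hypothetical, and only the body is shown, since authentication and the HTTP call are outside the scope of this guide.

```python
import json


def call_center_request(audio_url, language, vocabulary):
    """Assemble a request body from the call center recommendations."""
    return {
        "audio_url": audio_url,
        "language_config": {"languages": [language], "code_switching": False},
        "diarization": True,
        # Most calls have exactly two participants: agent and customer.
        "diarization_config": {"number_of_speakers": 2},
        "custom_vocabulary": True,
        "custom_vocabulary_config": {"vocabulary": list(vocabulary)},
        "summarization": True,
    }


# Serialize the body as it would be sent to POST /v2/pre-recorded.
body = json.dumps(
    call_center_request("YOUR_AUDIO_URL", "en", ["Premium Plan", "Acme Router"]),
    indent=2,
)
print(body)
```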

Podcasts & Interviews

For long-form audio with multiple speakers, the focus is on readability and correct speaker attribution. Transcripts are often repurposed as articles or show notes, so segment quality matters.

Parameter	Recommended value	Why
`diarization`	`true`	Essential for multi-speaker content.
`diarization_config.min_speakers` / `max_speakers`	Set a range (e.g. `2`-`4`)	Provides a flexible hint when the exact count varies across episodes.
`sentences`	`true`	Produces well-segmented, readable output suitable for publishing. See Sentences.
`custom_vocabulary`	`true`	Add recurring guest names, show-specific terms, and brand names.
`language_config.languages`	Set explicitly	Podcast language is almost always known in advance.
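Because guest names change from episode to episode while show terms stay fixed, one way to apply the table above is to merge both lists into the vocabulary per request. The helper below is a sketch; `podcast_request`, its parameters, and the sample names are hypothetical.

```python
def podcast_request(audio_url, language, show_terms, guest_names, speakers=(2, 4)):
    """Build a request body for one episode from the podcast recommendations."""
    min_speakers, max_speakers = speakers
    # Merge show-level and per-episode terms, dropping duplicates in order.
    vocabulary = list(dict.fromkeys([*show_terms, *guest_names]))
    return {
        "audio_url": audio_url,
        "diarization": True,
        "diarization_config": {
            "min_speakers": min_speakers,
            "max_speakers": max_speakers,
        },
        "sentences": True,
        "custom_vocabulary": True,
        "custom_vocabulary_config": {"vocabulary": vocabulary},
        "language_config": {"languages": [language], "code_switching": False},
    }


episode = podcast_request(
    "YOUR_AUDIO_URL",
    "en",
    show_terms=["Example Show", "Ada Host"],
    guest_names=["Grace Guest", "Ada Host"],
)
```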

Subtitles & Captioning

When generating subtitle files from pre-recorded content, tune the formatting parameters for the best viewing experience. Gladia produces SRT and VTT files directly — no post-processing needed. See Subtitles for the full parameter reference.

Parameter	Recommended value	Why
`subtitles`	`true`	Enables subtitle generation.
`subtitles_config.formats`	`["srt", "vtt"]`	Generate both formats to cover different players and platforms.
`subtitles_config.maximum_characters_per_row`	`42`	Standard broadcast limit for readability.
`subtitles_config.maximum_rows_per_caption`	`2`	Keeps captions compact on screen.
`subtitles_config.style`	`"compliance"`	Uses stricter formatting rules suited for broadcast or accessibility requirements.
`translation`	`true` (if needed)	When enabled, subtitles are automatically generated for each target language. See Translation.

For live captions streamed in real time, use the Realtime API with partial transcripts instead — see the Live recommended parameters guide.
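For pre-recorded content, the subtitle settings from the table can be assembled as below. The second function is a local illustration only, using Python's `textwrap` to show what a limit of 2 rows of 42 characters means for a piece of text; it is not how Gladia segments captions. Both function names are hypothetical, and `translation` is left out because its configuration is not covered in this section.

```python
import textwrap

MAX_CHARS_PER_ROW = 42
MAX_ROWS_PER_CAPTION = 2


def subtitles_request(audio_url):
    """Build a request body from the subtitle recommendations."""
    return {
        "audio_url": audio_url,
        "subtitles": True,
        "subtitles_config": {
            "formats": ["srt", "vtt"],
            "maximum_characters_per_row": MAX_CHARS_PER_ROW,
            "maximum_rows_per_caption": MAX_ROWS_PER_CAPTION,
            "style": "compliance",
        },
    }


def fits_in_one_caption(text):
    """Illustration: would this text fit in a single caption at these limits?"""
    rows = textwrap.wrap(text, width=MAX_CHARS_PER_ROW)
    return len(rows) <= MAX_ROWS_PER_CAPTION


request_body = subtitles_request("YOUR_AUDIO_URL")
short_ok = fits_in_one_caption("Thanks for joining us today.")
long_ok = fits_in_one_caption("word " * 20)
```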

Multilingual Content

For content with mixed languages — conferences, multilingual media, interviews with speakers from different countries — combine language detection with code switching.

Parameter	Recommended value	Why
`language_config.languages`	List of expected languages (e.g. `["en", "fr", "de"]`)	Constrain to 3-5 expected languages for best accuracy.
`language_config.code_switching`	`true`	Detects language shifts across utterances. See Code switching.
`custom_vocabulary`	`true`	Add terms for each language with appropriate `language` tags on each entry.

Do not enable code_switching with an empty languages list. The detector would evaluate every utterance against 100+ languages, leading to frequent misdetections — especially between similar-sounding languages.
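The multilingual recommendations can be combined in one builder that tags each vocabulary term with its language and refuses an empty language list. This is a sketch under assumptions: `tag_terms` and `multilingual_request` are hypothetical helpers, the sample terms are invented, and the check that every tag appears in `languages` is a local convention, not a documented API rule.

```python
def tag_terms(terms_by_language):
    """Flatten {language: [terms]} into vocabulary entries tagged by language."""
    return [
        {"value": term, "language": language}
        for language, terms in terms_by_language.items()
        for term in terms
    ]


def multilingual_request(audio_url, languages, terms_by_language):
    """Build a code switching request constrained to the expected languages."""
    if not languages:
        raise ValueError("list the languages you expect before enabling code_switching")
    unexpected = set(terms_by_language) - set(languages)
    if unexpected:
        raise ValueError(f"vocabulary tagged with unlisted languages: {sorted(unexpected)}")
    return {
        "audio_url": audio_url,
        "language_config": {"languages": list(languages), "code_switching": True},
        "custom_vocabulary": True,
        "custom_vocabulary_config": {"vocabulary": tag_terms(terms_by_language)},
    }


conference = multilingual_request(
    "YOUR_AUDIO_URL",
    ["en", "fr", "de"],
    {"en": ["Gladia"], "fr": ["Marseille"]},
)
```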
