The Translation model translates your transcription into one or more target languages. If subtitles and/or sentences are enabled, the output also includes translated versions of them. You can translate your transcription into multiple languages in a single API call. The languages covered by the Translation feature are listed in Supported Languages.
Two translation models are available:
  • base : Fast, covers most use cases
  • enhanced : Slower, but higher quality and with context awareness

Quickstart

To enable translation, set translation to true in your request and add a translation_config object:
{
  "realtime_processing": {
    "translation": true,
    "translation_config": {
      "target_languages": [
        "fr"
      ],
      "model": "base",
      "match_original_utterances": true,
      "lipsync": true,
      "context_adaptation": true,
      "context": "<string>",
      "informal": false
    }
  },
  "messages_config": {
    "receive_realtime_processing_events": true
  }
}
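
As a rough illustration of the request above, the Python sketch below assembles the same body for several target languages at once. The helper name build_translation_request is hypothetical and not part of any SDK; only the field names are taken from the example above, and sending the body to the API is left out.

```python
import json


def build_translation_request(target_languages, model="base", context=None):
    """Build a request body that enables translation.

    Hypothetical helper: the name and signature are assumptions made for
    this sketch; the field names mirror the JSON example above.
    """
    if not target_languages:
        raise ValueError("target_languages must contain at least one code")
    if model not in ("base", "enhanced"):
        raise ValueError("model must be 'base' or 'enhanced'")

    translation_config = {
        "target_languages": list(target_languages),
        "model": model,
    }
    # Only send context fields when the caller actually provides context.
    if context is not None:
        translation_config["context_adaptation"] = True
        translation_config["context"] = context

    return {
        "realtime_processing": {
            "translation": True,
            "translation_config": translation_config,
        },
        "messages_config": {"receive_realtime_processing_events": True},
    }


if __name__ == "__main__":
    body = build_translation_request(["fr", "es"], model="enhanced")
    print(json.dumps(body, indent=2))
```

Fields left out of translation_config (match_original_utterances, lipsync, informal) are expected to fall back to the defaults documented below.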

Translation configuration fields

target_languages
string[]
Target language codes for translation output. See the list of supported language codes in Supported Languages.
model
enum["base", "enhanced"]
default:"base"
Specifies the translation model to be used.
match_original_utterances
boolean
default:true
Keep translated segments aligned with source segmentation. Use true for subtitles/dubbing; set false for a more natural flow in the target language.
  • When true, the system attempts to match the translated segments (utterances, sentences) to the timing and structure of the original detected speech segments.
  • When false, the translation might be more fluid or natural-sounding in the target language but could deviate from the original utterance segmentation.
lipsync
boolean
default:true
Controls alignment with visual cues, specifically lip movements. When enabled (default), an advanced lip synchronization algorithm aligns the translated output with the speaker’s lip movements, using timestamps from lip activity. This enhances the viewing experience for dubbed content but may occasionally merge distinct words into single objects to achieve better visual sync. Set to false if strict word-for-word mapping matters more than visual timing synchronization.
context_adaptation
boolean
default:true
Enable context-aware translation. When true, the model leverages extra context and style preferences for better accuracy. Turn off for purely literal translations.
context
string
Additional context to improve terminology, proper nouns, or disambiguation. Effective with context_adaptation: true.
Example: "Medical consultation between doctor and patient discussing cardiology"
informal
boolean
default:false
Prefer the informal register when available; useful for conversational interfaces or younger audiences. Especially relevant for languages with informal/formal distinctions (e.g., French “tu/vous”, German “du/Sie”, Spanish “tú/usted”, Dutch “jij/u”).
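
To make the documented defaults concrete, here is a minimal Python sketch that overlays a partial configuration on the defaults listed above and rejects unknown fields or models. The function name resolve_translation_config is hypothetical, and the sketch assumes target_languages is required, which the field list does not state explicitly.

```python
# Defaults copied from the field list above.
DOCUMENTED_DEFAULTS = {
    "model": "base",
    "match_original_utterances": True,
    "lipsync": True,
    "context_adaptation": True,
    "informal": False,
}
ALLOWED_MODELS = {"base", "enhanced"}
# Fields that have no documented default.
FIELDS_WITHOUT_DEFAULT = {"target_languages", "context"}


def resolve_translation_config(partial):
    """Return the effective config: documented defaults plus caller values.

    Hypothetical client-side helper; the API applies its own defaults.
    """
    unknown = set(partial) - set(DOCUMENTED_DEFAULTS) - FIELDS_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    # Assumption: at least one target language must be given.
    languages = partial.get("target_languages")
    if not languages or not all(isinstance(code, str) for code in languages):
        raise ValueError("target_languages must be a non-empty list of strings")

    resolved = {**DOCUMENTED_DEFAULTS, **partial}
    if resolved["model"] not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    return resolved


if __name__ == "__main__":
    # Subtitles-oriented config: only the target language is specified.
    print(resolve_translation_config({"target_languages": ["fr"]}))
```

Validating on the client side like this is a design choice rather than a requirement; it mainly helps catch typos in field names before a request is sent.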

Result

The transcription result will contain a "translation" key with the output of the model:
{
  "transcription":{...},
  "translation": {
    success: true,
    is_empty: false,
    results: [
      {
        words: [
          {
            word: "Diviser",
            start: 0.20043,
            end: 0.7008000000000001,
            confidence: 1
          },
          {
            word: "l'infini",
            start: 0.9009500000000001,
            end: 1.5614400000000002,
            confidence: 1
          },
          ...
        ],
        languages: ["fr"],
        full_transcript: "Diviser l'infini dans un temps où moins est plus...",
        utterances: [Array], // Also translated
        error: null
      },
      {
        words: [
          {
            word: "Dividir",
            start: 0.20043,
            end: 0.7008000000000001,
            confidence: 1
          },
          {
            word: "la infinidad",
            start: 0.9009500000000001,
            end: 1.5614400000000002,
            confidence: 1
          },
          ...
        ],
        languages: ["es"],
        full_transcript: "Dividir la infinidad en un tiempo en que menos es más...",
        utterances: [Array], // Also translated
        error: null
      }
    ],
    exec_time: 0.6475496292114258,
    error: null
  }
}
If you enabled subtitle generation, the subtitles are translated by the translation model as well.
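
As one possible way to consume the result shown above, the Python sketch below collects the full translated transcript for each target language. The function name transcripts_by_language is hypothetical, and the sample payload is a trimmed copy of the example above with the words and utterances arrays left out.

```python
def transcripts_by_language(payload):
    """Map each target language code to its translated full transcript.

    Hypothetical helper: assumes the payload has the shape of the example
    result above and skips entries whose "error" field is set.
    """
    translation = payload.get("translation") or {}
    if not translation.get("success") or translation.get("is_empty"):
        return {}

    transcripts = {}
    for result in translation.get("results", []):
        if result.get("error") is not None:
            continue  # this language failed; keep the others
        for code in result.get("languages", []):
            transcripts[code] = result.get("full_transcript", "")
    return transcripts


# Trimmed copy of the example result above.
sample = {
    "translation": {
        "success": True,
        "is_empty": False,
        "results": [
            {
                "languages": ["fr"],
                "full_transcript": "Diviser l'infini dans un temps où moins est plus...",
                "error": None,
            },
            {
                "languages": ["es"],
                "full_transcript": "Dividir la infinidad en un tiempo en que menos es más...",
                "error": None,
            },
        ],
        "error": None,
    }
}

if __name__ == "__main__":
    for code, text in transcripts_by_language(sample).items():
        print(f"{code}: {text}")
```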

Best practices

  • Set target_languages to only the languages you need.
  • Use enhanced with context_adaptation for high-accuracy, domain-heavy content.
  • Provide a meaningful context to improve terminology and named entities.
  • Keep match_original_utterances: true for subtitles; set to false for a more natural flow.
  • Pair with language detection and code switching when the source language may vary.