The Translation model translates your transcriptions into one or more target languages. If subtitles and/or sentences are enabled, they are translated as well. You can translate a transcription into multiple languages in a single API call. The languages covered by the Translation feature are listed in Supported Languages.
Two translation models are available:
  • base : Fast, covers most use cases
  • enhanced : Slower, but higher quality and with context awareness

Quickstart

To enable translation, set translation to true in your request and add a translation_config object:
{
  "realtime_processing": {
    "translation": true,
    "translation_config": {
      "target_languages": [
        "fr"
      ],
      "model": "base",
      "match_original_utterances": true,
      "lipsync": true,
      "context_adaptation": true,
      "context": "<string>",
      "informal": false
    }
  },
  "messages_config": {
    "receive_realtime_processing_events": true
  }
}
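As a minimal sketch, the request body above could be assembled in Python as follows. The helper name build_request_body is hypothetical, and the structure simply mirrors the example; sending the request is left out because the endpoint and authentication are not covered in this section.

```python
import json


def build_request_body(translation_config):
    """Wrap a translation_config dict in the request structure shown above.

    Hypothetical helper: it only mirrors the documented example and does not
    call any API.
    """
    return {
        "realtime_processing": {
            "translation": True,
            "translation_config": translation_config,
        },
        "messages_config": {
            "receive_realtime_processing_events": True,
        },
    }


body = build_request_body({"target_languages": ["fr"], "model": "base"})
print(json.dumps(body, indent=2))
```

Fields left out of translation_config are expected to fall back to the defaults listed in the field reference.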

Translation configuration fields

target_languages
string[]
Target language codes for translation output. See the list of supported language codes in Supported Languages.
model
enum["base", "enhanced"]
default:"base"
Specifies the translation model to be used.
match_original_utterances
boolean
default:true
Keep translated segments aligned with source segmentation. Use true for subtitles/dubbing; set false for a more natural flow in the target language.
  • When true, the system attempts to match the translated segments (utterances, sentences) to the timing and structure of the original detected speech segments.
  • When false, the translation might be more fluid or natural-sounding in the target language but could deviate from the original utterance segmentation.
lipsync
boolean
default:true
Controls alignment with visual cues, specifically lip movements. When enabled (default), uses an advanced lip synchronization algorithm that aligns translated output with the speaker’s lip movements using timestamps from lip activity. This enhances the viewing experience for dubbed content but may occasionally merge distinct words into single objects to achieve better visual sync. Set to false if strict word-for-word mapping is required over visual timing synchronization.
context_adaptation
boolean
default:true
Enable context-aware translation. When true, the model leverages extra context and style preferences for better accuracy. Turn off for purely literal translations.
context
string
Additional context to improve terminology, proper nouns, or disambiguation. Effective with context_adaptation: true.
Example: "Medical consultation between doctor and patient discussing cardiology"
informal
boolean
default:false
Prefer informal register when available; useful for chatty UX or youth audiences. Especially relevant for languages with formal/informal distinctions (e.g., French “tu/vous”, German “du/Sie”, Spanish “tú/usted”, Dutch “jij/u”).
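The fields above can be captured in a small typed container, sketched here in Python. The class name TranslationConfig and its to_dict method are hypothetical; the defaults follow the values listed above, and treating an unset context as "omit the field" is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class TranslationConfig:
    """Hypothetical container mirroring the documented translation_config fields."""

    target_languages: List[str]
    model: str = "base"
    match_original_utterances: bool = True
    lipsync: bool = True
    context_adaptation: bool = True
    context: Optional[str] = None  # assumption: left out of the payload when unset
    informal: bool = False

    def __post_init__(self):
        # Reject values that the field reference does not allow.
        if not self.target_languages:
            raise ValueError("target_languages needs at least one language code")
        if self.model not in ("base", "enhanced"):
            raise ValueError("model must be 'base' or 'enhanced'")

    def to_dict(self):
        # Drop unset optional fields so they are not sent as null.
        return {key: value for key, value in asdict(self).items() if value is not None}


config = TranslationConfig(
    target_languages=["fr"],
    model="enhanced",
    context="Medical consultation between doctor and patient discussing cardiology",
)
print(config.to_dict())
```

The resulting dictionary can be used as the translation_config object of a request.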

Result

The transcription result will contain a "translation" key with the output of the model:
{
  "transcription":{...},
  "translation": {
    success: true,
    is_empty: false,
    results: [
      {
        words: [
          {
            word: "Diviser",
            start: 0.20043,
            end: 0.7008000000000001,
            confidence: 1
          },
          {
            word: "l'infini",
            start: 0.9009500000000001,
            end: 1.5614400000000002,
            confidence: 1
          },
          ...
        ],
        languages: ["fr"],
        full_transcript: "Diviser l'infini dans un temps où moins est plus...",
        utterances: [Array], // Also translated
        error: null
      },
      {
        words: [
          {
            word: "Dividir",
            start: 0.20043,
            end: 0.7008000000000001,
            confidence: 1
          },
          {
            word: "la infinidad",
            start: 0.9009500000000001,
            end: 1.5614400000000002,
            confidence: 1
          },
          ...
        ],
        languages: ["es"],
        full_transcript: "Dividir la infinidad en un tiempo en que menos es más...",
        utterances: [Array], // Also translated
        error: null
      }
    ],
    exec_time: 0.6475496292114258,
    error: null
  }
}
If you enabled subtitle generation, the subtitles will also benefit from the translation model.
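A minimal sketch of reading this result in Python, assuming the response has already been parsed into a dictionary with the shape shown above. The function name transcripts_by_language and the sample data are illustrative only.

```python
def transcripts_by_language(response):
    """Map each target language code to its translated full_transcript.

    Assumes the response shape shown above; results that carry an error
    are skipped.
    """
    translation = response.get("translation") or {}
    if not translation.get("success") or translation.get("is_empty"):
        return {}

    transcripts = {}
    for result in translation.get("results", []):
        if result.get("error") is not None:
            continue
        for language in result.get("languages", []):
            transcripts[language] = result.get("full_transcript", "")
    return transcripts


# Illustrative sample, trimmed to the keys the function reads.
sample = {
    "translation": {
        "success": True,
        "is_empty": False,
        "results": [
            {"languages": ["fr"], "full_transcript": "Diviser l'infini", "error": None},
            {"languages": ["es"], "full_transcript": "Dividir la infinidad", "error": None},
        ],
        "error": None,
    }
}

for language, text in transcripts_by_language(sample).items():
    print(language, text)
```

Returning an empty dictionary when success is false or is_empty is true keeps the calling code simple; stricter handling (for example raising on error) may suit some applications better.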

Best practices

  • Set target_languages to only the languages you need.
  • Use enhanced with context_adaptation for high-accuracy, domain-heavy content.
  • Provide a meaningful context to improve terminology and named entities.
  • Keep match_original_utterances: true for subtitles; set to false for a more natural flow.
  • Pair with language detection and code switching when the source language may vary.