Translation

The Translation model translates your transcriptions into one or more target languages. If subtitles and/or sentences are enabled, the translations also include translated versions of them. You can translate a transcription into multiple languages in a single API call. The languages covered by the Translation feature are listed in Supported Languages.

Two translation models are available:

base: fast, covers most use cases
enhanced: slower, but higher quality and context-aware

Quickstart

To enable translation, set translation to true in the realtime_processing object of your request and add a translation_config object:

{
  "realtime_processing": {
    "translation": true,
    "translation_config": {
      "target_languages": [
        "fr"
      ],
      "model": "base",
      "match_original_utterances": true,
      "lipsync": true,
      "context_adaptation": true,
      "context": "<string>",
      "informal": false
    }
  },
  "messages_config": {
    "receive_realtime_processing_events": true
  }
}
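The nesting above is easy to get wrong when building requests programmatically. The following is a minimal sketch, assuming you assemble the body as a Python dict before sending it; the helper name build_translation_request is hypothetical, and how the body is sent (endpoint, authentication) is not covered here.

```python
import json


def build_translation_request(translation_config: dict) -> dict:
    """Hypothetical helper: wrap a translation_config in the Quickstart request body."""
    return {
        "realtime_processing": {
            "translation": True,  # turns the Translation model on
            "translation_config": translation_config,
        },
        "messages_config": {
            # Same flag as in the Quickstart example above.
            "receive_realtime_processing_events": True,
        },
    }


body = build_translation_request({"target_languages": ["fr", "es"], "model": "base"})
print(json.dumps(body, indent=2))
```

Keeping translation_config as a plain dict lets you pass any of the fields documented below without changing the helper.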

Translation configuration fields

target_languages

string[]

Target language codes for translation output. See the list of supported language codes in Supported Languages.

model

enum["base", "enhanced"]

default:"base"

Specifies the translation model to be used.

match_original_utterances

boolean

default:true

Keep translated segments aligned with source segmentation. Use true for subtitles/dubbing; set false for a more natural flow in the target language.

When true, the system attempts to match the translated segments (utterances, sentences) to the timing and structure of the original detected speech segments.
When false, the translation might be more fluid or natural-sounding in the target language but could deviate from the original utterance segmentation.

lipsync

boolean

default:true

Controls alignment with visual cues, specifically lip movements. When enabled (default), an advanced lip synchronization algorithm aligns the translated output with the speaker's lip movements, using timestamps from lip activity. This improves the viewing experience for dubbed content but may occasionally merge distinct words into a single word object to achieve better visual sync. Set to false if strict word-for-word mapping matters more than visual timing synchronization.

context_adaptation

boolean

default:true

Enable context-aware translation. When true, the model leverages extra context and style preferences for better accuracy. Turn off for purely literal translations.

context

string

Additional context to improve terminology, proper nouns, or disambiguation. Use together with context_adaptation: true.
Example: "Medical consultation between doctor and patient discussing cardiology"

informal

boolean

default:false

Prefer the informal register when available; useful for conversational UX or youth audiences. Especially relevant for languages with formal/informal distinctions (e.g., French “tu/vous”, German “du/Sie”, Spanish “tú/usted”, Dutch “jij/u”).
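The fields above can be mirrored in a small typed object so that defaults live in one place. This is a sketch, assuming a Python client; the class name TranslationConfig and the client-side checks (non-empty target_languages, model limited to "base"/"enhanced", a warning when context is set without context_adaptation) are our own assumptions, not validation performed by the API.

```python
import warnings
from dataclasses import asdict, dataclass
from typing import List, Optional

VALID_MODELS = ("base", "enhanced")


@dataclass
class TranslationConfig:
    # Field names and defaults follow the documentation above.
    target_languages: List[str]
    model: str = "base"
    match_original_utterances: bool = True
    lipsync: bool = True
    context_adaptation: bool = True
    context: Optional[str] = None
    informal: bool = False

    def __post_init__(self) -> None:
        # Assumed client-side checks, not API behavior.
        if not self.target_languages:
            raise ValueError("target_languages must contain at least one language code")
        if self.model not in VALID_MODELS:
            raise ValueError(f"model must be one of {VALID_MODELS}, got {self.model!r}")
        if self.context and not self.context_adaptation:
            warnings.warn("context is meant to be used with context_adaptation=True", UserWarning)

    def to_dict(self) -> dict:
        data = asdict(self)
        if data["context"] is None:
            del data["context"]  # omit context entirely when it was not provided
        return data


cfg = TranslationConfig(
    target_languages=["fr", "de"],
    model="enhanced",
    context="Medical consultation between doctor and patient discussing cardiology",
)
print(cfg.to_dict())
```

The dict returned by to_dict() can be used as the value of translation_config in the request body.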

Result

The transcription result will contain a "translation" key with the output of the model:

{
  "transcription": {...},
  "translation": {
    "success": true,
    "is_empty": false,
    "results": [
      {
        "words": [
          {
            "word": "Diviser",
            "start": 0.20043,
            "end": 0.7008000000000001,
            "confidence": 1
          },
          {
            "word": "l'infini",
            "start": 0.9009500000000001,
            "end": 1.5614400000000002,
            "confidence": 1
          },
          ...
        ],
        "languages": ["fr"],
        "full_transcript": "Diviser l'infini dans un temps où moins est plus...",
        "utterances": [Array], // Also translated
        "error": null
      },
      {
        "words": [
          {
            "word": "Dividir",
            "start": 0.20043,
            "end": 0.7008000000000001,
            "confidence": 1
          },
          {
            "word": "la infinidad",
            "start": 0.9009500000000001,
            "end": 1.5614400000000002,
            "confidence": 1
          },
          ...
        ],
        "languages": ["es"],
        "full_transcript": "Dividir la infinidad en un tiempo en que menos es más...",
        "utterances": [Array], // Also translated
        "error": null
      }
    ],
    "exec_time": 0.6475496292114258,
    "error": null
  }
}

If you enabled subtitle generation, the subtitles are translated as well.
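Each entry in results carries the languages it covers plus its own error field. A minimal sketch for collecting one full transcript per target language, assuming the "translation" object has already been parsed into a Python dict; the function name is hypothetical, and treating a non-null per-result error as "skip this language" is our assumption about how to handle failures.

```python
from typing import Any, Dict


def translations_by_language(translation: Dict[str, Any]) -> Dict[str, str]:
    """Map each target language code to its translated full_transcript."""
    if not translation.get("success") or translation.get("is_empty"):
        return {}
    out: Dict[str, str] = {}
    for result in translation.get("results", []):
        if result.get("error") is not None:
            continue  # assumption: a non-null error means this translation is unusable
        for lang in result.get("languages", []):
            out[lang] = result["full_transcript"]
    return out


sample = {
    "success": True,
    "is_empty": False,
    "results": [
        {"words": [], "languages": ["fr"], "full_transcript": "Diviser l'infini...", "error": None},
        {"words": [], "languages": ["es"], "full_transcript": "Dividir la infinidad...", "error": None},
    ],
    "exec_time": 0.65,
    "error": None,
}
print(translations_by_language(sample))
```

The same loop is a natural place to read result["words"] if you need word-level timestamps for each language.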

Best practices

Set target_languages to only the languages you need.
Use the enhanced model with context_adaptation for high-accuracy, domain-heavy content.
Provide a meaningful context to improve terminology and named entities.
Keep match_original_utterances: true for subtitles; set it to false for a more natural flow.
Pair with language detection and code switching when the source language may vary.
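Several of these practices can be folded into one helper that produces a translation_config. This is an illustrative sketch: the function name and the mapping from use case to settings are our own reading of the list above, not an official preset, and language detection / code switching are configured elsewhere in the request and are not shown.

```python
from typing import Iterable, Optional


def suggested_translation_config(
    target_languages: Iterable[str],
    *,
    for_subtitles: bool = True,
    domain_context: Optional[str] = None,
) -> dict:
    """Hypothetical helper applying the best practices above."""
    config = {
        # Request only the languages you need; dict.fromkeys drops duplicates, keeps order.
        "target_languages": list(dict.fromkeys(target_languages)),
        "model": "base",
        # Keep source segmentation for subtitles; relax it for a more natural flow.
        "match_original_utterances": for_subtitles,
    }
    if domain_context:
        # Domain-heavy content: enhanced model, context adaptation, and a meaningful context.
        config.update(model="enhanced", context_adaptation=True, context=domain_context)
    return config


print(suggested_translation_config(["fr", "es", "fr"], for_subtitles=False,
                                   domain_context="Cardiology consultation"))
```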
