Translation

This feature is in beta.

We’re looking for feedback to improve this feature. Share yours here.

The Translation model translates your transcription into one or more target languages. If subtitles and/or sentences are enabled, they are translated as well. You can translate a transcription into multiple languages in a single API call.

The languages covered by the Translation feature are listed in the API Reference (see translation_config).

Two translation models are available:

base: Fast, covers most use cases
enhanced: Slower, but higher quality and context-aware

Usage

To enable translation, set the "translation" parameter to true:

request data
{
  "audio_url": "<your audio url>",
  "translation": true
}

`translation_config` Options

When translation is set to true, you can also pass a translation_config object to customize the translation. The available options are:

`target_languages`

Description: An array of strings specifying the language codes for the desired translation outputs.
Example: ["fr", "es"] for French and Spanish.
Details: See the list of supported languages for the accepted language codes.
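Before sending a request, it can help to clean up the list of codes on the client side. The following is a minimal sketch, not part of the Gladia API: normalizeTargetLanguages is a hypothetical helper, and lowercasing the codes is an assumption based on the ["fr", "es"] example above.

```typescript
// Hypothetical client-side helper; not part of the Gladia API.
// Trims, lowercases, and de-duplicates language codes while keeping their order.
function normalizeTargetLanguages(codes: string[]): string[] {
  const seen = new Set<string>();
  const normalized: string[] = [];
  for (const raw of codes) {
    // Assumption: codes are lowercase, as in the ["fr", "es"] example.
    const code = raw.trim().toLowerCase();
    if (code.length === 0 || seen.has(code)) {
      continue; // skip empty strings and repeated codes
    }
    seen.add(code);
    normalized.push(code);
  }
  return normalized;
}

const targetLanguages = normalizeTargetLanguages(["fr", " ES", "fr", ""]);
console.log(targetLanguages);
```

The helper does not check that a code is actually supported; that still has to be verified against the list of supported languages.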

`model`

Description: Specifies the translation model to be used.
Values:
- "base": Fast and covers most use cases.
- "enhanced": Slower, but offers higher quality and context awareness.
Default: If not specified, a default model is used (typically “base”; refer to the API Reference for the current default).
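As an illustration, the request body can be assembled with a small typed helper so that only the two documented model names are accepted at compile time. This is a sketch under assumptions: buildTranslationRequest and the type names are hypothetical, and only the field names (audio_url, translation, translation_config, target_languages, model) come from this page.

```typescript
type TranslationModel = "base" | "enhanced";

interface TranslationRequest {
  audio_url: string;
  translation: true;
  translation_config: {
    target_languages: string[];
    model?: TranslationModel;
  };
}

// Hypothetical helper; only the field names come from this page.
function buildTranslationRequest(
  audioUrl: string,
  targetLanguages: string[],
  model?: TranslationModel
): TranslationRequest {
  const request: TranslationRequest = {
    audio_url: audioUrl,
    translation: true,
    translation_config: { target_languages: [...targetLanguages] },
  };
  // Leave `model` out unless the caller chose one, so the API default applies.
  if (model !== undefined) {
    request.translation_config.model = model;
  }
  return request;
}

const enhancedRequest = buildTranslationRequest("YOUR_AUDIO_URL", ["fr", "es"], "enhanced");
console.log(JSON.stringify(enhancedRequest, null, 2));
```

Omitting the third argument produces a body without a model field, which leaves the choice to the API.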

`match_original_utterances` (Default: `true`)

Description: This boolean option controls whether the translated utterances should be aligned with the original utterances from the transcription.
Default: true.
Behavior:
- When true, the system attempts to match the translated segments (utterances, sentences) to the timing and structure of the original detected speech segments.
- When false, the translation might be more fluid or natural-sounding in the target language but could deviate from the original utterance segmentation.
Use Case: Keep as true for most subtitling or dubbing use cases where alignment with original speech is crucial. Set to false if you prioritize a more natural flow in the translated text over strict temporal alignment.
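One possible use of aligned output is building bilingual subtitle cues. The sketch below is illustrative only: it assumes that, with match_original_utterances left at true, the original and translated utterance arrays have the same length and order, and that each utterance exposes text, start, and end fields. Neither assumption is confirmed on this page, and pairUtterances is a hypothetical helper.

```typescript
// Assumed minimal utterance shape; the real objects may carry more fields.
interface Utterance {
  text: string;
  start: number;
  end: number;
}

interface BilingualCue {
  start: number;
  end: number;
  original: string;
  translated: string;
}

// Hypothetical helper: pairs utterances by index and reuses the original timing.
function pairUtterances(original: Utterance[], translated: Utterance[]): BilingualCue[] {
  // Index pairing is only meaningful if both lists describe the same segments.
  if (original.length !== translated.length) {
    throw new Error(
      `Utterance counts differ (${original.length} vs ${translated.length}); index pairing is not safe`
    );
  }
  return original.map((utterance, index) => ({
    start: utterance.start,
    end: utterance.end,
    original: utterance.text,
    translated: translated[index].text,
  }));
}

const cues = pairUtterances(
  [{ text: "Hello everyone.", start: 0.2, end: 1.1 }],
  [{ text: "Bonjour à tous.", start: 0.2, end: 1.1 }]
);
console.log(cues);
```

The length check makes the failure explicit when the segmentation differs, for example when match_original_utterances is set to false.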

`lipsync` (Default: `true`)

This option controls the behavior of the translation’s alignment with visual cues, specifically lip movements.

How it works: When lipsync is set to true (the default value), the translation process utilizes an advanced lip synchronization matching algorithm. This algorithm is designed to align the translated audio or subtitles with the speaker’s lip movements by leveraging timestamps derived from lip activity.
Advantages: The primary benefit is better synchronization between the translated output and the speaker’s visible lip movements. This can significantly enhance the viewing experience, especially for dubbed content or when precise visual timing with speech is important.
Potential Trade-off: Due to its focus on matching lip movements, the algorithm might occasionally aggregate two distinct spoken words into a single “word” object within the translated output. This means that while the timing aligns well with the lips, the direct one-to-one correspondence between source words and translated words might sometimes be altered to achieve better visual sync.
When to disable: If a strict, word-for-word translation format is an absolute requirement, and minor deviations for the sake of lip synchronization are not acceptable, you should set lipsync to false. This will instruct the system to prioritize literal word mapping over visual timing synchronization.
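To judge whether the trade-off matters for your content, you can scan the translated words for aggregated entries. The sketch below assumes that an aggregated entry appears as a single word object whose text contains a space, like the "la infinidad" entry in the Result section. findAggregatedWords and the TranslatedWord type are hypothetical names.

```typescript
// Word shape as shown in the Result section of this page.
interface TranslatedWord {
  word: string;
  start: number;
  end: number;
  confidence: number;
}

// Hypothetical helper: returns entries whose text contains inner whitespace,
// which this sketch treats as a sign that several words were merged.
function findAggregatedWords(words: TranslatedWord[]): TranslatedWord[] {
  return words.filter((entry) => /\s/.test(entry.word.trim()));
}

const sampleWords: TranslatedWord[] = [
  { word: "Dividir", start: 0.20043, end: 0.7008, confidence: 1 },
  { word: "la infinidad", start: 0.90095, end: 1.56144, confidence: 1 },
];
console.log(findAggregatedWords(sampleWords).map((entry) => entry.word));
```

If many entries are flagged and your pipeline needs one word per object, setting lipsync to false is the option described above.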

Sample code

The following example uses the base model.

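async function makeFetchRequest(url: string, options: any) {
  const response = await fetch(url, options);
  // Fail fast on HTTP errors instead of parsing an error body as a result.
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}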

async function pollForResult(resultUrl: string, headers: any) {
  while (true) {
    console.log("Polling for results...");
    const pollResponse = await makeFetchRequest(resultUrl, { headers });

    if (pollResponse.status === "done") {
      console.log("- Transcription done: \n");
      const translation = pollResponse.result.translation;
      console.log(translation);
      break;
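    } else if (pollResponse.status === "error") {
      // Assumes failures are reported with an "error" status; stop instead of polling forever.
      throw new Error("Transcription failed");
    } else {
      console.log("Transcription status:", pollResponse.status);
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }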
  }
}

async function startTranscription() {
  const gladiaKey = "YOUR_GLADIA_API_KEY";
  const requestData = {
    audio_url: "YOUR_AUDIO_URL",
    translation: true,
    translation_config: {
      target_languages: ["fr", "es"],
      model: "base" // "enhanced" is slower but of better quality
    }
  };
  const gladiaUrl = "https://api.gladia.io/v2/pre-recorded/";
  const headers = {
    "x-gladia-key": gladiaKey,
    "Content-Type": "application/json",
  };

  console.log("- Sending initial request to Gladia API...");
  const initialResponse = await makeFetchRequest(gladiaUrl, {
    method: "POST",
    headers,
    body: JSON.stringify(requestData),
  });

  console.log("Initial response with Transcription ID :", initialResponse);

  if (initialResponse.result_url) {
    await pollForResult(initialResponse.result_url, headers);
  }
}

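// Log any failure instead of leaving the promise rejection unhandled.
startTranscription().catch((error) => console.error(error));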

Result

The transcription result will contain a "translation" key with the output of the model:

{
  "transcription":{...},
  "translation": {
    success: true,
    is_empty: false,
    results: [
      {
        words: [
          {
            word: "Diviser",
            start: 0.20043,
            end: 0.7008000000000001,
            confidence: 1
          },
          {
            word: "l'infini",
            start: 0.9009500000000001,
            end: 1.5614400000000002,
            confidence: 1
          },
          ...
        ],
        languages: [Array],
        full_transcript: "Diviser l'infini dans un temps où moins est plus...",
        utterances: [Array], // Also translated
        error: null
      },
      {
        words: [
          {
            word: "Dividir",
            start: 0.20043,
            end: 0.7008000000000001,
            confidence: 1
          },
          {
            word: "la infinidad",
            start: 0.9009500000000001,
            end: 1.5614400000000002,
            confidence: 1
          },
          ...
        ],
        languages: [Array],
        full_transcript: "Dividir la infinidad en un tiempo en que menos es más, donde demasiado nunca es suficiente...",
        utterances: [Array], // Also translated
        error: null
      }
    ],
    exec_time: 0.6475496292114258,
    error: null
  }
}

If you enabled subtitle generation, the subtitles are translated by the same model.
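To consume the result, you can collect the full translated transcript for each target language. This is a sketch under assumptions: the field names come from the sample output above, but the content of languages is shown only as [Array], so treating its first entry as the target language code is a guess. transcriptsByLanguage and the interface names are hypothetical.

```typescript
// Shapes inferred from the sample output above; fields not used here are omitted.
interface TranslationResult {
  languages: string[];
  full_transcript: string;
  error: unknown;
}

interface TranslationOutput {
  success: boolean;
  is_empty: boolean;
  results: TranslationResult[];
  error: unknown;
}

// Hypothetical helper: maps each target language to its full translated transcript.
function transcriptsByLanguage(translation: TranslationOutput): Map<string, string> {
  const byLanguage = new Map<string, string>();
  for (const result of translation.results) {
    if (result.error != null) {
      continue; // skip languages whose translation reported an error
    }
    // Assumption: the first entry of `languages` is the target language code.
    const [language] = result.languages;
    if (language !== undefined) {
      byLanguage.set(language, result.full_transcript);
    }
  }
  return byLanguage;
}

const sampleOutput: TranslationOutput = {
  success: true,
  is_empty: false,
  results: [
    { languages: ["fr"], full_transcript: "Diviser l'infini...", error: null },
    { languages: ["es"], full_transcript: "Dividir la infinidad...", error: null },
  ],
  error: null,
};
console.log(transcriptsByLanguage(sampleOutput).get("fr"));
```

In the polling loop of the sample code, the argument would be pollResponse.result.translation.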
