Translation
Translate your transcriptions & subtitles
This feature is in beta.
We’re looking for feedback to improve this feature; share yours here.
The Translation model generates translations of your transcriptions into one or more target languages. If subtitles and/or sentences are enabled, the translations will also include translated results for them. You can translate your transcription into multiple languages in a single API call.
The languages covered by the Translation feature are listed in the API Reference (see `translation_config`).
Two translation models are available:
- `base`: Fast, covers most use cases
- `enhanced`: Slower, but higher quality and context-aware
Usage
To enable translation, simply set the `translation` parameter to `true`:
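A minimal request sketch is shown below. Everything other than the `translation` and `translation_config` fields documented on this page (the endpoint URL, auth header, and `audio_url` field) is an illustrative assumption; see the API Reference for the exact request format.

```python
import requests

# Minimal sketch of a transcription request with translation enabled.
# The endpoint URL, auth header, and "audio_url" field are illustrative
# assumptions, not documented values; check the API Reference for the
# exact request format.
API_URL = "https://api.example.com/v2/transcription"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "audio_url": "https://example.com/audio.mp3",  # hypothetical input field
    "translation": True,  # enable the translation feature
    "translation_config": {
        "target_languages": ["fr", "es"],  # French and Spanish
    },
}

response = requests.post(
    API_URL,
    headers={"x-api-key": API_KEY},  # hypothetical auth header
    json=payload,
)
response.raise_for_status()
print(response.json())
```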
`translation_config` Options
When `translation: true` is set, you can further customize the translation by providing a `translation_config` object. Here are the available options; a combined configuration sketch follows the list.
`target_languages`
- Description: An array of strings specifying the language codes for the desired translation outputs.
- Example: `["fr", "es"]` for French and Spanish.
- Details: The list of supported language codes can be found in the list of supported languages.
`model`
- Description: Specifies the translation model to be used.
- Values:
  - `"base"`: Fast and covers most use cases.
  - `"enhanced"`: Slower, but offers higher quality and context awareness.
- Default: If not specified, the system might use a default model (typically `"base"`, but refer to the API docs for current defaults).
`match_original_utterances` (Default: `true`)
- Description: This boolean option controls whether the translated utterances should be aligned with the original utterances from the transcription.
- Behavior:
  - When `true`, the system attempts to match the translated segments (utterances, sentences) to the timing and structure of the original detected speech segments.
  - When `false`, the translation might be more fluid or natural-sounding in the target language but could deviate from the original utterance segmentation.
- Use Case: Keep as `true` for most subtitling or dubbing use cases where alignment with original speech is crucial. Set to `false` if you prioritize a more natural flow in the translated text over strict temporal alignment.
`lipsync` (Default: `true`)
This option controls how the translation is aligned with visual cues, specifically lip movements.
- How it works: When `lipsync` is set to `true` (the default), the translation process uses an advanced lip-synchronization matching algorithm. This algorithm aligns the translated audio or subtitles with the speaker’s lip movements by leveraging timestamps derived from lip activity.
- Advantages: The primary benefit is improved synchronization between the translated output and the speaker’s visuals. This can significantly enhance the viewing experience, especially for dubbed content or when precise visual timing with speech is important.
- Potential Trade-off: Because it focuses on matching lip movements, the algorithm might occasionally aggregate two distinct spoken words into a single “word” object in the translated output. The timing aligns well with the lips, but the one-to-one correspondence between source words and translated words might sometimes be altered to achieve better visual sync.
- When to disable: If a strict, word-for-word translation format is an absolute requirement and minor deviations for the sake of lip synchronization are not acceptable, set `lipsync` to `false`. This instructs the system to prioritize literal word mapping over visual timing synchronization.
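Putting the options together, the sketch below shows a fully specified `translation_config` using the option names documented above; the surrounding request fields are the same illustrative assumptions as in the earlier example.

```python
# Sketch: a translation_config combining all documented options.
translation_config = {
    "target_languages": ["fr", "es"],   # French and Spanish outputs
    "model": "enhanced",                # slower, higher quality, context-aware
    "match_original_utterances": True,  # keep the original segmentation
    "lipsync": True,                    # favor alignment with lip movements
}

payload = {
    "audio_url": "https://example.com/audio.mp3",  # hypothetical input field
    "translation": True,
    "translation_config": translation_config,
}
```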
Result
The transcription result will contain a `translation` key with the output of the model:
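For example, once a transcription job has completed, you could read the translation output as sketched below. Only the top-level `translation` key is documented here; the nested schema of that object is defined in the API Reference.

```python
# Sketch: extract the translation output from a completed result.
# "response" is the object from the earlier request sketch; the
# assumption that its JSON body directly contains the final result
# (rather than, say, a job id to poll) may not match the actual API.
result = response.json()
translation = result.get("translation")
if translation is not None:
    print(translation)
```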
If you enabled `subtitles` generation, those will also benefit from the translation model.
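If you want translated subtitles, you would enable both features in the same request, as in the sketch below. The subtitle parameter names (`subtitles`, `subtitles_config`, `formats`) are assumptions for illustration; refer to the subtitles documentation for the actual options.

```python
# Sketch: request subtitles and translation together so the generated
# subtitles also benefit from the translation model. The subtitle
# parameter names here are assumptions, not documented values.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # hypothetical input field
    "subtitles": True,
    "subtitles_config": {"formats": ["srt"]},  # assumed option shape
    "translation": True,
    "translation_config": {"target_languages": ["fr"]},
}
```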