Core features of the Gladia Pre-recorded STT API
If you already know the language spoken in your audio, specify it in the language_config.languages parameter to ensure the best transcription results.
If you don't know the language, omit the language_config.languages parameter; the model will automatically detect the language from the audio across all supported languages.
If you can narrow it down to a few candidates, list them in the language_config.languages parameter; the model will detect the language from the audio within the provided options.
If the audio contains multiple languages, enable the language_config.code_switching parameter. This will allow the model to switch languages dynamically and reflect it in the transcription results.
As with single-language configuration, you can either let the model detect the language from all supported languages or specify a set of options to narrow down the selection.
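As a sketch, a code-switching request body restricted to two candidate languages could look like the following. The parameter names come from this guide; the audio URL is a placeholder.

```python
import json

# Request body for a pre-recorded transcription with code switching:
# language_config.languages narrows detection to the listed options,
# and code_switching lets the model change language mid-audio.
payload = {
    "audio_url": "https://example.com/audio.wav",  # placeholder URL
    "language_config": {
        "languages": ["en", "fr"],   # restrict detection to these options
        "code_switching": True,      # allow switching mid-transcription
    },
}

print(json.dumps(payload, indent=2))
```

Omit language_config.languages entirely to let the model detect among all supported languages instead.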
To improve punctuation accuracy, enable the punctuation_enhanced parameter in the transcription request.
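A minimal sketch of such a request body (the audio URL is a placeholder):

```python
import json

# Enable enhanced punctuation for a pre-recorded transcription request.
payload = {
    "audio_url": "https://example.com/audio.wav",  # placeholder URL
    "punctuation_enhanced": True,
}

print(json.dumps(payload, indent=2))
```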
Each utterance in the response contains a words property with word-level details.
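The exact field names below are an assumption of the typical shape of one words entry (text, timing, confidence), not confirmed by this guide:

```python
# Hypothetical shape of one item in the "words" array; field names are
# assumptions based on common STT output conventions.
word = {
    "word": "hello",
    "start": 0.52,      # start time, assumed to be in seconds
    "end": 0.89,        # end time, assumed to be in seconds
    "confidence": 0.97, # model confidence for this word
}

assert word["end"] >= word["start"]
```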
You can combine the sentences and translation features! You'll receive sentences output for the original transcript, and each translation result will also contain the sentences output in the translated language. The response will include a sentences key (in addition to utterances).
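A sketch of a request enabling both features. The sentences and translation boolean toggles follow the feature names in this guide; the translation_config key and its target_languages field are assumptions.

```python
import json

# Request sentence segmentation together with translation; each
# translation result will then carry sentences in the target language.
payload = {
    "audio_url": "https://example.com/audio.wav",  # placeholder URL
    "sentences": True,           # toggle name assumed from the feature name
    "translation": True,         # toggle name assumed from the feature name
    "translation_config": {
        "target_languages": ["fr", "es"],  # assumed key name
    },
}

print(json.dumps(payload, indent=2))
```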
You can use the subtitles feature alongside the translation feature.
You'll get subtitles in the original language, and also in the languages you targeted for translation! The subtitles_config object supports the following options:
- formats: Array of subtitle formats to generate (options: "srt", "vtt")
- minimum_duration: Minimum duration of a subtitle in seconds (minimum: 0)
- maximum_duration: Maximum duration of a subtitle in seconds (minimum: 1, maximum: 30)
- maximum_characters_per_row: Maximum number of characters per row in a subtitle (minimum: 1)
- maximum_rows_per_caption: Maximum number of rows per caption (minimum: 1, maximum: 5)
- style: Style of the subtitles.
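An example subtitles_config requesting both available formats (the subtitles boolean toggle name and the audio URL are assumptions):

```python
import json

# Request SRT and VTT subtitles with constrained caption layout; with
# two formats listed, the response will contain two subtitle items.
payload = {
    "audio_url": "https://example.com/audio.wav",  # placeholder URL
    "subtitles": True,            # toggle name assumed from the feature name
    "subtitles_config": {
        "formats": ["srt", "vtt"],
        "maximum_duration": 10,           # seconds per subtitle
        "maximum_characters_per_row": 40,
        "maximum_rows_per_caption": 2,
    },
}

print(json.dumps(payload, indent=2))
```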
The JSON response will include a new property, subtitles, which is an array with one item for each format you requested. With the given example, subtitles will contain 2 items, one per requested format.
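The exact field names of a subtitles item are not shown in this guide; the shape below is a hypothetical sketch:

```python
# Hypothetical shape of one item in the "subtitles" array: the format it
# was generated in, plus the subtitle file content as a string.
subtitle_item = {
    "format": "srt",  # one of the requested formats
    "subtitles": "1\n00:00:00,320 --> 00:00:02,100\nHello world\n",
}
```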
If you know the context of the audio you're sending, you can provide it in the context_prompt parameter.
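For instance (the audio URL and the prompt text are placeholders):

```python
import json

# Provide context to guide the transcription of a pre-recorded file.
payload = {
    "audio_url": "https://example.com/audio.wav",  # placeholder URL
    "context_prompt": "A product call about the Gladia speech-to-text API",
}

print(json.dumps(payload, indent=2))
```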
To boost recognition of specific terms, enable the custom_vocabulary feature in the transcription configuration settings. Each vocabulary entry is either a plain string or an object of the shape {"value": "string"} with the following options:
- default_intensity: [optional] The global intensity of the feature (minimum 0, maximum 1, default 0.5).
- vocabulary.value: [required] The text used to replace in the transcription.
- vocabulary.pronunciations: [optional] The pronunciations used in the transcription language, or in vocabulary.language if present.
- vocabulary.intensity: [optional] The intensity of the feature for this particular word (minimum 0, maximum 1, default 0.5).
- vocabulary.language: [optional] The language in which it will be pronounced when sound comparison occurs. Defaults to the transcription language.

To use custom spelling, provide a dictionary in the custom_spelling_config parameter. This dictionary should contain the correct spelling as the key and a list of one or more possible variations as the value.
Custom spelling is useful in scenarios where consistent spelling of specific words is crucial (e.g., technical terms in industry-specific recordings).
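A sketch of such a dictionary. The custom_spelling boolean toggle and the spelling_dictionary key name are assumptions; the correct-spelling-to-variations mapping follows the description above.

```python
import json

# Map each correct spelling (key) to a list of variations (value) that
# should be rewritten to it in the transcript.
payload = {
    "audio_url": "https://example.com/audio.wav",  # placeholder URL
    "custom_spelling": True,       # toggle name assumed
    "custom_spelling_config": {
        "spelling_dictionary": {   # key name assumed
            "Gladia": ["gladia", "glad ya"],
            "SQL": ["sequel"],
        },
    },
}

print(json.dumps(payload, indent=2))
```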
To keep names consistent, enable the name_consistency parameter. This will ensure the same name is spelled in the same manner throughout the transcript, at the cost of a small amount of added processing time.
This is especially useful for scenarios where people’s names may be mentioned multiple times, but these names are not known in advance
(e.g. recruitment call recordings).
To ensure correct spelling of names which are known in advance, use the custom vocabulary.
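Enabling the feature is a single boolean in the request body (audio URL is a placeholder):

```python
import json

# Enforce consistent spelling of repeated names across the transcript.
payload = {
    "audio_url": "https://example.com/audio.wav",  # placeholder URL
    "name_consistency": True,
}

print(json.dumps(payload, indent=2))
```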
For multi-channel audio, each transcription result includes a channel key corresponding to the channel it came from.
You can add a custom_metadata input during your POST request to the /v2/pre-recorded endpoint.
This will allow you to recognize your transcription when you get its data from the GET /v2/pre-recorded/:id endpoint and, more importantly, to use it as a filter in the GET /v2/pre-recorded list endpoint.
For example, you can attach identifying metadata when asking for a transcription.
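A sketch of such a request body; the metadata fields shown are arbitrary examples, and the audio URL is a placeholder:

```python
import json

# Attach free-form metadata to a transcription request so it can be
# recognized and filtered later via the GET endpoints.
payload = {
    "audio_url": "https://example.com/audio.wav",  # placeholder URL
    "custom_metadata": {
        "user_id": "12345",   # example field
        "project": "demo",    # example field
    },
}

# The stringified metadata must stay within the 2000-character limit.
assert len(json.dumps(payload["custom_metadata"])) <= 2000

print(json.dumps(payload, indent=2))
```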
Note that custom_metadata cannot be longer than 2000 characters when stringified.