Core features of Gladia’s real-time speech-to-text (STT) API
If you know the language of the audio, specify it in the `language_config.languages` parameter to ensure the best transcription results.

You can also omit the `language_config.languages` parameter; the model will automatically detect the language from the audio across all supported languages. Alternatively, provide several options in the `language_config.languages` parameter; the model will detect the language from the audio within the provided options.

If the audio contains multiple languages, enable the `language_config.code_switching` parameter. This will allow the model to switch languages dynamically and reflect it in the transcription results.
As with single-language configuration, you can either let the model detect the language from all supported languages or specify a set of options to narrow down the selection.
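As a sketch, a session configuration enabling code-switching while narrowing detection to a set of options might look like this (the exact shape of the surrounding request body is assumed from the parameter names above, not stated in this document):

```json
{
  "language_config": {
    "languages": ["en", "fr"],
    "code_switching": true
  }
}
```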
Each transcription message provides word-level details in the `words` property.
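For illustration, an entry in `words` would carry the recognized text plus timing information; the field names below (`word`, `start`, `end`, `confidence`) are assumptions for the sketch, not taken from this document:

```json
{
  "words": [
    { "word": "Hello", "start": 0.12, "end": 0.45, "confidence": 0.98 }
  ]
}
```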
To improve recognition of specific words or phrases, use the `custom_vocabulary` feature. Each vocabulary entry is either a plain string or an object of the form `{"value": "string"}`.

The configuration accepts the following fields:

- `default_intensity`: [optional] The global intensity of the feature (minimum 0, maximum 1, default 0.5).
- `vocabulary.value`: [required] The text used to replace in the transcription.
- `vocabulary.pronunciations`: [optional] The pronunciations used in the transcription language, or in `vocabulary.language` if present.
- `vocabulary.intensity`: [optional] The intensity of the feature for this particular word (minimum 0, maximum 1, default 0.5).
- `vocabulary.language`: [optional] The language in which the word will be pronounced when sound comparison occurs. Defaults to the transcription language.

To enforce a particular spelling in the transcript, use the `custom_spelling_config` parameter. This dictionary should contain the correct spelling as the key and a list of one or more possible variations as the value.
Custom spelling is useful in scenarios where consistent spelling of specific words is crucial (e.g., technical terms in industry-specific recordings).
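Putting the two features together, a configuration could look like the following sketch. The wrapper key `custom_vocabulary_config`, the `custom_vocabulary` enable flag, and all example values are assumptions based on the field names described above:

```json
{
  "custom_vocabulary": true,
  "custom_vocabulary_config": {
    "default_intensity": 0.6,
    "vocabulary": [
      "Gladia",
      {
        "value": "Kubernetes",
        "pronunciations": ["koo-ber-net-ees"],
        "intensity": 0.8
      }
    ]
  },
  "custom_spelling_config": {
    "SQL": ["sequel", "ess cue ell"]
  }
}
```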
Real-time translation is controlled by the following parameters:

- `target_languages`: Array of language codes for the translation outputs (e.g., `["fr", "es"]`).
- `model`: Translation model, either `"base"` (fast) or `"enhanced"` (higher quality, context-aware).
- `context_adaptation`: Boolean to enable/disable context-aware translation features (default: `true`).
- `context`: String providing context to improve translation accuracy (default: `""`).
- `informal`: Boolean to force informal language forms when available (default: `false`).

You will receive a `translation` message for each `transcript` message and target language.
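Combining these parameters, a translation configuration might be sketched as follows. The `translation` enable flag and the `translation_config` wrapper key are assumptions; only the inner parameter names come from this document:

```json
{
  "translation": true,
  "translation_config": {
    "target_languages": ["fr", "es"],
    "model": "enhanced",
    "context_adaptation": true,
    "context": "Quarterly earnings call for a software company.",
    "informal": false
  }
}
```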
When transcribing multichannel audio, each utterance contains a `channel` key corresponding to the channel the utterance came from.
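For illustration, a transcript message for a two-channel call might look like the sketch below; apart from the `channel` key, the message structure and field names are assumed:

```json
{
  "type": "transcript",
  "data": {
    "channel": 1,
    "utterance": {
      "text": "Hello, how can I help you today?"
    }
  }
}
```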
To attach your own identifiers to a session, use the `custom_metadata` property. This will make it easy to recognize your transcription when you receive data from the `GET /v2/live/:id` endpoint. And more importantly, you'll be able to use it as a filter in the `GET /v2/live` list endpoint.
For example, you can add the following to your configuration:
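A sketch of such a configuration; the keys inside `custom_metadata` are free-form, and the names below are purely illustrative:

```json
{
  "custom_metadata": {
    "user_id": "user_123",
    "conversation_id": "conv_456"
  }
}
```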
Note that `custom_metadata` cannot be longer than 2000 characters when stringified.