Features
Core features of Gladia’s real-time speech-to-text (STT) API
Language detection
Spoken language(s)
To get the best results in terms of accuracy and speed, specify the languages that will be spoken in the conversation you want transcribed:
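As a minimal sketch, the expected languages could be declared when building the session configuration. The language_config and languages field names below are assumptions that do not appear on this page, so check them against the API reference before relying on them.

```python
import json


def build_language_config(languages):
    """Return a config fragment declaring the expected spoken languages.

    NOTE: `language_config` and `languages` are assumed field names.
    """
    # Normalize the codes and drop empty entries.
    codes = [code.strip().lower() for code in languages if code.strip()]
    if not codes:
        raise ValueError("specify at least one language code")
    return {"language_config": {"languages": codes}}


if __name__ == "__main__":
    print(json.dumps(build_language_config(["en", "fr"]), indent=2))
```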
Code-switching
If you expect multiple languages to be spoken, enable code-switching. This lets speakers switch between languages without the transcription being affected.
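The following sketch shows one way code-switching might be turned on in an existing configuration. It assumes a code_switching flag nested under a language_config object; both names are assumptions and may differ from the actual API.

```python
import copy


def enable_code_switching(config):
    """Return a copy of `config` with code-switching turned on.

    NOTE: `language_config` and `code_switching` are assumed field names.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("language_config", {})["code_switching"] = True
    return updated


base = {"language_config": {"languages": ["en", "fr", "de"]}}
multilingual = enable_code_switching(base)
```

Copying the configuration keeps the single-language variant available for sessions where only one language is expected.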
Word-level timestamps
Instead of just getting timestamps for when utterances begin and end, Gladia’s real-time API provides word-level timestamps. Knowing the exact timing of each word gives you a more precise transcription, which facilitates detailed analysis and more accurate synchronization with audio and video files.
To enable it, pass the following configuration:
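A configuration fragment for this could look like the sketch below. The realtime_processing and words_accurate_timestamps names are assumptions, since this page does not state them.

```python
import json

# Assumed field names: `realtime_processing` and `words_accurate_timestamps`.
word_timestamps_config = {
    "realtime_processing": {
        "words_accurate_timestamps": True,
    }
}

# Serialize the fragment as it would appear in a JSON request body.
payload = json.dumps(word_timestamps_config)
```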
Under each utterance, you’ll find a words property, like this:
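To illustrate how the words property might be consumed, here is a small sketch that extracts per-word timings from an utterance. The sample utterance is made up, and the word, start, end and text keys are assumptions about the payload shape.

```python
def word_timings(utterance):
    """Return (word, start, end) tuples for every word in an utterance."""
    return [
        (item["word"].strip(), item["start"], item["end"])
        for item in utterance.get("words", [])
    ]


# Hypothetical utterance; times are in seconds.
sample_utterance = {
    "text": "Hello world",
    "start": 0.0,
    "end": 0.9,
    "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 0.9},
    ],
}

for word, start, end in word_timings(sample_utterance):
    print(f"{start:5.2f}s - {end:5.2f}s  {word}")
```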
Custom vocabulary
To enhance the precision of words you know will recur often in your transcription, use the custom_vocabulary feature.
Custom vocabulary has the following limitations:
- Global limit of 10k characters
- No more than 100 entries
- Each element can’t contain more than 5 words
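The limits above can be checked client-side before a session starts. This sketch assumes that "10k" means 10,000 characters summed over all entries and that words are separated by whitespace; the API may count differently.

```python
MAX_TOTAL_CHARS = 10_000   # assumed reading of "10k characters"
MAX_ENTRIES = 100
MAX_WORDS_PER_ENTRY = 5


def vocabulary_problems(entries):
    """Return a list of limit violations (empty if the vocabulary fits)."""
    problems = []
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries, limit is {MAX_ENTRIES}")
    total_chars = sum(len(entry) for entry in entries)
    if total_chars > MAX_TOTAL_CHARS:
        problems.append(f"{total_chars} characters, limit is {MAX_TOTAL_CHARS}")
    for entry in entries:
        # Count whitespace-separated words in each entry.
        if len(entry.split()) > MAX_WORDS_PER_ENTRY:
            problems.append(f"{entry!r} has more than {MAX_WORDS_PER_ENTRY} words")
    return problems


if __name__ == "__main__":
    print(vocabulary_problems(["Gladia", "speech to text"]))
```

Returning every violation at once, rather than raising on the first one, makes it easier to fix a long vocabulary list in a single pass.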
Multiple channels
If you have multiple channels in your audio stream, specify the count in the configuration:
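As a sketch, the channel count could be set like this. The channels field name is an assumption; the page only says that the count goes in the configuration.

```python
def build_channel_config(channel_count):
    """Return a config fragment declaring the number of audio channels.

    NOTE: `channels` is an assumed field name.
    """
    if not isinstance(channel_count, int) or channel_count < 1:
        raise ValueError("channel_count must be a positive integer")
    return {"channels": channel_count}


stereo_config = build_channel_config(2)
```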
Gladia’s real-time API will automatically split the channels and transcribe them separately.
For each utterance, you’ll get a channel key corresponding to the channel the utterance came from.
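One way to use the channel key is to rebuild a per-channel transcript from a mixed list of utterances. The sample data is invented, and the text key and zero-based integer channel values are assumptions.

```python
from collections import defaultdict


def group_text_by_channel(utterances):
    """Collect utterance texts per channel, preserving arrival order."""
    grouped = defaultdict(list)
    for utterance in utterances:
        grouped[utterance["channel"]].append(utterance["text"])
    return dict(grouped)


# Hypothetical utterances from a two-channel call.
sample = [
    {"channel": 0, "text": "Hi, how can I help?"},
    {"channel": 1, "text": "I have a billing question."},
    {"channel": 0, "text": "Sure, go ahead."},
]

for channel, texts in group_text_by_channel(sample).items():
    print(channel, " ".join(texts))
```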
Transcribing an audio stream with multiple channels is billed for each channel. For example, an audio stream with 2 channels will be billed as double the audio duration, even if the channels are identical.
Attaching custom metadata
You can attach metadata to your real-time transcription session using the custom_metadata property. This makes it easy to recognize your transcription when you receive data from the GET /v2/live/:id endpoint. More importantly, you’ll be able to use it as a filter in the GET /v2/live list endpoint.
For example, you can add the following to your configuration:
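A metadata fragment might look like the sketch below. Only the custom_metadata property comes from this page; every key and value inside it is hypothetical.

```python
import json

# Keys and values inside `custom_metadata` are hypothetical examples.
session_config = {
    "custom_metadata": {
        "internal_user_id": "user-1234",
        "plan": "pro",
    }
}

# Serialize the fragment as it would appear in a JSON request body.
payload = json.dumps(session_config)
```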
And use a GET request to filter results, like this:
or like this:
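The sketch below builds two filter URLs for the list endpoint, one matching a single metadata key and one matching several. It assumes the filter is sent as a JSON-encoded custom_metadata query parameter and that the API host is api.gladia.io; neither detail is stated on this page, and the metadata keys are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.gladia.io"  # assumed host


def build_filter_url(metadata_filter):
    """Return a GET /v2/live URL filtering on the given metadata."""
    # Assumption: the filter travels as compact JSON in one query parameter.
    encoded = json.dumps(metadata_filter, separators=(",", ":"))
    return f"{BASE_URL}/v2/live?{urlencode({'custom_metadata': encoded})}"


single_key_url = build_filter_url({"internal_user_id": "user-1234"})
multi_key_url = build_filter_url({"internal_user_id": "user-1234", "plan": "pro"})

# Round-trip check: decoding the query string yields the original filter.
decoded = parse_qs(urlsplit(multi_key_url).query)["custom_metadata"][0]
print(json.loads(decoded))
```

Using urlencode keeps braces, quotes and colons in the JSON filter safely percent-encoded.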
custom_metadata cannot be longer than 2000 characters when stringified.
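This limit can be checked before the session is created. The sketch assumes "stringified" means compact JSON, as produced by JavaScript's JSON.stringify; if the API serializes differently, the count may vary slightly.

```python
import json

MAX_METADATA_CHARS = 2000


def stringified_length(metadata):
    """Length of the metadata as compact JSON (no spaces after separators)."""
    return len(json.dumps(metadata, separators=(",", ":"), ensure_ascii=False))


def fits_metadata_limit(metadata):
    """True if the stringified metadata is within the 2000-character limit."""
    return stringified_length(metadata) <= MAX_METADATA_CHARS


if __name__ == "__main__":
    print(fits_metadata_limit({"internal_user_id": "user-1234"}))
```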