Features
Core features of the Gladia Real-time Speech to Text (STT) API
Language detection
Spoken language(s)
For the best accuracy and speed, restrict the list of languages to those you expect in the audio:
{
  "language_config": {
    "languages": ["en"]
  }
}
Code switching
If you expect multiple languages to be spoken during the real-time session, enable the code switching option:
{
  "language_config": {
    "languages": ["en", "fr"],
    "code_switching": true
  }
}
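As a minimal sketch, both configurations above can be produced by one small helper. The function name `build_language_config` is hypothetical; only the `language_config`, `languages`, and `code_switching` keys come from this page. Turning code switching on automatically when several languages are listed is a choice made by this sketch, not documented API behavior.

```python
import json

def build_language_config(languages, code_switching=None):
    """Return the `language_config` fragment of a session configuration.

    Hypothetical helper: only the key names come from the documentation.
    """
    if not languages:
        raise ValueError("list at least one expected language")
    config = {"languages": list(languages)}
    # Assumption of this sketch: enable code switching when several
    # languages are expected, unless the caller decides otherwise.
    if code_switching is None:
        code_switching = len(languages) > 1
    if code_switching:
        config["code_switching"] = True
    return {"language_config": config}

print(json.dumps(build_language_config(["en", "fr"]), indent=2))
```

Passing `code_switching=False` explicitly keeps the option off even when several languages are listed.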
Word-level timestamps
In addition to the start and end timestamps of each utterance, the Gladia Live Speech-To-Text API can return word-level timestamps. This feature gives you the start and end time of every word, which is particularly useful for detailed analysis and for synchronizing the transcription accurately with audio or video files.
To enable it, pass the following configuration:
{
  "realtime_processing": {
    "words_accurate_timestamps": true
  }
}
Under each utterance, you’ll find a words property like this:
{
  // ... other utterance properties
  "words": [
    {
      "word": "Split",
      "start": 0.21001999999999998,
      "end": 0.69015,
      "confidence": 1
    },
    {
      "word": " infinity",
      "start": 0.91021,
      "end": 1.55038,
      "confidence": 0.95
    }
  ]
}
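The following sketch shows one way to consume the `words` property once an utterance has been parsed into a Python dictionary. The helper name `word_spans` is hypothetical; the field names (`word`, `start`, `end`, `confidence`) are taken from the sample above, and the timestamps are assumed to be in seconds.

```python
def word_spans(utterance):
    """Yield (text, start, end, duration) for each word of an utterance."""
    for word in utterance.get("words", []):
        # Words may carry a leading space, as in " infinity" above.
        text = word["word"].strip()
        duration = round(word["end"] - word["start"], 3)
        yield text, word["start"], word["end"], duration

utterance = {
    "words": [
        {"word": "Split", "start": 0.21001999999999998, "end": 0.69015, "confidence": 1},
        {"word": " infinity", "start": 0.91021, "end": 1.55038, "confidence": 0.95},
    ]
}

for text, start, end, duration in word_spans(utterance):
    print(f"{start:7.3f} -> {end:7.3f}  ({duration:.3f}s)  {text}")
```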
Custom vocabulary
To improve transcription accuracy, especially for words or phrases that recur often in your audio stream, you can use the custom_vocabulary feature in the configuration.
The custom vocabulary has the following limitations:
- global limit of 10k characters
- no more than 100 elements
- each element should not contain more than 5 words
{
  "realtime_processing": {
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": ["Westeros", "Stark", "Night's Watch"]
    }
  }
}
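The limits listed above can be checked on the client before a session starts. This is a sketch under stated assumptions: the function name `check_custom_vocabulary` is hypothetical, the global limit is assumed to count the characters of all elements combined, and words are assumed to be separated by whitespace.

```python
MAX_TOTAL_CHARS = 10_000      # "global limit of 10k characters"
MAX_ELEMENTS = 100            # "no more than 100 elements"
MAX_WORDS_PER_ELEMENT = 5     # "not more than 5 words" per element

def check_custom_vocabulary(vocabulary):
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    if len(vocabulary) > MAX_ELEMENTS:
        problems.append(f"{len(vocabulary)} elements, limit is {MAX_ELEMENTS}")
    # Assumption: the global limit applies to all elements combined.
    total_chars = sum(len(element) for element in vocabulary)
    if total_chars > MAX_TOTAL_CHARS:
        problems.append(f"{total_chars} characters, limit is {MAX_TOTAL_CHARS}")
    for element in vocabulary:
        # Assumption: words are whitespace-separated.
        if len(element.split()) > MAX_WORDS_PER_ELEMENT:
            problems.append(f"{element!r} has more than {MAX_WORDS_PER_ELEMENT} words")
    return problems

print(check_custom_vocabulary(["Westeros", "Stark", "Night's Watch"]))
```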
Multiple channels audio stream
If you have multiple channels in your audio stream, specify the count in the configuration:
{
  "channels": 2
}
Gladia Live STT API will automatically split them and transcribe them separately.
For each utterance, you will get a channel key corresponding to the channel the transcription came from.
Sending audio with 2 channels is billed at twice the audio duration, even if the channels are identical.
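As an illustration, utterances can be regrouped per channel on the client side, and the billing rule can be written as a one-line calculation. Both helper names are hypothetical. Only the `channel` key is documented on this page; the `text` field and the extension of the billing rule beyond 2 channels are assumptions of this sketch.

```python
from collections import defaultdict

def group_by_channel(utterances):
    """Collect utterance texts per channel index."""
    by_channel = defaultdict(list)
    for utterance in utterances:
        # `channel` is documented above; `text` is an assumed field name.
        by_channel[utterance["channel"]].append(utterance["text"])
    return dict(by_channel)

def billed_seconds(audio_seconds, channels):
    """Each channel is billed for the full audio duration (assumed for any count)."""
    return audio_seconds * channels

utterances = [
    {"channel": 0, "text": "Hello, how can I help?"},
    {"channel": 1, "text": "Hi, I have a question."},
    {"channel": 0, "text": "Sure, go ahead."},
]
for channel, texts in sorted(group_by_channel(utterances).items()):
    print(channel, texts)
print(billed_seconds(60, channels=2))
```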
Attaching custom metadata
You can attach metadata to your live transcription session using the custom_metadata property.
This lets you recognize your transcription when you retrieve its data from the GET /v2/live/:id endpoint and, more importantly, use it as a filter in the GET /v2/live list endpoint.
For example, you can add the following to your configuration:
"custom_metadata": {
  "internalUserId": 2348739875894375,
  "paymentMethod": {
    "last4Digits": 4576
  },
  "internalUserName": "Spencer"
}
Then, filter results with a GET request such as:
https://api.gladia.io/v2/live?custom_metadata={"internalUserId": "2348739875894375"}
or
https://api.gladia.io/v2/live?custom_metadata={"paymentMethod": {"last4Digits": 4576}, "internalUserName": "Spencer"}
custom_metadata cannot be longer than 2000 characters when stringified.
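The sketch below builds a filter URL for the list endpoint and checks the size limit before a session is created. The helper names are hypothetical, and two assumptions are made: "stringified" is taken to mean compact JSON serialization, and the JSON filter is percent-encoded in the query string, which is standard practice but is not stated on this page. Authentication is left out.

```python
import json
from urllib.parse import urlencode

LIVE_LIST_URL = "https://api.gladia.io/v2/live"
MAX_METADATA_CHARS = 2000

def metadata_fits(metadata):
    """Check the 2000-character limit on the serialized metadata."""
    # Assumption: "stringified" means compact JSON, without extra spaces.
    serialized = json.dumps(metadata, separators=(",", ":"))
    return len(serialized) <= MAX_METADATA_CHARS

def filter_url(metadata_filter):
    """Build a GET /v2/live URL that filters on custom_metadata."""
    # urlencode percent-encodes the JSON so the URL stays valid.
    query = urlencode({"custom_metadata": json.dumps(metadata_filter)})
    return f"{LIVE_LIST_URL}?{query}"

metadata_filter = {"paymentMethod": {"last4Digits": 4576}, "internalUserName": "Spencer"}
print(metadata_fits(metadata_filter))
print(filter_url(metadata_filter))
```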