All the configuration properties described below are defined in the POST /v2/live endpoint.

Language configuration

Single language

If you know the language of the conversation in advance, specify it in the language_config.languages parameter to ensure the best transcription results.

{
  "language_config": {
    "languages": ["en"]
  }
}

If the spoken language is unknown, you can:

  • Omit the language_config.languages parameter; the model will automatically detect the language from the first few seconds of audio across all supported languages.
  • Specify multiple languages in the language_config.languages parameter; the model will detect the language from the first few seconds of audio within the provided options.

Multiple languages (code-switching)

If you expect multiple languages to be spoken during the conversation, enable the language_config.code_switching parameter. This will allow the model to switch languages dynamically and reflect it in the transcription results.

As with single-language configuration, you can either let the model detect the language from all supported languages or specify a set of options to narrow down the selection.
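As an illustration of the options above, here is a minimal Python sketch that assembles the language_config object. The helper name build_language_config is hypothetical, and leaving the block out entirely when nothing is set is an assumption; only the languages and code_switching fields come from this page.

```python
import json


def build_language_config(languages=None, code_switching=False):
    """Return a config fragment containing language_config, or {} if unset.

    languages: optional list of language codes to restrict detection to.
    code_switching: set to True when several languages may be spoken.
    """
    language_config = {}
    if languages:
        language_config["languages"] = list(languages)
    if code_switching:
        language_config["code_switching"] = True
    # Assumption: when nothing is specified, we omit the block altogether
    # and let the model detect across all supported languages.
    return {"language_config": language_config} if language_config else {}


# Code-switching restricted to English and French.
print(json.dumps(build_language_config(["en", "fr"], code_switching=True), indent=2))
```

The resulting dictionary can be merged into the rest of the session configuration before it is sent.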


Whether you use a single-language or a code-switching configuration, we recommend limiting the list of languages to reduce the risk of incorrect detection. Some languages, such as several spoken in Eastern Europe, sound similar, so the model may confuse them and produce a transcription in the wrong language.

Word-level timestamps

In addition to the start and end timestamps of each utterance, Gladia’s real-time API can provide word-level timestamps. These give the timing of each individual word, which enables more detailed analysis and more accurate synchronization with audio and video files.

To enable it, pass the following configuration:

{
  "realtime_processing": {
    "words_accurate_timestamps": true
  }
}

Under each utterance, you’ll find a words property, like this:

{
  // ... other utterance properties
  "words": [
    {
      "word": "Split",
      "start": 0.21001999999999998,
      "end": 0.69015,
      "confidence": 1
    },
    {
      "word": " infinity",
      "start": 0.91021,
      "end": 1.55038,
      "confidence": 0.95
    }
  ]
}
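A minimal Python sketch of how the words property could be consumed, for example to build subtitle cues. The function name format_word_timings is hypothetical; the field names (word, start, end) are the ones shown in the sample above. Word strings may carry a leading space, as in " infinity", so the sketch strips it.

```python
def format_word_timings(utterance):
    """Return one 'start-end word' line per entry in utterance['words']."""
    lines = []
    for entry in utterance.get("words", []):
        text = entry["word"].strip()  # drop the leading space some words carry
        lines.append(f"{entry['start']:.2f}s-{entry['end']:.2f}s {text}")
    return lines


# Sample utterance copied from the response shown above.
utterance = {
    "words": [
        {"word": "Split", "start": 0.21001999999999998, "end": 0.69015, "confidence": 1},
        {"word": " infinity", "start": 0.91021, "end": 1.55038, "confidence": 0.95},
    ]
}

for line in format_word_timings(utterance):
    print(line)
# prints:
# 0.21s-0.69s Split
# 0.91s-1.55s infinity
```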

Custom vocabulary

To improve recognition of words that you know will recur often in your audio, use the custom_vocabulary feature.
Custom vocabulary has the following limitations:

  • Global limit of 10,000 characters
  • No more than 100 entries
  • Each entry can’t contain more than 5 words

{
  "realtime_processing": {
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": ["Westeros", "Stark", "Night's Watch"]
    }
  }
}
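The limits listed above can be checked on the client before a session is started. The following Python sketch is one way to do it; the helper name validate_custom_vocabulary is hypothetical, and two details are assumptions: the global character limit is counted as the sum of entry lengths, and words are split on whitespace.

```python
MAX_TOTAL_CHARS = 10_000
MAX_ENTRIES = 100
MAX_WORDS_PER_ENTRY = 5


def validate_custom_vocabulary(vocabulary):
    """Return a list of human-readable problems; empty if the list looks valid."""
    problems = []
    if len(vocabulary) > MAX_ENTRIES:
        problems.append(f"{len(vocabulary)} entries (limit {MAX_ENTRIES})")
    # Assumption: the global limit applies to the summed length of all entries.
    total_chars = sum(len(entry) for entry in vocabulary)
    if total_chars > MAX_TOTAL_CHARS:
        problems.append(f"{total_chars} characters (limit {MAX_TOTAL_CHARS})")
    for entry in vocabulary:
        # Assumption: words are separated by whitespace.
        if len(entry.split()) > MAX_WORDS_PER_ENTRY:
            problems.append(f"'{entry}' has more than {MAX_WORDS_PER_ENTRY} words")
    return problems


print(validate_custom_vocabulary(["Westeros", "Stark", "Night's Watch"]))  # prints []
```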

Multiple channels

If you have multiple channels in your audio stream, specify the count in the configuration:

{
  "channels": 2
}

Gladia’s real-time API will automatically split the channels and transcribe them separately. For each utterance, you’ll get a channel key corresponding to the channel the utterance came from.

Transcribing an audio stream with multiple channels is billed per channel. For example, an audio stream with 2 channels will be billed as double the audio duration, even if the channels are identical.
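As a rough worked example, the sketch below reads the 2-channel case above as "audio duration multiplied by channel count". Extending that rule to more than 2 channels is an assumption, not something this page states, and the function name billed_seconds is hypothetical.

```python
def billed_seconds(audio_seconds, channels=1):
    """Estimate billed duration, assuming each channel is billed in full."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return audio_seconds * channels


# A 10-minute stream with 2 channels is billed as 20 minutes of audio.
print(billed_seconds(600, channels=2))  # prints 1200
```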

Attaching custom metadata

You can attach metadata to your real-time transcription session using the custom_metadata property. This makes it easy to recognize your transcription when you receive data from the GET /v2/live/:id endpoint. More importantly, you can use it as a filter in the GET /v2/live list endpoint. For example, you can add the following to your configuration:

"custom_metadata": {
  "internalUserId": 2348739875894375,
  "paymentMethod": {
    "last4Digits": 4576
  },
  "internalUserName": "Spencer"
}

And use a GET request to filter results, like this:

https://api.gladia.io/v2/live?custom_metadata={"internalUserId": "2348739875894375"}

or like this:

https://api.gladia.io/v2/live?custom_metadata={"paymentMethod": {"last4Digits": 4576}, "internalUserName": "Spencer"}

custom_metadata cannot be longer than 2000 characters when stringified.
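The filter URLs above contain raw JSON, which most HTTP clients need percent-encoded. The Python sketch below builds such a URL and checks the size limit using only the standard library. The helper names are hypothetical, and measuring the limit against compact JSON (no spaces after separators) is an assumption about how the API stringifies the object.

```python
import json
from urllib.parse import quote

BASE_URL = "https://api.gladia.io/v2/live"
MAX_METADATA_CHARS = 2000


def stringify_metadata(metadata):
    """Serialize metadata compactly and enforce the 2000-character limit."""
    # Assumption: the limit is measured on compact JSON output.
    text = json.dumps(metadata, separators=(",", ":"))
    if len(text) > MAX_METADATA_CHARS:
        raise ValueError(f"custom_metadata is {len(text)} characters (limit {MAX_METADATA_CHARS})")
    return text


def build_metadata_filter_url(metadata_filter):
    """Return the list endpoint URL with a percent-encoded custom_metadata filter."""
    encoded = quote(stringify_metadata(metadata_filter), safe="")
    return f"{BASE_URL}?custom_metadata={encoded}"


print(build_metadata_filter_url({"internalUserName": "Spencer"}))
# prints https://api.gladia.io/v2/live?custom_metadata=%7B%22internalUserName%22%3A%22Spencer%22%7D
```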