All the configuration properties described below are defined in the POST /v2/live endpoint.

Language detection

Spoken language(s)

For the best accuracy and speed, specify the languages that will be spoken in the conversation you want transcribed:

{
  "language_config": {
    "languages": ["en"]
  }
}

Code-switching

If you expect multiple languages to be spoken, enable code switching. This lets speakers switch between languages without degrading the transcription.

{
  "language_config": {
    "languages": ["en", "fr"],
    "code_switching": true
  }
}
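As a minimal sketch, the two language settings above can be produced by one small helper. The function name build_language_config is hypothetical and not part of Gladia's API, and the rule "enable code_switching only when more than one language is listed" is an assumption drawn from the guidance above rather than a documented requirement.

```python
import json


def build_language_config(languages):
    """Build the language_config fragment of the session configuration."""
    if not languages:
        raise ValueError("at least one language code is required")

    language_config = {"languages": list(languages)}

    # Assumption: code switching is only useful when several languages
    # are expected, so it is enabled only in that case.
    if len(languages) > 1:
        language_config["code_switching"] = True

    return {"language_config": language_config}


if __name__ == "__main__":
    # Prints a fragment shaped like the code-switching example above.
    print(json.dumps(build_language_config(["en", "fr"]), indent=2))
```

The returned dictionary is meant to be merged into the rest of the configuration before it is sent.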

Word-level timestamps

Beyond the start and end timestamps of each utterance, Gladia’s real-time API can provide word-level timestamps. Knowing the exact timing of each word gives you a more precise transcription, which supports detailed analysis and more accurate synchronization with audio and video files.

To enable it, pass the following configuration:

{
  "realtime_processing": {
    "words_accurate_timestamps": true
  }
}

Under each utterance, you’ll find a words property, like this:

{
  // ... other utterance properties
  "words": [
    {
      "word": "Split",
      "start": 0.21001999999999998,
      "end": 0.69015,
      "confidence": 1
    },
    {
      "word": " infinity",
      "start": 0.91021,
      "end": 1.55038,
      "confidence": 0.95
    }
  ]
}
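The sketch below shows one way to consume the words property once an utterance has been parsed into a Python dictionary. It relies only on the word, start, end and confidence keys shown above. The helper names and the 0.9 confidence threshold are illustrative assumptions.

```python
def word_timings(utterance):
    """Return (text, start, end, duration) for each word of an utterance."""
    timings = []
    for word in utterance.get("words", []):
        # Words may carry a leading space (see " infinity" above), so trim it.
        text = word["word"].strip()
        start, end = word["start"], word["end"]
        timings.append((text, start, end, round(end - start, 3)))
    return timings


def low_confidence_words(utterance, threshold=0.9):
    """Return the words whose confidence is below the given threshold."""
    return [
        word["word"].strip()
        for word in utterance.get("words", [])
        if word["confidence"] < threshold
    ]


if __name__ == "__main__":
    sample = {
        "words": [
            {"word": "Split", "start": 0.21001999999999998, "end": 0.69015, "confidence": 1},
            {"word": " infinity", "start": 0.91021, "end": 1.55038, "confidence": 0.95},
        ]
    }
    for text, start, end, duration in word_timings(sample):
        print(f"{text}: {start:.2f}s to {end:.2f}s ({duration}s)")
```

Timings like these can drive subtitle cues or highlight words in sync with playback.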

Custom vocabulary

To improve recognition of words you know will recur often in your transcription, use the custom_vocabulary feature.
Custom vocabulary has the following limitations:

  • Global limit of 10k characters
  • No more than 100 entries
  • Each element can’t contain more than 5 words

To enable it, pass the following configuration:

{
  "realtime_processing": {
    "custom_vocabulary": true,
    "custom_vocabulary_config": {
      "vocabulary": ["Westeros", "Stark", "Night's Watch"]
    }
  }
}
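Checking the limits listed above on the client side can catch an oversized vocabulary before the session starts. This is a rough sketch under stated assumptions: "10k characters" is read as 10,000 characters summed over all entries, and "words" are counted by splitting on whitespace. The API may count differently. The function name is hypothetical.

```python
MAX_TOTAL_CHARACTERS = 10_000  # assumption: "10k" means 10,000, summed over entries
MAX_ENTRIES = 100
MAX_WORDS_PER_ENTRY = 5


def validate_custom_vocabulary(vocabulary):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    if len(vocabulary) > MAX_ENTRIES:
        problems.append(f"{len(vocabulary)} entries exceeds the limit of {MAX_ENTRIES}")

    total_characters = sum(len(entry) for entry in vocabulary)
    if total_characters > MAX_TOTAL_CHARACTERS:
        problems.append(
            f"{total_characters} characters exceeds the limit of {MAX_TOTAL_CHARACTERS}"
        )

    for entry in vocabulary:
        # Assumption: a "word" is a whitespace-separated token.
        word_count = len(entry.split())
        if word_count > MAX_WORDS_PER_ENTRY:
            problems.append(f"{entry!r} has {word_count} words (limit {MAX_WORDS_PER_ENTRY})")

    return problems


if __name__ == "__main__":
    print(validate_custom_vocabulary(["Westeros", "Stark", "Night's Watch"]))
```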

Multiple channels

If you have multiple channels in your audio stream, specify the count in the configuration:

{
  "channels": 2
}

Gladia’s real-time API will automatically split the channels and transcribe them separately. For each utterance, you’ll get a channel key corresponding to the channel the utterance came from.
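Because every utterance carries a channel key, transcripts can be regrouped per channel on the client. The sketch below assumes utterances have already been collected as dictionaries. Only the channel key comes from the description above; the text field in the sample data is a hypothetical placeholder.

```python
from collections import defaultdict


def group_by_channel(utterances):
    """Group utterance dictionaries by their "channel" value, keeping order."""
    grouped = defaultdict(list)
    for utterance in utterances:
        grouped[utterance["channel"]].append(utterance)
    return dict(grouped)


if __name__ == "__main__":
    # "text" is an illustrative field name, not confirmed by this page.
    sample = [
        {"channel": 0, "text": "Hello"},
        {"channel": 1, "text": "Hi there"},
        {"channel": 0, "text": "How are you?"},
    ]
    for channel, items in group_by_channel(sample).items():
        print(channel, [item["text"] for item in items])
```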

Transcribing an audio stream with multiple channels is billed per channel. For example, an audio stream with 2 channels will be billed as double the audio duration, even if the channels are identical.
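Taking the 2-channel example at face value, the billed duration is the audio duration multiplied by the channel count. The helper below is a back-of-the-envelope sketch of that arithmetic, not an official billing formula. Its name is hypothetical.

```python
def billed_duration_seconds(audio_duration_seconds, channels=1):
    """Estimate billed duration as audio duration times channel count."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return audio_duration_seconds * channels


if __name__ == "__main__":
    # A 60-second stream with 2 channels is estimated at 120 billed seconds.
    print(billed_duration_seconds(60, channels=2))
```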

Attaching custom metadata

You can attach metadata to your real-time transcription session using the custom_metadata property. This makes it easy to recognize your transcription when you receive data from the GET /v2/live/:id endpoint. More importantly, you can use it as a filter in the GET /v2/live list endpoint. For example, you can add the following to your configuration:

"custom_metadata": {
    "internalUserId": 2348739875894375,
    "paymentMethod": {
        "last4Digits": 4576
    },
    "internalUserName": "Spencer"
}

And use a GET request to filter results, like this:

https://api.gladia.io/v2/live?custom_metadata={"internalUserId": "2348739875894375"}

or like this:

https://api.gladia.io/v2/live?custom_metadata={"paymentMethod": {"last4Digits": 4576}, "internalUserName": "Spencer"}

custom_metadata cannot be longer than 2000 characters when stringified.
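A filter URL like the ones above contains raw JSON, which most HTTP clients expect to be percent-encoded. The sketch below builds such a URL with the standard library and checks the 2000-character limit before a session is created. It assumes the API decodes standard percent-encoded query strings and that "stringified" is comparable to the output of json.dumps. Both are assumptions, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.gladia.io/v2/live"
MAX_METADATA_CHARACTERS = 2000


def metadata_fits(custom_metadata):
    """Check the stringified metadata against the 2000-character limit."""
    # Assumption: json.dumps output approximates the server's stringification.
    return len(json.dumps(custom_metadata)) <= MAX_METADATA_CHARACTERS


def build_filter_url(metadata_filter):
    """Return a list URL filtered by the given custom_metadata values."""
    query = urlencode({"custom_metadata": json.dumps(metadata_filter)})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_filter_url({"internalUserId": "2348739875894375"}))
```

The resulting URL can be passed to any HTTP client together with your usual authentication headers.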