Speaker Diarization

Speaker diarization is the process of detecting multiple speakers in an audio, and understanding which parts of the transcription each speaker said.

Enabling diarization

Diarization is enabled by sending the diarization parameter in the transcription request:

{
  "audio_url": "<your audio URL>",
  "diarization": true
}

Enabling enhanced diarization

For improved diarization handling edge cases and challenging audio, you can enable enhanced diarization by using the enhanced parameter in the diarization_config object.

{
  "audio_url": "<your audio URL>",
  "diarization": true,
  "diarization_config": {
    "enhanced": true
  },
}

Response

When diarization is enabled, each utterance will contain a speaker field, whose value is an index representing the speaker. Speakers will be assigned indexes by order of appearance (i.e. the 1st speaker will be speaker 0, the 2nd speaker 1, etc).

{
  "transcription": {
    "utterances": [
      {
        "words": [...],
        "text": "it says you are trained in technology.",
        "language": "en",
        "start": 0.7334100000000001,
        "end": 2.364,
        "confidence": 0.8914285714285715,
        "channel": 0,
        "speaker": 0,
      },
      ...
    ]
  }
}

Improving diarization accuracy

The following parameters are not yet supported by enhanced diarization.

You can improve the accuracy of the diarization by providing the model with hints regarding the expected number or lower/upper bounds ofspeakers using the diarization_config.num_of_speakers, diarization_config.min_speakers and diarization_config.max_speakers parameters respectively. Important: These parameters are hints, not hard constraints. The actual number of speakers detected by the model may not comply with the provided parameters.

Key	Type	Description
`diarization_config.number_of_speakers`	number	Guiding number of speakers - instructs the model to detect an exact number of speakers in the audio.
`diarization_config.min_speakers`	number	Instructs the model to detect no less than this number of speakers in the audio.
`diarization_config.max_speakers`	number	Causes the model to detect no more than this number of speakers in the audio.

Introduction

Asynchronous Speech-to-Text

Real-time Speech-to-Text

Audio Intelligence

Limits & Specifications

Guides

Integration

Speaker Diarization

Enabling diarization

Enabling enhanced diarization

Response

Improving diarization accuracy

Introduction

Asynchronous Speech-to-Text

Real-time Speech-to-Text

Audio Intelligence

Limits & Specifications

Guides

Integration

​Enabling diarization

​Enabling enhanced diarization

​Response

​Improving diarization accuracy

Enabling diarization

Enabling enhanced diarization

Response

Improving diarization accuracy