List transcriptions
List all the live transcriptions matching the parameters.
Authorizations
Your personal Gladia API key
Query Parameters
The starting point for pagination. A value of 0 starts from the first item.
x >= 0
The maximum number of items to return. Useful for pagination and controlling data payload size.
x >= 1
Filter items relevant to a specific date in ISO format (YYYY-MM-DD).
Include items that occurred before the specified date in ISO format.
Filter for items after the specified date. Use with before_date for a range. Date in ISO format.
Filter the list based on item status. Accepts multiple values from the predefined list.
queued, processing, done, error
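As a rough illustration of how the query parameters above combine, the sketch below builds a query string with Python's standard library. Only `before_date` and the four status values are named on this page; the other parameter names (`offset`, `limit`, `date`, `after_date`, `status`), the repeated-key encoding for multiple statuses, and the inclusive lower bounds are assumptions made for the example.

```python
from urllib.parse import urlencode

# Status values listed on this page.
ALLOWED_STATUSES = {"queued", "processing", "done", "error"}


def build_list_query(limit, offset=0, date=None, before_date=None,
                     after_date=None, status=None):
    """Build a query string for the list request (parameter names are assumed)."""
    if offset < 0:
        raise ValueError("offset must be 0 or greater")
    if limit < 1:
        raise ValueError("limit must be 1 or greater")
    params = [("offset", offset), ("limit", limit)]
    # Dates are expected in ISO format (YYYY-MM-DD); omitted filters are skipped.
    for name, value in (("date", date),
                        ("before_date", before_date),
                        ("after_date", after_date)):
        if value is not None:
            params.append((name, value))
    # Assumption: multiple statuses are sent by repeating the key.
    for value in status or []:
        if value not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {value}")
        params.append(("status", value))
    return urlencode(params)


print(build_list_query(limit=10, after_date="2024-01-01",
                       before_date="2024-01-31", status=["done", "error"]))
```

Combining `after_date` with `before_date`, as in the call above, expresses the date range mentioned in the parameter description.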
Response
URL to fetch the first page
URL to fetch the current page
URL to fetch the next page
List of live transcriptions
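The page URLs described above lend themselves to a simple pagination loop. The following is a minimal sketch, assuming the response is a JSON object whose keys are `next` (URL of the next page, absent or null on the last page) and `items` (the list of live transcriptions); both key names are hypothetical, as is the `fetch_page` callable, which stands in for whatever authenticated HTTP call you use.

```python
from typing import Callable, Iterator, Optional


def iter_transcriptions(first_url: str,
                        fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item across pages by following the next-page URL."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        # "items" and "next" are assumed key names, not confirmed by this page.
        yield from page.get("items", [])
        url = page.get("next")


# Usage with an in-memory stand-in for the HTTP call.
fake_pages = {
    "page-1": {"items": [{"id": "a"}, {"id": "b"}], "next": "page-2"},
    "page-2": {"items": [{"id": "c"}], "next": None},
}
ids = [item["id"] for item in iter_transcriptions("page-1", fake_pages.__getitem__)]
print(ids)  # ['a', 'b', 'c']
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client.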
Id of the job
Debug id
API version
"queued": the job has been queued. "processing": the job is being processed. "done": the job has been processed and the result is available. "error": an error occurred during the job's processing.
queued, processing, done, error
Creation date
live
Completion date when status is "done" or "error"
Custom metadata given in the initial request
HTTP status code of the error if status is "error"
400 <= x <= 599
The file data you uploaded. Can be null if status is "error"
The file id
The name of the uploaded file
The link used to download the file if audio_url was used
Duration of the audio file
Number of channels in the audio file
x >= 1
Parameters used for this live transcription. Can be null if status is "error"
The encoding format of the audio stream. Supported formats:
- PCM: 8, 16, 24, and 32 bits
- A-law: 8 bits
- μ-law: 8 bits
Note: No need to add WAV headers to raw audio as the API supports both formats.
wav/pcm, wav/alaw, wav/ulaw
The bit depth of the audio stream
8, 16, 24, 32
The sample rate of the audio stream
8000, 16000, 32000, 44100, 48000
The number of channels of the audio stream
1 <= x <= 8
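The audio stream settings above can be checked locally before a session is opened. The sketch below is one way to do that, assuming the numeric bounds are inclusive and that A-law and μ-law streams must use a bit depth of 8, as the format list suggests. The function name and its argument names are illustrative, not part of the API.

```python
# Allowed values as listed on this page.
ENCODINGS = {"wav/pcm", "wav/alaw", "wav/ulaw"}
BIT_DEPTHS = {8, 16, 24, 32}
SAMPLE_RATES = {8000, 16000, 32000, 44100, 48000}


def check_audio_config(encoding, bit_depth, sample_rate, channels):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if encoding not in ENCODINGS:
        problems.append(f"unsupported encoding: {encoding}")
    if bit_depth not in BIT_DEPTHS:
        problems.append(f"unsupported bit depth: {bit_depth}")
    elif encoding in ("wav/alaw", "wav/ulaw") and bit_depth != 8:
        # A-law and mu-law are listed as 8 bits only.
        problems.append("A-law and mu-law streams are 8 bits")
    if sample_rate not in SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {sample_rate}")
    # Assumption: the channel range is inclusive at both ends.
    if not 1 <= channels <= 8:
        problems.append(f"channels out of range: {channels}")
    return problems


print(check_audio_config("wav/pcm", 16, 16000, 1))   # []
print(check_audio_config("wav/alaw", 16, 8000, 1))
```

Returning every problem at once, rather than raising on the first, makes it easier to report a complete diagnosis to the caller.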
The model used to process the audio. "accurate" is used by default.
fast, accurate
The endpointing duration in seconds. Endpointing is the duration of silence after which an utterance is considered finished.
0.01 <= x <= 10
The maximum duration in seconds without endpointing. If endpointing is not detected after this duration, the current utterance is considered finished.
5 <= x <= 60
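One way to picture how the two durations above interact is a small simulation over labeled audio frames. This is only an illustration of the described behavior under simplifying assumptions (frames already classified as speech or silence), not the service's actual implementation; `split_utterances` and its parameter names are hypothetical.

```python
def split_utterances(frames, endpointing, max_duration):
    """Split (duration_seconds, is_speech) frames into (start, end) utterances.

    An utterance ends when trailing silence reaches `endpointing`, or when it
    has lasted `max_duration` seconds without such a silence.
    """
    utterances = []
    now = 0.0        # current position in the stream, in seconds
    start = None     # start time of the open utterance, if any
    silence = 0.0    # trailing silence inside the open utterance
    for duration, is_speech in frames:
        if is_speech:
            if start is None:
                start = now
            silence = 0.0
        elif start is not None:
            silence += duration
        now += duration
        if start is None:
            continue
        if silence >= endpointing:
            # Enough silence: close the utterance where speech stopped.
            utterances.append((start, now - silence))
            start, silence = None, 0.0
        elif now - start >= max_duration:
            # No endpoint detected in time: force the utterance to end.
            utterances.append((start, now))
            start, silence = None, 0.0
    if start is not None:
        utterances.append((start, now - silence))
    return utterances


frames = [(1.0, True), (0.5, False), (1.0, True)]
print(split_utterances(frames, endpointing=0.5, max_duration=5.0))
# [(0.0, 1.0), (1.5, 2.5)]
```

With continuous speech and no pause, the second rule takes over and utterances are cut every `max_duration` seconds.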
Specify the language configuration
If one language is set, it will be used for the transcription. Otherwise, language will be auto-detected by the model.
af, sq, am, ar, hy, as, az, ba, eu, be, bn, bs, br, bg, ca, zh, hr, cs, da, nl, en, et, fo, fi, fr, gl, ka, de, el, gu, ht, ha, haw, he, hi, hu, is, id, it, ja, jv, kn, kk, km, ko, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mn, mymr, ne, no, nn, oc, ps, fa, pl, pt, pa, ro, ru, sa, sr, sn, sd, si, sk, sl, so, es, su, sw, sv, tl, tg, ta, tt, te, th, bo, tr, tk, uk, ur, uz, vi, cy, yi, yo, jp
If true, language will be auto-detected on each utterance. Otherwise, language will be auto-detected on the first utterance and then used for the rest of the transcription. If one language is set, this option will be ignored.
Specify the pre-processing configuration
If true, apply pre-processing to the audio stream to enhance the quality.
Sensitivity configuration for the speech threshold. A value close to 1 applies stricter thresholds, making it less likely that background sounds are detected as speech.
0 <= x <= 1
Specify the realtime processing configuration
If true, accurate timestamps will be provided for each word in the transcription.
If true, enable custom vocabulary for the transcription.
Custom vocabulary configuration, if custom_vocabulary is enabled
Specific vocabulary list to feed the transcription model with
Default intensity for the custom vocabulary
0 <= x <= 1
If true, enable named entity recognition for the transcription.
If true, enable sentiment analysis for the transcription.
Specify the post-processing configuration
If true, generates summarization for the whole transcription.
Summarization configuration, if summarization is enabled
The type of summarization to apply
general, bullet_points, concise
If true, generates chapters for the whole transcription.
Specify the websocket messages configuration
If true, partial utterance will be sent to websocket.
If true, final utterance will be sent to websocket.
If true, begin and end speech events will be sent to websocket.
If true, pre-processing events will be sent to websocket.
If true, realtime processing events will be sent to websocket.
If true, post-processing events will be sent to websocket.
If true, acknowledgments will be sent to websocket.
If true, errors will be sent to websocket.
If true, lifecycle events will be sent to websocket.
If true, messages will be sent to configured url.
Specify the callback configuration
URL to which a POST request containing the configured messages will be sent
If true, partial utterance will be sent to the defined callback.
If true, final utterance will be sent to the defined callback.
If true, begin and end speech events will be sent to the defined callback.
If true, pre-processing events will be sent to the defined callback.
If true, realtime processing events will be sent to the defined callback.
If true, post-processing events will be sent to the defined callback.
If true, acknowledgments will be sent to the defined callback.
If true, errors will be sent to the defined callback.
If true, lifecycle events will be sent to the defined callback.
Live transcription's result when status is "done"
Metadata for the given transcription & audio file
Duration of the transcribed audio file
Number of distinct channels in the transcribed audio file
x >= 1
Billed duration in seconds (audio_duration * number_of_distinct_channels)
Duration of the transcription in seconds
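The billed duration formula given above is a plain product of two metadata values. A minimal sketch, reusing the names from the formula as argument names (the function name itself is illustrative):

```python
def billed_duration(audio_duration: float, number_of_distinct_channels: int) -> float:
    """Billed seconds = audio duration multiplied by the number of distinct channels."""
    if number_of_distinct_channels < 1:
        raise ValueError("number_of_distinct_channels must be 1 or greater")
    return audio_duration * number_of_distinct_channels


# A 90-second stereo recording with two distinct channels is billed as 180 seconds.
print(billed_duration(90.0, 2))  # 180.0
```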
Transcription of the audio speech
Full transcription in text format, without any other information
af, sq, am, ar, hy, as, ast, az, ba, eu, be, bn, bs, br, bg, my, ca, ceb, zh, hr, cs, da, nl, en, et, fo, fi, fr, fy, ff, gd, gl, lg, ka, de, el, gu, ht, ha, haw, he, hi, hu, is, ig, ilo, id, ga, it, ja, jv, kn, kk, km, ko, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mo, mn, mymr, ne, no, nn, oc, or, pa, ps, fa, pl, pt, pa, ro, ru, sa, sr, sn, sd, si, sk, sl, so, es, su, sw, ss, sv, tl, tg, ta, tt, te, th, bo, tn, tr, tk, uk, ur, uz, vi, cy, wo, xh, yi, yo, zu
Transcribed speech utterances present in the audio
All the languages detected in the audio, sorted from the most detected to the least detected
af, sq, am, ar, hy, as, ast, az, ba, eu, be, bn, bs, br, bg, my, ca, ceb, zh, hr, cs, da, nl, en, et, fo, fi, fr, fy, ff, gd, gl, lg, ka, de, el, gu, ht, ha, haw, he, hi, hu, is, ig, ilo, id, ga, it, ja, jv, kn, kk, km, ko, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mo, mn, mymr, ne, no, nn, oc, or, pa, ps, fa, pl, pt, pa, ro, ru, sa, sr, sn, sd, si, sk, sl, so, es, su, sw, ss, sv, tl, tg, ta, tt, te, th, bo, tn, tr, tk, uk, ur, uz, vi, cy, wo, xh, yi, yo, zu
Start timestamp in seconds of this utterance
End timestamp in seconds of this utterance
Confidence in the transcribed utterance (1 = 100% confident)
Audio channel this utterance was transcribed from
x >= 0
List of words of the utterance, split by timestamp
Transcription for this utterance
If diarization is enabled, speaker identification number
x >= 0
If sentences has been enabled, sentences results
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If sentences has been enabled, transcription as sentences.
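Every audio intelligence result on this page carries the same status information: a success indicator, an empty-value indicator, an execution time, and error details. The sketch below shows one way a client might unwrap such a result. Only `success` is named on this page; the keys `is_empty`, `error`, and `results` are assumptions chosen for the example, as is the helper name.

```python
def unwrap_addon(addon: dict):
    """Return the addon's results, None if it was empty, or raise if it failed."""
    if not addon.get("success"):
        # Error details are expected to be non-null only when success is false.
        raise RuntimeError(f"audio intelligence model failed: {addon.get('error')}")
    if addon.get("is_empty"):
        return None
    return addon.get("results")


# Usage with hand-written sample payloads (hypothetical shapes).
ok = {"success": True, "is_empty": False, "error": None, "results": ["Hello.", "World."]}
empty = {"success": True, "is_empty": True, "error": None, "results": None}
print(unwrap_addon(ok))     # ['Hello.', 'World.']
print(unwrap_addon(empty))  # None
```

Treating an empty result differently from a failure lets callers distinguish "nothing to report" from "something went wrong".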
If subtitles has been enabled, subtitles results
If translation has been enabled, translation of the audio speech transcription
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
List of translated transcriptions, one for each target_languages
Contains the error details of the failed addon
Full transcription in text format, without any other information
af, sq, am, ar, hy, as, ast, az, ba, eu, be, bn, bs, br, bg, my, ca, ceb, zh, hr, cs, da, nl, en, et, fo, fi, fr, fy, ff, gd, gl, lg, ka, de, el, gu, ht, ha, haw, he, hi, hu, is, ig, ilo, id, ga, it, ja, jv, kn, kk, km, ko, lo, la, lv, ln, lt, lb, mk, mg, ms, ml, mt, mi, mr, mo, mn, mymr, ne, no, nn, oc, or, pa, ps, fa, pl, pt, pa, ro, ru, sa, sr, sn, sd, si, sk, sl, so, es, su, sw, ss, sv, tl, tg, ta, tt, te, th, bo, tn, tr, tk, uk, ur, uz, vi, cy, wo, xh, yi, yo, zu
Transcribed speech utterances present in the audio
If sentences has been enabled, sentences results for this translation
If subtitles has been enabled, subtitles results for this translation
If summarization has been enabled, summarization of the audio speech transcription
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If summarization has been enabled, summary of the transcription
If moderation has been enabled, moderation of the audio speech transcription
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If moderation has been enabled, moderated transcription
If named_entity_recognition has been enabled, the detected entities
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If named_entity_recognition has been enabled, the detected entities.
If name_consistency has been enabled, Gladia will improve the consistency of the names across the transcription
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If name_consistency has been enabled, Gladia will improve the consistency of the names across the transcription
If custom_spelling has been enabled, Gladia will correct the spelling of the transcription
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If custom_spelling has been enabled, Gladia will correct the spelling of the transcription
If speaker_reidentification has been enabled, results of the AI speaker reidentification.
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If speaker_reidentification has been enabled, results of the AI speaker reidentification.
If structured_data_extraction has been enabled, structured data extraction results
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
Status code of the addon error
Reason for the addon error
Detailed message of the addon error
If structured_data_extraction has been enabled, results of the AI structured data extraction for the defined classes.
If sentiment_analysis has been enabled, sentiment analysis of the audio speech transcription
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If sentiment_analysis has been enabled, Gladia will analyze the sentiments and emotions of the audio
If audio_to_llm has been enabled, audio to LLM results of the audio speech transcription
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If audio_to_llm has been enabled, results of the AI custom analysis
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
The result from a specific prompt
If sentences has been enabled, sentences of the audio speech transcription. Deprecated: content will move to the transcription object.
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If sentences has been enabled, transcription as sentences.
If display_mode has been enabled, the output will be reordered, creating new utterances when speakers overlap
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If display_mode has been enabled, proposes an alternative display output.
If chapterization has been enabled, generates chapter names for different parts of the given audio.
Whether the audio intelligence model succeeded in producing a valid output
Whether the audio intelligence model returned an empty value
Time the audio intelligence model took to complete the task
null if success is true. Otherwise, contains the error details of the failed model
If chapterization has been enabled, generates chapter names for different parts of the given audio.