curl --request POST \
  --url https://api.gladia.io/v2/live \
  --header 'Content-Type: application/json' \
  --header 'x-gladia-key: <api-key>' \
  --data @- <<EOF
{
  "encoding": "wav/pcm",
  "bit_depth": 16,
  "sample_rate": 16000,
  "channels": 1,
  "custom_metadata": {
    "user": "John Doe"
  },
  "model": "solaria-1",
  "endpointing": 0.05,
  "maximum_duration_without_endpointing": 5,
  "language_config": {
    "languages": [],
    "code_switching": false
  },
  "pre_processing": {
    "audio_enhancer": false,
    "speech_threshold": 0.6
  },
  "realtime_processing": {
    "custom_vocabulary": false,
    "custom_vocabulary_config": {
      "vocabulary": [
        "Westeros",
        {
          "value": "Stark"
        },
        {
          "value": "Night's Watch",
          "pronunciations": [
            "Nightz Watch"
          ],
          "intensity": 0.4,
          "language": "en"
        }
      ],
      "default_intensity": 0.5
    },
    "custom_spelling": false,
    "custom_spelling_config": {
      "spelling_dictionary": {
        "Gettleman": [
          "gettleman"
        ],
        "SQL": [
          "Sequel"
        ]
      }
    },
    "translation": false,
    "translation_config": {
      "target_languages": [
        "en"
      ],
      "model": "base",
      "match_original_utterances": true,
      "lipsync": true,
      "context_adaptation": true,
      "context": "<string>",
      "informal": false
    },
    "named_entity_recognition": false,
    "sentiment_analysis": false
  },
  "post_processing": {
    "summarization": false,
    "summarization_config": {
      "type": "general"
    },
    "chapterization": false
  },
  "messages_config": {
    "receive_partial_transcripts": false,
    "receive_final_transcripts": true,
    "receive_speech_events": true,
    "receive_pre_processing_events": true,
    "receive_realtime_processing_events": true,
    "receive_post_processing_events": true,
    "receive_acknowledgments": true,
    "receive_errors": true,
    "receive_lifecycle_events": false
  },
  "callback": false,
  "callback_config": {
    "url": "https://callback.example",
    "receive_partial_transcripts": false,
    "receive_final_transcripts": true,
    "receive_speech_events": false,
    "receive_pre_processing_events": true,
    "receive_realtime_processing_events": true,
    "receive_post_processing_events": true,
    "receive_acknowledgments": false,
    "receive_errors": false,
    "receive_lifecycle_events": true
  }
}
EOF

Response:

{
  "id": "45463597-20b7-4af7-b3b3-f5fb778203ab",
  "created_at": "2023-12-28T09:04:17.210Z",
  "url": "wss://api.gladia.io/v2/live?token=4a39145c-2844-4557-8f34-34883f7be7d9"
}

Initiate a live transcription WebSocket session.
Use the returned id with the GET /v2/live/:id endpoint to obtain the status and results.
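The flow can be sketched end to end in Python. This is a minimal, illustrative client using only the standard library; the config below is a subset of the parameters documented on this page, and the key is a placeholder:

```python
import json
import urllib.request

GLADIA_KEY = "<api-key>"  # placeholder: your personal Gladia API key

def init_live_session(config: dict) -> dict:
    """POST the session configuration to /v2/live and return the parsed
    response: id, created_at, and the WebSocket url."""
    req = urllib.request.Request(
        "https://api.gladia.io/v2/live",
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-gladia-key": GLADIA_KEY,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# A minimal configuration; every field below appears in the full example above.
minimal_config = {
    "encoding": "wav/pcm",
    "bit_depth": 16,
    "sample_rate": 16000,
    "channels": 1,
}

# session = init_live_session(minimal_config)
# Then connect a WebSocket client to session["url"] and stream raw PCM frames.
```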
Why initiate with POST instead of connecting directly to the WebSocket?
The POST call authenticates server-side with your API key and returns a WebSocket URL carrying a temporary token and a session id that you can safely pass to web, iOS, or Android clients without exposing credentials in the app.

x-gladia-key (header): Your personal Gladia API key.
region: The region used to process the audio. Allowed values: us-west, eu-west.

encoding: The encoding format of the audio stream. Supported formats: wav/pcm, wav/alaw, wav/ulaw. Note: no need to add WAV headers to raw audio; the API supports both headered and headerless input.

bit_depth: The bit depth of the audio stream. Allowed values: 8, 16, 24, 32.

sample_rate: The sample rate of the audio stream. Allowed values: 8000, 16000, 32000, 44100, 48000.

channels: The number of channels of the audio stream. Range: 1 <= x <= 8.

custom_metadata: Custom metadata you can attach to this live transcription. Example: { "user": "John Doe" }.

model: The model used to process the audio. "solaria-1" is used by default.

endpointing: The endpointing duration in seconds. Endpointing is the duration of silence after which an utterance is considered finished. Range: 0.01 <= x <= 10.

maximum_duration_without_endpointing: The maximum duration in seconds without endpointing. If no endpoint is detected within this duration, the current utterance is considered finished. Range: 5 <= x <= 60.

language_config: Specify the language configuration.
languages: If one language is set, it will be used for the transcription. Otherwise, the language will be auto-detected by the model. Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh.

code_switching: If true, the language will be auto-detected on each utterance. Otherwise, it is auto-detected on the first utterance and then reused for the rest of the transcription. If one language is set, this option is ignored.
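The audio parameters above (sample_rate, bit_depth, channels) fix the byte rate of the stream, which is what you size your send buffer from. A small sketch (the 100 ms chunk duration is an arbitrary choice, not an API requirement):

```python
def bytes_per_second(sample_rate: int, bit_depth: int, channels: int) -> int:
    # One sample is bit_depth / 8 bytes, per channel.
    return sample_rate * (bit_depth // 8) * channels

# 16 kHz, 16-bit, mono raw PCM: 32000 bytes per second of audio.
rate = bytes_per_second(16000, 16, 1)

# Sending in 100 ms chunks means 3200-byte frames.
chunk_size = rate // 10
```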
pre_processing: Specify the pre-processing configuration.

audio_enhancer: If true, applies pre-processing to the audio stream to enhance its quality.

speech_threshold: Sensitivity of speech detection. A value close to 1 applies stricter thresholds, making it less likely that background sounds are detected as speech. Range: 0 <= x <= 1.

realtime_processing: Specify the realtime processing configuration.
custom_vocabulary: If true, enables custom vocabulary for the transcription.

custom_vocabulary_config: Custom vocabulary configuration, used when custom_vocabulary is enabled.

vocabulary: Specific vocabulary list to feed the transcription model. Each item can be a string or an object with the following properties: value, intensity, pronunciations, language.

value: The text to produce in the transcription. Example: "Gladia".

intensity: The intensity of the feature for this entry. Range: 0 <= x <= 1. Default: 0.5.

pronunciations: The pronunciations used in the transcription.

language: The language in which the entry is pronounced when sound comparison occurs. Defaults to the transcription language. Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh. Example: "en".
Example:

[
  "Westeros",
  { "value": "Stark" },
  {
    "value": "Night's Watch",
    "pronunciations": ["Nightz Watch"],
    "intensity": 0.4,
    "language": "en"
  }
]

default_intensity: Default intensity for the custom vocabulary. Range: 0 <= x <= 1. Default: 0.5.
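Because vocabulary entries mix bare strings and objects, client code often normalizes them before further processing. A sketch (the helper name is ours, not part of the API):

```python
def normalize_vocabulary(items: list) -> list:
    """Normalize mixed string/object vocabulary entries into the object
    form, so downstream code only deals with dicts."""
    normalized = []
    for item in items:
        if isinstance(item, str):
            normalized.append({"value": item})
        else:
            normalized.append(dict(item))
    return normalized

vocab = normalize_vocabulary([
    "Westeros",
    {"value": "Night's Watch", "pronunciations": ["Nightz Watch"], "intensity": 0.4},
])
```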
custom_spelling: If true, enables custom spelling for the transcription.

custom_spelling_config: Custom spelling configuration, used when custom_spelling is enabled. Its spelling_dictionary maps each desired output spelling to the list of spellings it should replace, e.g. { "SQL": ["Sequel"] }.
translation: If true, enables translation for the transcription.

translation_config: Translation configuration, used when translation is enabled.

target_languages: Target languages, in ISO 639-1 format, to translate the transcription into. At least 1 item. Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, wo, yi, yo, zh. Example: ["en"].

model: The model to use for the translation. Allowed values: base, enhanced.

match_original_utterances: Align translated utterances with the original ones.

lipsync: Whether to apply lipsync to the translated transcription.

context_adaptation: Enables or disables context-aware translation features that let the model adapt translations based on the provided context.

context: Context information to improve translation accuracy.

informal: Forces the translation to use informal language forms when available in the target language.
named_entity_recognition: If true, enables named entity recognition for the transcription.

sentiment_analysis: If true, enables sentiment analysis for the transcription.

post_processing: Specify the post-processing configuration.

summarization: If true, generates a summarization of the whole transcription.

chapterization: If true, generates chapters for the whole transcription.
messages_config: Specify the WebSocket messages configuration.

receive_partial_transcripts: If true, partial transcripts are sent over the WebSocket.
receive_final_transcripts: If true, final transcripts are sent over the WebSocket.
receive_speech_events: If true, begin and end speech events are sent over the WebSocket.
receive_pre_processing_events: If true, pre-processing events are sent over the WebSocket.
receive_realtime_processing_events: If true, realtime processing events are sent over the WebSocket.
receive_post_processing_events: If true, post-processing events are sent over the WebSocket.
receive_acknowledgments: If true, acknowledgments are sent over the WebSocket.
receive_errors: If true, errors are sent over the WebSocket.
receive_lifecycle_events: If true, lifecycle events are sent over the WebSocket.
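Client-side, these toggles determine which message kinds arrive on the socket, so a handler loop typically dispatches on the message's type. A sketch; the field name and type values used here are illustrative, so check the message reference for the exact ones:

```python
def route_message(message: dict, handlers: dict) -> bool:
    """Dispatch a decoded WebSocket message to a handler chosen by its
    "type" field; returns False when no handler is registered for it."""
    handler = handlers.get(message.get("type"))
    if handler is None:
        return False
    handler(message)
    return True

transcripts = []
handlers = {"transcript": transcripts.append}

route_message({"type": "transcript", "data": {"is_final": True}}, handlers)
```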
callback: If true, messages are also sent to the configured callback URL.

callback_config: Specify the callback configuration.

url: URL to which we will do a POST request with the configured messages. Example: "https://callback.example".

receive_partial_transcripts: If true, partial transcripts are sent to the defined callback.
receive_final_transcripts: If true, final transcripts are sent to the defined callback.
receive_speech_events: If true, begin and end speech events are sent to the defined callback.
receive_pre_processing_events: If true, pre-processing events are sent to the defined callback.
receive_realtime_processing_events: If true, realtime processing events are sent to the defined callback.
receive_post_processing_events: If true, post-processing events are sent to the defined callback.
receive_acknowledgments: If true, acknowledgments are sent to the defined callback.
receive_errors: If true, errors are sent to the defined callback.
receive_lifecycle_events: If true, lifecycle events are sent to the defined callback.
Response: The live job has been initiated.

id: Id of the job. Example: "45463597-20b7-4af7-b3b3-f5fb778203ab".

created_at: Creation date. Example: "2023-12-28T09:04:17.210Z".

url: The WebSocket URL to connect to for sending audio data. The URL contains the temporary token used to authenticate the session. Example: "wss://api.gladia.io/v2/live?token=4a39145c-2844-4557-8f34-34883f7be7d9".