curl --request POST \
  --url https://api.gladia.io/v2/transcription \
  --header 'Content-Type: application/json' \
  --header 'x-gladia-key: <api-key>' \
  --data @- <<EOF
{
  "audio_url": "http://files.gladia.io/example/audio-transcription/split_infinity.wav",
  "context_prompt": "<string>",
  "custom_vocabulary": false,
  "custom_vocabulary_config": {
    "vocabulary": [
      "Westeros",
      { "value": "Stark" },
      {
        "value": "Night's Watch",
        "pronunciations": ["Nightz Watch"],
        "intensity": 0.4,
        "language": "en"
      }
    ],
    "default_intensity": 0.5
  },
  "detect_language": true,
  "enable_code_switching": false,
  "code_switching_config": {
    "languages": []
  },
  "language": "en",
  "callback_url": "http://callback.example",
  "callback": false,
  "callback_config": {
    "url": "http://callback.example",
    "method": "POST"
  },
  "subtitles": false,
  "subtitles_config": {
    "formats": ["srt"],
    "minimum_duration": 1,
    "maximum_duration": 15.5,
    "maximum_characters_per_row": 2,
    "maximum_rows_per_caption": 3,
    "style": "default"
  },
  "diarization": false,
  "diarization_config": {
    "number_of_speakers": 3,
    "min_speakers": 1,
    "max_speakers": 2
  },
  "translation": false,
  "translation_config": {
    "target_languages": ["en"],
    "model": "base",
    "match_original_utterances": true,
    "lipsync": true,
    "context_adaptation": true,
    "context": "<string>",
    "informal": false
  },
  "summarization": false,
  "summarization_config": {
    "type": "general"
  },
  "moderation": false,
  "named_entity_recognition": false,
  "chapterization": false,
  "name_consistency": false,
  "custom_spelling": false,
  "custom_spelling_config": {
    "spelling_dictionary": {
      "Gettleman": ["gettleman"],
      "SQL": ["Sequel"]
    }
  },
  "structured_data_extraction": false,
  "structured_data_extraction_config": {
    "classes": ["Persons", "Organizations"]
  },
  "sentiment_analysis": false,
  "audio_to_llm": false,
  "audio_to_llm_config": {
    "prompts": ["Extract the key points from the transcription"]
  },
  "custom_metadata": {
    "user": "John Doe"
  },
  "sentences": false,
  "display_mode": false,
  "punctuation_enhanced": false,
  "language_config": {
    "languages": [],
    "code_switching": false
  }
}
EOF

Example response:

{
  "id": "45463597-20b7-4af7-b3b3-f5fb778203ab",
  "result_url": "https://api.gladia.io/v2/transcription/45463597-20b7-4af7-b3b3-f5fb778203ab"
}

(Deprecated) Prefer the more specific pre-recorded endpoint.
Initiate a pre-recorded transcription job. Use the returned id and the GET /v2/transcription/:id endpoint to obtain the results.
x-gladia-key
Your personal Gladia API key
audio_url
URL to a Gladia file or to an external audio or video file
Example: "http://files.gladia.io/example/audio-transcription/split_infinity.wav"

context_prompt
[Deprecated] Context to feed the transcription model with for possibly better accuracy

custom_vocabulary
[Beta] Either a boolean to enable custom vocabulary for this audio, or an array with the specific vocabulary list to feed the transcription model with

custom_vocabulary_config
[Beta] Custom vocabulary configuration, if custom_vocabulary is enabled

custom_vocabulary_config.vocabulary
Specific vocabulary list to feed the transcription model with. Each item can be a string or an object with the following properties: value, intensity, pronunciations, language.

custom_vocabulary_config.vocabulary[].value
The text to use as a replacement in the transcription.
Example: "Gladia"

custom_vocabulary_config.vocabulary[].intensity
The global intensity of the feature.
Range: 0 <= x <= 1. Example: 0.5

custom_vocabulary_config.vocabulary[].pronunciations
The pronunciations used in the transcription.

custom_vocabulary_config.vocabulary[].language
The language in which the entry will be pronounced when sound comparison occurs. Defaults to the transcription language.
Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh
Example: "en"
Example:
[
  "Westeros",
  { "value": "Stark" },
  {
    "value": "Night's Watch",
    "pronunciations": ["Nightz Watch"],
    "intensity": 0.4,
    "language": "en"
  }
]

custom_vocabulary_config.default_intensity
Default intensity for the custom vocabulary.
Range: 0 <= x <= 1. Example: 0.5
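Since vocabulary entries mix bare strings and objects, a client may want to normalize them before submitting. This hypothetical helper (not part of the API) converts strings to the object form and fills a missing per-entry intensity from default_intensity; that fallback behaviour and the 0-1 range follow the field descriptions above but are otherwise assumptions.

```python
def normalize_vocabulary(config: dict) -> list[dict]:
    """Normalize custom_vocabulary_config entries to the object form."""
    # Assumed fallback when default_intensity itself is absent.
    default = config.get("default_intensity", 0.5)
    normalized = []
    for entry in config["vocabulary"]:
        if isinstance(entry, str):  # bare string -> {"value": ...}
            entry = {"value": entry}
        # Give every entry an explicit intensity, keeping its own if set.
        entry = {"intensity": default, **entry}
        if not 0 <= entry["intensity"] <= 1:
            raise ValueError(f"intensity out of range: {entry['intensity']}")
        normalized.append(entry)
    return normalized
```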
detect_language
[Deprecated] Use language_config instead. Detect the language from the given audio

enable_code_switching
[Deprecated] Use language_config instead. Detect multiple languages in the given audio

code_switching_config
[Deprecated] Use language_config instead. Specify the configuration for code switching

code_switching_config.languages
Specify the languages you want to use when detecting multiple languages
Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh

language
[Deprecated] Use language_config instead. Set the spoken language for the given audio (ISO 639 standard)
Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh
Example: "en"
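Since all four fields above are deprecated in favour of language_config, a request using the old fields can be migrated mechanically. This hypothetical helper (a sketch, not an official utility) follows the mapping the field descriptions imply: a fixed language becomes a one-entry languages list, detection candidates come from code_switching_config, and enable_code_switching becomes language_config.code_switching.

```python
def migrate_language_options(body: dict) -> dict:
    """Rewrite deprecated language fields into a language_config object."""
    body = dict(body)  # shallow copy; leave the caller's dict intact
    detect = body.pop("detect_language", True)
    language = body.pop("language", None)
    cs_config = body.pop("code_switching_config", {})
    body["language_config"] = {
        # One fixed language if auto-detection is off; otherwise the
        # (possibly empty) candidate list for detection.
        "languages": [language] if language and not detect
                     else cs_config.get("languages", []),
        "code_switching": body.pop("enable_code_switching", False),
    }
    return body
```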
callback_url
[Deprecated] Use callback/callback_config instead. Callback URL we will send a POST request to with the result of the transcription
Example: "http://callback.example"

callback
Enable callback for this transcription. If true, the callback_config property will be used to customize the callback behaviour

callback_config
Customize the callback behaviour (URL and HTTP method)

subtitles
Enable subtitles generation for this transcription

subtitles_config
Configuration for subtitles generation, if subtitles is enabled

subtitles_config.formats
Subtitles formats you want your transcription to be formatted to
Allowed values: srt, vtt. Minimum length: 1
Example: ["srt"]

subtitles_config.minimum_duration
Minimum duration of a subtitle in seconds
Range: x >= 0

subtitles_config.maximum_duration
Maximum duration of a subtitle in seconds
Range: 1 <= x <= 30

subtitles_config.maximum_characters_per_row
Maximum number of characters per row in a subtitle
Range: x >= 1

subtitles_config.maximum_rows_per_caption
Maximum number of rows per caption
Range: 1 <= x <= 5

subtitles_config.style
Style of the subtitles. Compliance mode refers to: https://loc.gov/preservation/digital/formats//fdd/fdd000569.shtml#:~:text=SRT%20files%20are%20basic%20text,alongside%2C%20example%3A%20%22MyVideo123
Allowed values: default, compliance

diarization
Enable speaker recognition (diarization) for this audio
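The subtitles constraints above can be checked client-side before submitting, so an out-of-range value fails fast instead of round-tripping to the API. This validator is a hypothetical convenience built directly from the ranges listed above, not part of any Gladia SDK.

```python
# Numeric bounds taken from the subtitles_config field descriptions.
SUBTITLE_RULES = {
    "minimum_duration": (0, float("inf")),
    "maximum_duration": (1, 30),
    "maximum_characters_per_row": (1, float("inf")),
    "maximum_rows_per_caption": (1, 5),
}

def validate_subtitles_config(config: dict) -> None:
    """Raise ValueError if a subtitles_config value is out of range."""
    for fmt in config.get("formats", []):
        if fmt not in ("srt", "vtt"):
            raise ValueError(f"unsupported subtitle format: {fmt!r}")
    if config.get("style", "default") not in ("default", "compliance"):
        raise ValueError("style must be 'default' or 'compliance'")
    for field, (lo, hi) in SUBTITLE_RULES.items():
        if field in config and not lo <= config[field] <= hi:
            raise ValueError(f"{field}={config[field]} outside [{lo}, {hi}]")
```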
diarization_config
Speaker recognition configuration, if diarization is enabled

diarization_config.number_of_speakers
Exact number of speakers in the audio
Range: x >= 1. Example: 3

diarization_config.min_speakers
Minimum number of speakers in the audio
Range: x >= 0. Example: 1

diarization_config.max_speakers
Maximum number of speakers in the audio
Range: x >= 0. Example: 2
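A quick sanity check for diarization_config, based only on the ranges above. The docs do not state how number_of_speakers interacts with the min/max bounds, so this hypothetical validator checks each rule independently and only flags an inverted min/max pair.

```python
def validate_diarization_config(config: dict) -> None:
    """Raise ValueError for diarization_config values outside the stated ranges."""
    n = config.get("number_of_speakers")
    lo = config.get("min_speakers", 0)
    hi = config.get("max_speakers")
    if n is not None and n < 1:
        raise ValueError("number_of_speakers must be >= 1")
    if lo < 0 or (hi is not None and hi < 0):
        raise ValueError("speaker bounds must be >= 0")
    if hi is not None and lo > hi:
        raise ValueError(f"min_speakers ({lo}) exceeds max_speakers ({hi})")
```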
translation
[Beta] Enable translation for this audio

translation_config
[Beta] Translation configuration, if translation is enabled

translation_config.target_languages
Target languages in ISO 639-1 format you want the transcription translated to
Minimum length: 1
Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, wo, yi, yo, zh
Example: ["en"]

translation_config.model
Model you want to use for the translation
Allowed values: base, enhanced

translation_config.match_original_utterances
Align translated utterances with the original ones

translation_config.lipsync
Whether to apply lipsync to the translated transcription

translation_config.context_adaptation
Enables or disables context-aware translation features that allow the model to adapt translations based on provided context

translation_config.context
Context information to improve translation accuracy

translation_config.informal
Forces the translation to use informal language forms when available in the target language
summarization
[Beta] Enable summarization for this audio

moderation
[Alpha] Enable moderation for this audio

named_entity_recognition
[Alpha] Enable named entity recognition for this audio

chapterization
[Alpha] Enable chapterization for this audio

name_consistency
[Alpha] Enable name consistency for this audio

custom_spelling
[Alpha] Enable custom spelling for this audio

custom_spelling_config
[Alpha] Custom spelling configuration, if custom_spelling is enabled
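To make the spelling_dictionary shape concrete: each key is the spelling you want in the output, and each value lists the variants the model might otherwise produce (see the request example, where "gettleman" becomes "Gettleman" and "Sequel" becomes "SQL"). The real replacement happens server-side; this client-side re-implementation is only illustrative.

```python
import re

def apply_spelling(text: str, spelling_dictionary: dict[str, list[str]]) -> str:
    """Replace each listed variant with the desired spelling (whole words only)."""
    for wanted, variants in spelling_dictionary.items():
        for variant in variants:
            # \b anchors keep "Sequel" from matching inside "Sequels".
            text = re.sub(rf"\b{re.escape(variant)}\b", wanted, text)
    return text
```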
structured_data_extraction
[Alpha] Enable structured data extraction for this audio

structured_data_extraction_config
[Alpha] Structured data extraction configuration, if structured_data_extraction is enabled

structured_data_extraction_config.classes
The list of classes to extract from the audio transcription
Minimum length: 1
Example: ["Persons", "Organizations"]

sentiment_analysis
Enable sentiment analysis for this audio

audio_to_llm
[Alpha] Enable audio-to-LLM processing for this audio

custom_metadata
Custom metadata you can attach to this transcription
Example: { "user": "John Doe" }

sentences
Enable sentences for this audio

display_mode
[Alpha] Changes the output display mode for this audio. The output will be reordered, creating new utterances when speakers overlap

punctuation_enhanced
[Alpha] Use enhanced punctuation for this audio
language_config
Specify the language configuration

language_config.languages
If one language is set, it will be used for the transcription. Otherwise, the language will be auto-detected by the model.
Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh

language_config.code_switching
If true, the language will be auto-detected on each utterance. Otherwise, the language will be auto-detected on the first utterance and then used for the rest of the transcription. If one language is set, this option will be ignored.

Response: The transcription job has been initiated