curl --request POST \
  --url https://api.gladia.io/v2/pre-recorded \
  --header 'Content-Type: application/json' \
  --header 'x-gladia-key: <api-key>' \
  --data @- <<EOF
{
  "audio_url": "http://files.gladia.io/example/audio-transcription/split_infinity.wav",
  "language_config": {
    "languages": [],
    "code_switching": false
  },
  "custom_vocabulary": false,
  "custom_vocabulary_config": {
    "vocabulary": [
      "Westeros",
      { "value": "Stark" },
      {
        "value": "Night's Watch",
        "pronunciations": ["Nightz Watch"],
        "intensity": 0.4,
        "language": "en"
      }
    ],
    "default_intensity": 0.5
  },
  "callback": false,
  "callback_config": {
    "url": "http://callback.example",
    "method": "POST"
  },
  "subtitles": false,
  "subtitles_config": {
    "formats": ["srt"],
    "minimum_duration": 1,
    "maximum_duration": 15.5,
    "maximum_characters_per_row": 2,
    "maximum_rows_per_caption": 3,
    "style": "default"
  },
  "diarization": false,
  "diarization_config": {
    "number_of_speakers": 3,
    "min_speakers": 1,
    "max_speakers": 2
  },
  "translation": false,
  "translation_config": {
    "target_languages": ["en"],
    "model": "base",
    "match_original_utterances": true,
    "lipsync": true,
    "context_adaptation": true,
    "context": "<string>",
    "informal": false
  },
  "summarization": false,
  "summarization_config": {
    "type": "general"
  },
  "named_entity_recognition": false,
  "custom_spelling": false,
  "custom_spelling_config": {
    "spelling_dictionary": {
      "Gettleman": ["gettleman"],
      "SQL": ["Sequel"]
    }
  },
  "sentiment_analysis": false,
  "audio_to_llm": false,
  "audio_to_llm_config": {
    "prompts": ["Extract the key points from the transcription"]
  },
  "custom_metadata": {
    "user": "John Doe"
  },
  "sentences": false,
  "punctuation_enhanced": false
}
EOF

Response:

{
  "id": "45463597-20b7-4af7-b3b3-f5fb778203ab",
  "result_url": "https://api.gladia.io/v2/transcription/45463597-20b7-4af7-b3b3-f5fb778203ab"
}

Initiate a pre-recorded transcription job. Use the returned id and the GET /v2/pre-recorded/:id endpoint to obtain the results.
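For example, you can fetch the finished transcription from the result_url returned above (a sketch reusing the sample id from the response; poll until the job has finished processing):

curl --request GET \
  --url https://api.gladia.io/v2/transcription/45463597-20b7-4af7-b3b3-f5fb778203ab \
  --header 'x-gladia-key: <api-key>'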
Headers

x-gladia-key: Your personal Gladia API key

Body
audio_url: URL to a Gladia file or to an external audio or video file
Example: "http://files.gladia.io/example/audio-transcription/split_infinity.wav"
language_config: Specify the language configuration

language_config.languages: If one language is set, it will be used for the transcription. Otherwise, the language will be auto-detected by the model.
Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh

language_config.code_switching: If true, the language will be auto-detected on each utterance. Otherwise, the language will be auto-detected on the first utterance and then used for the rest of the transcription. If one language is set, this option is ignored.
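For instance, pinning the transcription to a single language instead of relying on auto-detection could look like this (a minimal sketch of the relevant request fragment):

"language_config": {
  "languages": ["en"],
  "code_switching": false
}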
custom_vocabulary: [Beta] Either a boolean to enable custom vocabulary for this audio, or an array with the specific vocabulary list to feed the transcription model with

custom_vocabulary_config: [Beta] Custom vocabulary configuration, if custom_vocabulary is enabled

custom_vocabulary_config.vocabulary: Specific vocabulary list to feed the transcription model with. Each item can be a string or an object with the following properties: value, intensity, pronunciations, language.
vocabulary[].value: The text used as the replacement in the transcription.
Example: "Gladia"
vocabulary[].intensity: The global intensity of the feature.
Range: 0 <= x <= 1. Example: 0.5
vocabulary[].pronunciations: The pronunciations used in the transcription.
vocabulary[].language: Specify the language in which the item will be pronounced when sound comparison occurs. Defaults to the transcription language.
Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, yi, yo, zh
Example: "en"
Example (vocabulary):
[
  "Westeros",
  { "value": "Stark" },
  {
    "value": "Night's Watch",
    "pronunciations": ["Nightz Watch"],
    "intensity": 0.4,
    "language": "en"
  }
]

custom_vocabulary_config.default_intensity: Default intensity for the custom vocabulary
Range: 0 <= x <= 1. Example: 0.5
callback: Enable callback for this transcription. If true, the callback_config property will be used to customize the callback behaviour.

callback_config: Customize the callback behaviour (url and HTTP method)
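For example, receiving the result at your own endpoint when processing finishes could look like this (a sketch; the webhook URL is a placeholder):

"callback": true,
"callback_config": {
  "url": "https://example.com/gladia-webhook",
  "method": "POST"
}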
subtitles: Enable subtitles generation for this transcription

subtitles_config: Configuration for subtitles generation, if subtitles is enabled

subtitles_config.formats: Subtitles formats you want your transcription to be formatted to
Allowed values: srt, vtt. Example: ["srt"]

subtitles_config.minimum_duration: Minimum duration of a subtitle in seconds. Range: x >= 0

subtitles_config.maximum_duration: Maximum duration of a subtitle in seconds. Range: 1 <= x <= 30

subtitles_config.maximum_characters_per_row: Maximum number of characters per row in a subtitle. Range: x >= 1

subtitles_config.maximum_rows_per_caption: Maximum number of rows per caption. Range: 1 <= x <= 5

subtitles_config.style: Style of the subtitles. Compliance mode refers to https://loc.gov/preservation/digital/formats/fdd/fdd000569.shtml
Allowed values: default, compliance
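A request asking for both supported subtitle formats with the default style could include (a sketch):

"subtitles": true,
"subtitles_config": {
  "formats": ["srt", "vtt"],
  "style": "default"
}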
diarization: Enable speaker recognition (diarization) for this audio
diarization_config: Speaker recognition configuration, if diarization is enabled

diarization_config.number_of_speakers: Exact number of speakers in the audio. Range: x >= 1. Example: 3

diarization_config.min_speakers: Minimum number of speakers in the audio. Range: x >= 0. Example: 1

diarization_config.max_speakers: Maximum number of speakers in the audio. Range: x >= 0. Example: 2
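When the exact speaker count is unknown, bounding the search instead of setting number_of_speakers could look like this (a sketch):

"diarization": true,
"diarization_config": {
  "min_speakers": 1,
  "max_speakers": 2
}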
translation: [Beta] Enable translation for this audio

translation_config: [Beta] Translation configuration, if translation is enabled

translation_config.target_languages: Target languages in ISO 639-1 format you want the transcription translated to
Allowed values: af, am, ar, as, az, ba, be, bg, bn, bo, br, bs, ca, cs, cy, da, de, el, en, es, et, eu, fa, fi, fo, fr, gl, gu, ha, haw, he, hi, hr, ht, hu, hy, id, is, it, ja, jw, ka, kk, km, kn, ko, la, lb, ln, lo, lt, lv, mg, mi, mk, ml, mn, mr, ms, mt, my, ne, nl, nn, no, oc, pa, pl, ps, pt, ro, ru, sa, sd, si, sk, sl, sn, so, sq, sr, su, sv, sw, ta, te, tg, th, tk, tl, tr, tt, uk, ur, uz, vi, wo, yi, yo, zh
Example: ["en"]

translation_config.model: Model you want to use for the translation
Allowed values: base, enhanced

translation_config.match_original_utterances: Align translated utterances with the original ones

translation_config.lipsync: Whether to apply lipsync to the translated transcription

translation_config.context_adaptation: Enables or disables context-aware translation features that allow the model to adapt translations based on provided context

translation_config.context: Context information to improve translation accuracy

translation_config.informal: Forces the translation to use informal language forms when available in the target language
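Translating the transcript into French and Spanish with the enhanced model could look like this (a sketch):

"translation": true,
"translation_config": {
  "target_languages": ["fr", "es"],
  "model": "enhanced"
}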
summarization: [Beta] Enable summarization for this audio

summarization_config: [Beta] Summarization configuration, if summarization is enabled

named_entity_recognition: [Alpha] Enable named entity recognition for this audio

custom_spelling: [Alpha] Enable custom spelling for this audio

custom_spelling_config: [Alpha] Custom spelling configuration, if custom_spelling is enabled
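The spelling dictionary maps the spelling you want in the output to the variants the engine might otherwise produce, as in the request example above (a sketch):

"custom_spelling": true,
"custom_spelling_config": {
  "spelling_dictionary": {
    "Gettleman": ["gettleman"],
    "SQL": ["Sequel"]
  }
}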
sentiment_analysis: Enable sentiment analysis for this audio

audio_to_llm: [Alpha] Enable audio to LLM processing for this audio

audio_to_llm_config: [Alpha] Audio to LLM configuration, if audio_to_llm is enabled
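The prompts array carries the instructions to run against the transcription; the request example above enables the feature like this (a sketch):

"audio_to_llm": true,
"audio_to_llm_config": {
  "prompts": ["Extract the key points from the transcription"]
}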
custom_metadata: Custom metadata you can attach to this transcription
Example: { "user": "John Doe" }

sentences: Enable sentences for this audio
punctuation_enhanced: [Alpha] Use enhanced punctuation for this audio
Response

The pre-recorded job has been initiated.