This API provides an interface for converting spoken language into written text, which is an essential capability for applications such as voice assistants, subtitling services, note-taking apps, and many more. Here, we will provide a comprehensive guide on the parameters that can be used with this API endpoint to customize and optimize the transcription process according to your needs.
The Audio Transcription API Endpoint accepts audio files and transcribes them into text by leveraging sophisticated speech recognition algorithms. It is crucial to understand the parameters that can be used when making a request to this endpoint, as they dictate how the API processes the audio input.
Sending Audio for Transcription
Audio transcription endpoint
The post endpoint for Gladia Audio Transcription API is:
<https://api.gladia.io/audio/text/audio-transcription/>
Body Params
audio
(optional if audio_url filled, file): The file to transcribeaudio_url
(optional if audio filled, string): - - The file url to transcript, ignored if an audio file is provided. This can be a public file url or any of the supported social platform listed in the documentation.`language_behaviour
(optional, enum): one the the following [manual, automatic single language, automatic multiple languages] - Define how the speaker's language will be detected. Default value is 'automatic single language'.language
(optional, string): e.g. “english” - If language_behaviour is set to manual, define the language to use for the transcriptiontoggle_noise_reduction
(optional, boolean): “true”/”false” - Activate the noise reduction to improve transcription qualitytranscription_hint
(optional, string): Provide a custom vocabulary to the model to improve accuracy of transcribing context specific words, technical terms, names, etc. If empty, this argument is skipped.toggle_diarization
(optional, boolean): “true”/”false” - Activate the diarization of the audiotoggle_direct_translate
[Beta] (optional, boolean): “true”/”false” - Activate the direct translation of the audio transcription. Translation is currently in beta.target_translation_language
[Beta] (optional, string): e.g. “english” - If toogle_direct_translate is set to true, define the language to use for the translation of the transcriptionwebhook_url
(optional, url): e.g. “https://webhook.site/df092e71-24c0-4fa7-a672-c6913fa164ec” - Webhook URL to send the result to. Make sure it's network is open.diarization_num_speakers
(optional, integer): Guiding number of speakers - forces the model to detect an exact number of speakers in the audiodiarization_min_speakers
(optional, integer): Guiding minimum number of speakers - forces the model to detect no less than number of speakers in the audio.diarization_max_speakers
(optional, integer): Guiding maximum number of speakers - forces the model to detect no more than number of speakers in the audio.output_format
(optional, enum): Define the output format, allowing to have json, srt, vtt, txt (subtitle file format for video content)
Sending Video for Transcription
Video transcription endpoint
The post endpoint for Gladia Audio Transcription API is:
<https://api.gladia.io/audio/text/video-transcription/>
Body Params
video
(optional if video_url filled, file): The file to transcribevideo_url
(optional if video filled, string): - - The file url to transcript, ignored if a video file is provided. This can be a public file url or any of the supported social platform listed in the documentation.`language_behaviour
(optional, enum): one the the following [manual, automatic single language, automatic multiple languages] - Define how the speaker's language will be detected. Default value is 'automatic single language'.language
(optional, string): e.g. “english” - If language_behaviour is set to manual, define the language to use for the transcriptiontoggle_noise_reduction
(optional, boolean): “true”/”false” - Activate the noise reduction to improve transcription qualitytranscription_hint
(optional, string): Provide a custom vocabulary to the model to improve accuracy of transcribing context specific words, technical terms, names, etc. If empty, this argument is skipped.toggle_diarization
(optional, boolean): “true”/”false” - Activate the diarization of the audiotoggle_direct_translate
[Beta] (optional, boolean): “true”/”false” - Activate the direct translation of the audio transcription. Translation is currently in beta.target_translation_language
[Beta] (optional, string): e.g. “english” - If toogle_direct_translate is set to true, define the language to use for the translation of the transcriptionwebhook_url
(optional, url): e.g. “https://webhook.site/df092e71-24c0-4fa7-a672-c6913fa164ec” - Webhook URL to send the result to. Make sure it's network is open.diarization_num_speakers
(optional, integer): Guiding number of speakers - forces the model to detect an exact number of speakers in the audiodiarization_min_speakers
(optional, integer): Guiding minimum number of speakers - forces the model to detect no less than number of speakers in the audio.diarization_max_speakers
(optional, integer): Guiding maximum number of speakers - forces the model to detect no more than number of speakers in the audio.output_format
(optional, enum): Define the output format, allowing to have json, srt, vtt, txt (subtitle file format for video content)
Code samples
curl -X POST \
-H 'x-gladia-key: YOUR_GLADIA_TOKEN' \
-H 'accept: application/json' \
-F '[email protected];type=audio/mp3' \
-F 'toggle_diarization=True' \
https://api.gladia.io/audio/text/audio-transcription/
import requests
import os
headers = {
'x-gladia-key': '', # Replace with your Gladia Token
'accept': 'application/json', # Accept json as a response, but we are sending a Multipart FormData
}
print(os.getcwd())
file_path = 'audio.mp3' # Change with your file path
if os.path.exists(file_path): # This is here to check if the file exists
print("- File exists")
else:
print("- File does not exist")
file_name, file_extension = os.path.splitext(file_path) # Get your audio file name + extension
with open(file_path, 'rb') as f: # Open the file
files = {
# Sending a local audio file
'audio': (file_name, f, 'audio/'+file_extension[1:]), # Send it. Here it represents: (filename: string, file: BufferReader, fileMimeType: string)
# You can also send an URL for your audio file. Make sure it's the direct link and publicly accessible.
# 'audio_url': (None, 'http://files.gladia.io/example/audio-transcription/split_infinity.wav'),
# Then you can pass any parameters you wants. Please see: https://docs.gladia.io/reference/pre-recorded
'toggle_diarization': (None, True),
}
print('- Sending request to Gladia API...');
response = requests.post('https://api.gladia.io/audio/text/audio-transcription/', headers=headers, files=files)
if response.status_code == 200:
print('- Request successful');
result = response.json()
print(result)
else:
print('- Request failed');
print(response.json())
print('- End of work');
const axios = require("axios");
const FormData = require("form-data");
const gladiaKey = process.argv[2]
if (!gladiaKey) {
console.error('You must provide a gladia key. Go to app.gladia.io')
process.exit(1)
} else {
console.log('using the gladia key : ' + gladiaKey)
}
async function url() {
const form = new FormData();
form.append(
"audio_url",
"http://files.gladia.io/example/audio-transcription/split_infinity.wav"
);
form.append("output_format", "json");
const response = await axios.post(
"https://api.gladia.io/audio/text/audio-transcription/",
form,
{
headers: {
...form.getHeaders(),
accept: "application/json",
"x-gladia-key": gladiaKey,
"Content-Type": "multipart/form-data",
},
}
);
const stringResponse = JSON.stringify(response.data, null, 2)
console.log(stringResponse);
}
url()
import axios from "axios";
import fs from "fs";
import FormData from "form-data";
// retrieve gladia key
const gladiaKey = process.argv[2];
if (!gladiaKey) {
console.error("You must provide a gladia key. Go to app.gladia.io");
process.exit(1);
} else {
console.log("using the gladia key : " + gladiaKey);
}
const headers = {
"x-gladia-key": gladiaKey, // Replace with your Gladia Token
accept: "application/json", // Accept json as a response, but we are sending a Multipart FormData
};
const file_path = "audio.mp3"; // Change with your file path
fs.access(file_path, fs.constants.F_OK, (err) => {
if (err) {
console.log("- File does not exist");
} else {
console.log("- File exists");
const form = new FormData();
const stream = fs.createReadStream(file_path);
// Explicitly set filename, and mimeType
form.append("audio", stream, {
filename: "anna-and-sasha-16000.wav",
contentType: "audio/wav",
});
// form.append(
// "audio_url",
// "http://files.gladia.io/example/audio-transcription/split_infinity.wav"
// );
form.append("toggle_diarization", "true"); // form-data library requires fields to be string, Buffer or Stream
console.log("- Sending request to Gladia API...");
axios
.post("https://api.gladia.io/audio/text/audio-transcription/", form, {
// form.getHeaders to get correctly formatted form-data boundaries
// https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
headers: { ...form.getHeaders(), ...headers },
})
.then((response) => {
console.log(response.data); // Get the results
console.log(response.status); // Status code
console.log("- End of work");
})
.catch((error) => {
console.error(error);
});
}
});
<?php
$file_path = 'audio.mp3'; // Change with your file path
if (file_exists($file_path)) { // This is here to check if the file exists
echo "- File exists\n";
} else {
echo "- File does not exist\n";
}
$file_name = pathinfo($file_path, PATHINFO_FILENAME); // Get your audio file name
$file_extension = pathinfo($file_path, PATHINFO_EXTENSION); // Get your audio file extension
$audio_file = new CurlFile($file_path, 'audio/'.$file_extension); // Create a CurlFile object
$data = [
// Sending a local audio file
'audio' => $audio_file,
// You can also send an URL for your audio file. Make sure it's the direct link and publicly accessible.
// 'audio_url' => 'http://files.gladia.io/example/audio-transcription/split_infinity.wav',
// Then you can pass any parameters you want. Please see: https://docs.gladia.io/reference/pre-recorded
'toggle_diarization' => true,
];
$headers = [
'x-gladia-key: {YOUR_GLADIA_TOKEN}', // Replace with your Gladia Token
'accept: application/json', // Accept json as a response, but we are sending a Multipart FormData
];
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://api.gladia.io/audio/text/audio-transcription/');
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
echo "- Sending request to Gladia API...\n";
$response = curl_exec($curl);
$status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);
if ($status_code == 200) {
echo "- Request successful\n";
$result = json_decode($response, true);
print_r($result);
} else {
echo "- Request failed\n";
$error = json_decode($response, true);
print_r($error);
}
curl_close($curl);
echo "- End of work\n";
?>