This API converts spoken language into written text, a core capability for applications such as voice assistants, subtitling services, and note-taking apps. This guide describes the parameters you can send to the transcription endpoints to tailor the transcription process to your needs.

The audio transcription endpoint accepts audio files and transcribes them into text using speech recognition. The parameters you send with a request determine how the API processes the audio input, so it is worth understanding them before making a call.

Sending Audio for Transcription

Audio transcription endpoint

The POST endpoint for the Gladia Audio Transcription API is:

<https://api.gladia.io/audio/text/audio-transcription/>

Body Params

  • audio (optional if audio_url is provided, file): The file to transcribe.
  • audio_url (optional if audio is provided, string): The URL of the file to transcribe; ignored if an audio file is provided. This can be a public file URL or a link from any of the supported social platforms listed in the documentation.
  • language_behaviour (optional, enum): One of the following: [manual, automatic single language, automatic multiple languages] - Defines how the speaker's language is detected. The default value is 'automatic single language'.
  • language (optional, string): e.g. “english” - If language_behaviour is set to manual, defines the language to use for the transcription.
  • toggle_noise_reduction (optional, boolean): “true”/“false” - Activates noise reduction to improve transcription quality.
  • transcription_hint (optional, string): Provides a custom vocabulary to the model to improve the accuracy of transcribing context-specific words, technical terms, names, etc. If empty, this argument is skipped.
  • toggle_diarization (optional, boolean): “true”/“false” - Activates diarization of the audio (identifying which speaker said what).
  • toggle_direct_translate [Beta] (optional, boolean): “true”/“false” - Activates direct translation of the audio transcription. Translation is currently in beta.
  • target_translation_language [Beta] (optional, string): e.g. “english” - If toggle_direct_translate is set to true, defines the language to use for the translation of the transcription.
  • webhook_url (optional, url): e.g. “https://webhook.site/df092e71-24c0-4fa7-a672-c6913fa164ec” - Webhook URL to send the result to. Make sure the URL is reachable over the network.
  • diarization_num_speakers (optional, integer): Guiding number of speakers - forces the model to detect exactly this number of speakers in the audio.
  • diarization_min_speakers (optional, integer): Guiding minimum number of speakers - forces the model to detect no fewer than this number of speakers in the audio.
  • diarization_max_speakers (optional, integer): Guiding maximum number of speakers - forces the model to detect no more than this number of speakers in the audio.
  • output_format (optional, enum): Defines the output format: json, srt, vtt, or txt (srt and vtt are subtitle file formats for video content).
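
The parameter rules above can be sketched as a small helper that turns Python values into form fields. This is a minimal sketch under stated assumptions, not part of any Gladia SDK: build_fields is a hypothetical name, booleans are sent as the lowercase strings “true”/“false” listed above, and the helper assumes that language is only meaningful when language_behaviour is manual and that target_translation_language is only meaningful when toggle_direct_translate is true, as the descriptions suggest.

```python
from typing import Optional

# Allowed values for language_behaviour, as listed in the parameter table
LANGUAGE_BEHAVIOURS = {
    "manual",
    "automatic single language",
    "automatic multiple languages",
}


def build_fields(
    audio_url: Optional[str] = None,
    language_behaviour: str = "automatic single language",
    language: Optional[str] = None,
    toggle_diarization: bool = False,
    toggle_direct_translate: bool = False,
    target_translation_language: Optional[str] = None,
) -> dict:
    """Hypothetical helper: map Python values to form fields (all strings)."""
    if language_behaviour not in LANGUAGE_BEHAVIOURS:
        raise ValueError(f"unknown language_behaviour: {language_behaviour!r}")

    fields = {
        "language_behaviour": language_behaviour,
        "toggle_diarization": "true" if toggle_diarization else "false",
    }
    if audio_url:
        fields["audio_url"] = audio_url
    if language_behaviour == "manual":
        # Assumption: a manual behaviour without a language is a caller error
        if not language:
            raise ValueError("language is required when language_behaviour is 'manual'")
        fields["language"] = language
    if toggle_direct_translate:
        # Assumption: translation needs an explicit target language
        if not target_translation_language:
            raise ValueError("target_translation_language is required for direct translation")
        fields["toggle_direct_translate"] = "true"
        fields["target_translation_language"] = target_translation_language
    return fields
```

For example, build_fields(language_behaviour="manual", language="english") returns a dictionary containing language_behaviour, toggle_diarization, and language. To send the fields as multipart form data with the requests library, wrap each value as (None, value) in the files argument.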

Sending Video for Transcription

Video transcription endpoint

The POST endpoint for the Gladia Video Transcription API is:

<https://api.gladia.io/audio/text/video-transcription/>

Body Params

  • video (optional if video_url is provided, file): The file to transcribe.
  • video_url (optional if video is provided, string): The URL of the file to transcribe; ignored if a video file is provided. This can be a public file URL or a link from any of the supported social platforms listed in the documentation.
  • language_behaviour (optional, enum): One of the following: [manual, automatic single language, automatic multiple languages] - Defines how the speaker's language is detected. The default value is 'automatic single language'.
  • language (optional, string): e.g. “english” - If language_behaviour is set to manual, defines the language to use for the transcription.
  • toggle_noise_reduction (optional, boolean): “true”/“false” - Activates noise reduction to improve transcription quality.
  • transcription_hint (optional, string): Provides a custom vocabulary to the model to improve the accuracy of transcribing context-specific words, technical terms, names, etc. If empty, this argument is skipped.
  • toggle_diarization (optional, boolean): “true”/“false” - Activates diarization of the audio (identifying which speaker said what).
  • toggle_direct_translate [Beta] (optional, boolean): “true”/“false” - Activates direct translation of the audio transcription. Translation is currently in beta.
  • target_translation_language [Beta] (optional, string): e.g. “english” - If toggle_direct_translate is set to true, defines the language to use for the translation of the transcription.
  • webhook_url (optional, url): e.g. “https://webhook.site/df092e71-24c0-4fa7-a672-c6913fa164ec” - Webhook URL to send the result to. Make sure the URL is reachable over the network.
  • diarization_num_speakers (optional, integer): Guiding number of speakers - forces the model to detect exactly this number of speakers in the audio.
  • diarization_min_speakers (optional, integer): Guiding minimum number of speakers - forces the model to detect no fewer than this number of speakers in the audio.
  • diarization_max_speakers (optional, integer): Guiding maximum number of speakers - forces the model to detect no more than this number of speakers in the audio.
  • output_format (optional, enum): Defines the output format: json, srt, vtt, or txt (srt and vtt are subtitle file formats for video content).
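
Since the audio and video endpoints differ only in their URL and in the name of the file field (audio/audio_url versus video/video_url), a client can choose between them from the file extension. The following is a rough sketch of that idea: endpoint_for is a hypothetical helper, and the extension set is purely illustrative, not the API's list of supported formats.

```python
import os

BASE_URL = "https://api.gladia.io/audio/text"

# Illustrative only: NOT the API's official list of supported video formats
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}


def endpoint_for(file_path: str) -> tuple:
    """Return (endpoint URL, file field name) for a local media file."""
    extension = os.path.splitext(file_path)[1].lower()
    kind = "video" if extension in VIDEO_EXTENSIONS else "audio"
    return f"{BASE_URL}/{kind}-transcription/", kind
```

The returned field name is the key to use for the uploaded file in the multipart body; all other body parameters are the same for both endpoints.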

Code samples

cURL

curl -X POST \
  -H 'x-gladia-key: YOUR_GLADIA_TOKEN' \
  -H 'accept: application/json' \
  -F 'audio=@audio.mp3;type=audio/mp3' \
  -F 'toggle_diarization=true' \
  https://api.gladia.io/audio/text/audio-transcription/

Python

import os
import sys

import requests

headers = {
    'x-gladia-key': 'YOUR_GLADIA_TOKEN',  # Replace with your Gladia token
    'accept': 'application/json',  # Accept JSON as a response; the request itself is multipart form data
}

file_path = 'audio.mp3'  # Change to your file path

# Stop early if the file is missing, instead of failing later in open()
if not os.path.exists(file_path):
    sys.exit('- File does not exist')
print('- File exists')

file_name = os.path.basename(file_path)  # File name, including the extension
file_extension = os.path.splitext(file_path)[1]  # Extension, e.g. '.mp3'

with open(file_path, 'rb') as f:  # Open the file in binary mode
    files = {
        # Sending a local audio file as (filename, file object, MIME type)
        'audio': (file_name, f, 'audio/' + file_extension[1:]),
        # You can also send a URL for your audio file. Make sure it is a direct, publicly accessible link.
        # 'audio_url': (None, 'http://files.gladia.io/example/audio-transcription/split_infinity.wav'),
        # Then you can pass any other parameters you want. Please see: https://docs.gladia.io/reference/pre-recorded
        'toggle_diarization': (None, 'true'),  # Plain form fields are sent as strings
    }
    print('- Sending request to Gladia API...')
    response = requests.post('https://api.gladia.io/audio/text/audio-transcription/', headers=headers, files=files)

if response.status_code == 200:
    print('- Request successful')
    print(response.json())
else:
    print('- Request failed')
    print(response.text)  # The error body may not be valid JSON
print('- End of work')

JavaScript (CommonJS, audio URL)

const axios = require("axios");
const FormData = require("form-data");

// Read the Gladia key from the first command-line argument
const gladiaKey = process.argv[2];
if (!gladiaKey) {
  console.error("You must provide a gladia key. Go to app.gladia.io");
  process.exit(1);
}

async function url() {
  const form = new FormData();
  form.append(
    "audio_url",
    "http://files.gladia.io/example/audio-transcription/split_infinity.wav"
  );
  form.append("output_format", "json");

  const response = await axios.post(
    "https://api.gladia.io/audio/text/audio-transcription/",
    form,
    {
      headers: {
        // form.getHeaders() sets Content-Type together with the multipart
        // boundary, so Content-Type must not be overridden here
        ...form.getHeaders(),
        accept: "application/json",
        "x-gladia-key": gladiaKey,
      },
    }
  );
  const stringResponse = JSON.stringify(response.data, null, 2);
  console.log(stringResponse);
}

// Report request errors instead of leaving the promise rejection unhandled
url().catch((error) => {
  console.error(error);
  process.exit(1);
});

JavaScript (ES modules, local file)

import axios from "axios";
import fs from "fs";
import path from "path";
import FormData from "form-data";

// Read the Gladia key from the first command-line argument
const gladiaKey = process.argv[2];
if (!gladiaKey) {
  console.error("You must provide a gladia key. Go to app.gladia.io");
  process.exit(1);
}

const headers = {
  "x-gladia-key": gladiaKey,
  accept: "application/json", // Accept JSON as a response; the request itself is multipart form data
};

const file_path = "audio.mp3"; // Change to your file path

fs.access(file_path, fs.constants.F_OK, (err) => {
  if (err) {
    console.log("- File does not exist");
  } else {
    console.log("- File exists");

    const form = new FormData();
    const stream = fs.createReadStream(file_path);

    // Explicitly set the filename and MIME type; both must match the file above
    form.append("audio", stream, {
      filename: path.basename(file_path),
      contentType: "audio/mpeg", // Registered MIME type for .mp3 files
    });
    // form.append(
    //   "audio_url",
    //   "http://files.gladia.io/example/audio-transcription/split_infinity.wav"
    // );
    form.append("toggle_diarization", "true"); // The form-data library requires fields to be a string, Buffer, or Stream

    console.log("- Sending request to Gladia API...");

    axios
      .post("https://api.gladia.io/audio/text/audio-transcription/", form, {
        // form.getHeaders() provides the Content-Type with the correct multipart boundary
        // https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
        headers: { ...form.getHeaders(), ...headers },
      })
      .then((response) => {
        console.log(response.data); // Get the results
        console.log(response.status); // Status code
        console.log("- End of work");
      })
      .catch((error) => {
        console.error(error);
      });
  }
});

PHP

<?php

$file_path = 'audio.mp3'; // Change to your file path

// Stop early if the file is missing
if (!file_exists($file_path)) {
    exit("- File does not exist\n");
}
echo "- File exists\n";

$file_extension = pathinfo($file_path, PATHINFO_EXTENSION); // Get your audio file extension

// Create a CURLFile object: (path, MIME type, posted file name)
$audio_file = new CURLFile($file_path, 'audio/' . $file_extension, basename($file_path));

$data = [
    // Sending a local audio file
    'audio' => $audio_file,
    // You can also send a URL for your audio file. Make sure it is a direct, publicly accessible link.
    // 'audio_url' => 'http://files.gladia.io/example/audio-transcription/split_infinity.wav',
    // Then you can pass any other parameters you want. Please see: https://docs.gladia.io/reference/pre-recorded
    'toggle_diarization' => 'true', // Sent as a string; a PHP boolean true would be posted as "1"
];

$headers = [
    'x-gladia-key: {YOUR_GLADIA_TOKEN}', // Replace with your Gladia token
    'accept: application/json', // Accept JSON as a response; the request itself is multipart form data
];

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://api.gladia.io/audio/text/audio-transcription/');
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

echo "- Sending request to Gladia API...\n";
$response = curl_exec($curl);
$status_code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

if ($response === false) {
    // Transport-level failure (DNS, connection, TLS, ...)
    echo "- Request failed: " . curl_error($curl) . "\n";
} elseif ($status_code == 200) {
    echo "- Request successful\n";
    $result = json_decode($response, true);
    print_r($result);
} else {
    echo "- Request failed\n";
    $error = json_decode($response, true);
    print_r($error);
}

curl_close($curl);
echo "- End of work\n";

?>