We support almost all audio and video file formats. When choosing a format, weigh two costs against each other: the transfer time, since some formats produce large files, and the time needed to convert the original format to the target one (WAV, PCM, 16 kHz, little-endian).

Estimated conversion times for each format are listed in the "Conversion time" table below.

Gladia API current limitations

These limits are in place to ensure the stability and performance of the service for everyone, and they will be lifted gradually.

  • Audio length: The maximum length of audio that can be transcribed in a single request is currently 135 minutes. Attempts to transcribe longer audio files will result in an error. Direct YouTube links are limited to 120 minutes instead of 135 minutes. Enterprise plans support audio up to 4 hours 15 minutes (255 minutes) long.

  • File size: Audio files must not exceed 1000 MB in size. Larger files will not be accepted by the API.
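The limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the function name `check_limits` and its parameters are hypothetical, the duration is assumed to be known already (for example from your own media tooling), and 1 MB is assumed to mean 1,000,000 bytes, which the limits above do not specify.

```python
# Limits taken from the list above.
MAX_FILE_SIZE_MB = 1000        # maximum file size
MAX_AUDIO_MINUTES = 135        # standard limit per request
MAX_YOUTUBE_MINUTES = 120      # direct YouTube links
MAX_ENTERPRISE_MINUTES = 255   # 4h15 on enterprise plans

# Assumption: decimal megabytes. The documentation does not say
# whether MB means 10**6 or 2**20 bytes.
BYTES_PER_MB = 1_000_000


def check_limits(duration_seconds, size_bytes=None, max_minutes=MAX_AUDIO_MINUTES):
    """Return a list of problems; an empty list means the input is within limits.

    Pass size_bytes=None when no local file is involved (for example a link).
    """
    problems = []
    if duration_seconds > max_minutes * 60:
        problems.append(
            f"audio is {duration_seconds / 60:.1f} min, limit is {max_minutes} min"
        )
    if size_bytes is not None and size_bytes > MAX_FILE_SIZE_MB * BYTES_PER_MB:
        problems.append(
            f"file is {size_bytes / BYTES_PER_MB:.0f} MB, limit is {MAX_FILE_SIZE_MB} MB"
        )
    return problems
```

For a YouTube link you would pass `max_minutes=MAX_YOUTUBE_MINUTES`, and for an enterprise plan `max_minutes=MAX_ENTERPRISE_MINUTES`. The API remains the source of truth; a check like this only avoids an upload that is likely to be rejected.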

Splitting oversize audio files

For audio files that approach or exceed the length and size limits, we recommend splitting them into smaller chunks of ~60 minutes each. This approach not only stays within the API constraints but also generally yields better transcription results.
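As an illustration of the ~60-minute chunking, the sketch below computes the start and end offsets of each chunk for a recording of known length. The helper name `plan_chunks` is hypothetical, and offsets are assumed to be whole seconds.

```python
def plan_chunks(total_seconds, chunk_seconds=3600):
    """Return (start, end) offsets in seconds that cover the whole recording.

    Every chunk is chunk_seconds long except possibly the last one.
    """
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds and chunk_seconds must be positive")
    return [
        (start, min(start + chunk_seconds, total_seconds))
        for start in range(0, total_seconds, chunk_seconds)
    ]
```

A 150-minute recording, `plan_chunks(150 * 60)`, gives two 60-minute chunks followed by one 30-minute chunk. Fixed offsets can cut in the middle of a word, so you may prefer to move each boundary to a nearby pause.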

Tools for Splitting Audio Files:

  • FFmpeg: a versatile command-line tool for manipulating audio and video files, and a popular choice for splitting long audio files.
  • ffmpeg-python: for Python users, a wrapper around FFmpeg that provides a more Pythonic interface.
  • prism-media for Node.js: Node.js users can use prism-media to manipulate media files, including splitting audio files.
  • fluent-ffmpeg for Node.js: another option for Node.js users, offering a simpler and more fluent API for handling media files.
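As a rough sketch of the first option, the Python snippet below builds an FFmpeg command that uses the segment muxer to cut a file into ~60-minute pieces without re-encoding. The function names and the output naming pattern are hypothetical, and it assumes the `ffmpeg` binary is installed and on your `PATH`.

```python
import subprocess
from pathlib import Path


def build_split_command(source, chunk_seconds=3600, out_dir="."):
    """Build an ffmpeg command that splits `source` into fixed-length segments."""
    src = Path(source)
    # Output files are named <stem>_000<ext>, <stem>_001<ext>, ...
    pattern = Path(out_dir) / f"{src.stem}_%03d{src.suffix}"
    return [
        "ffmpeg",
        "-i", str(src),
        "-f", "segment",                      # use the segment muxer
        "-segment_time", str(chunk_seconds),  # target length of each segment
        "-c", "copy",                         # copy streams, no re-encoding
        "-reset_timestamps", "1",             # each segment starts at time 0
        str(pattern),
    ]


def split_audio(source, chunk_seconds=3600, out_dir="."):
    """Run the split; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_split_command(source, chunk_seconds, out_dir), check=True)
```

Calling `split_audio("talk.mp3", out_dir="chunks")` would write `talk_000.mp3`, `talk_001.mp3`, and so on into an existing `chunks` directory. Because streams are copied rather than re-encoded, segment lengths are approximate, particularly for video, where cuts fall on keyframes.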

Following these best practices will help you avoid issues due to limitations and maximize the quality of the transcriptions you obtain from the Audio Transcription API.

Supported audio formats

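| Source Format | MIME Type | Audio/Video |
|---|---|---|
| aac | audio/aac | Audio |
| ac3 | audio/ac3 | Audio |
| eac3 | audio/eac3 | Audio |
| flac | audio/flac | Audio |
| m4a | audio/mp4 | Audio |
| mp2 | audio/mpeg | Audio |
| mp3 | audio/mpeg | Audio |
| ogg | application/ogg | Audio |
| opus | audio/opus | Audio |
| wav | audio/wav | Audio |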

Supported video formats

| Source Format | MIME Type | Audio/Video |
|---|---|---|
| 3g2 | video/3gpp2 | Video |
| 3gp | video/3gpp | Video |
| avi | video/x-msvideo | Video |
| flv | video/x-flv | Video |
| m4v | video/x-m4v | Video |
| matroska | video/x-matroska | Audio/Video |
| mov | video/quicktime | Video |
| mp4 | video/mp4 | Audio/Video |
| wmv | video/x-ms-wmv | Video |
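If you want to screen files locally before sending them, the two format tables above can be turned into a simple lookup. This is only a sketch: the function name `mime_type_for` is hypothetical, it looks at the file extension rather than the file contents, and it assumes that the "matroska" entry corresponds to the usual `.mkv` extension.

```python
from pathlib import Path

# Source format -> MIME type, copied from the audio and video tables above.
SUPPORTED_FORMATS = {
    "aac": "audio/aac",
    "ac3": "audio/ac3",
    "eac3": "audio/eac3",
    "flac": "audio/flac",
    "m4a": "audio/mp4",
    "mp2": "audio/mpeg",
    "mp3": "audio/mpeg",
    "ogg": "application/ogg",
    "opus": "audio/opus",
    "wav": "audio/wav",
    "3g2": "video/3gpp2",
    "3gp": "video/3gpp",
    "avi": "video/x-msvideo",
    "flv": "video/x-flv",
    "m4v": "video/x-m4v",
    # Assumption: the table lists "matroska"; .mkv is its usual extension.
    "mkv": "video/x-matroska",
    "mov": "video/quicktime",
    "mp4": "video/mp4",
    "wmv": "video/x-ms-wmv",
}


def mime_type_for(filename):
    """Return the listed MIME type for a file name, or None if it is not listed."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return SUPPORTED_FORMATS.get(extension)
```

A `None` result only means the extension is not in the tables above; since almost all formats are supported, the file may still be accepted.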

Supported online video services

| Platform | Audio/Video Support | Stage |
|---|---|---|
| YouTube | Video | Released |
| TikTok | Video | Released |
| Instagram | Video | Released |
| Facebook | Video | Released |
| Vimeo | Video | Released |
| Dailymotion | Video | Released |
| LinkedIn | Video | Released |
| Sharechat | Video | Released |
| Likee | Video | Released |
| TikTok (Beta) | Video | Beta |
| Twitter (Beta) | Video | Beta |

Conversion time

| Source Format | MIME Type | Audio/Video | Estimated File Size (1 Hour) | Estimated Conversion Time (1 Hour) |
|---|---|---|---|---|
| 3g2 | video/3gpp2 | Video | ~300 MB | ~30 seconds |
| 3gp | video/3gpp | Video | ~300 MB | ~40 seconds |
| aac | audio/aac | Audio | ~60 MB | ~36 seconds |
| ac3 | audio/ac3 | Audio | ~215 MB | ~42 seconds |
| avi | video/x-msvideo | Video | ~800 MB | ~1 minute |
| eac3 | audio/eac3 | Audio | ~215 MB | ~32 seconds |
| flac | audio/flac | Audio | ~260 MB | ~46 seconds |
| flv | video/x-flv | Video | ~400 MB | ~40 seconds |
| m4a | audio/m4a | Audio | ~60 MB | ~26 seconds |
| x-m4a | audio/x-m4a | Audio | ~60 MB | ~26 seconds |
| m4v | video/x-m4v | Video | ~800 MB | ~1 minute |
| matroska | video/x-matroska | Audio/Video | ~800 MB | ~1 minute |
| mov | video/quicktime | Video | ~800 MB | ~1 minute |
| mp2 | audio/mpeg | Audio | ~120 MB | ~42 seconds |
| mp3 | audio/mpeg | Audio | ~120 MB | ~37 seconds |
| mp4 | video/mp4 | Audio/Video | ~800 MB | ~1 minute |
| ogg | application/ogg | Audio | ~60 MB | ~1 minute |
| opus | audio/opus | Audio | ~30 MB | ~1 minute |
| wav | audio/wav | Audio | ~510 MB | N/A |
| wmv | video/x-ms-wmv | Video | ~800 MB | ~1 minute |
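To make the tradeoff between transfer time and conversion time concrete, the sketch below combines a few rows of the table with an upload speed. It rests on several assumptions that the table does not state: sizes and conversion times scale linearly with duration, 1 MB is 1,000,000 bytes, the "N/A" conversion time for WAV is treated as zero, and the helper name `estimate_seconds` is hypothetical.

```python
# format -> (estimated size in MB, estimated conversion time in seconds)
# for one hour of media, copied from the table above.
ESTIMATES_PER_HOUR = {
    "mp3": (120, 37),
    "flac": (260, 46),
    "opus": (30, 60),
    # Assumption: the table lists N/A for wav; treated here as no conversion.
    "wav": (510, 0),
}


def estimate_seconds(source_format, hours, upload_mbit_per_s):
    """Rough transfer + conversion time in seconds (linear-scaling assumption)."""
    size_mb, conversion_s = ESTIMATES_PER_HOUR[source_format]
    # MB -> megabits (x8), then divide by the upload speed in Mbit/s.
    transfer_s = size_mb * hours * 8 / upload_mbit_per_s
    return transfer_s + conversion_s * hours
```

Under these assumptions and with a 20 Mbit/s upload speed, one hour of MP3 comes to about 85 seconds in total, one hour of Opus to about 72 seconds, and one hour of WAV to about 204 seconds, so a compact format can be faster overall even though it needs converting.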