Supported files & duration

We support almost all types of audio and video files. When choosing a format, there is a tradeoff to take into account: some formats generate big files that take longer to transfer, while others take longer to convert from the original format to the target one (WAV PCM, 16 kHz, little-endian).

You can find an estimate of the conversion times in the "Conversion time" table at the end of this page.

Gladia API current limitations

These limits are in place to keep the service stable and performant for everyone, and they will be lifted gradually.

Audio length: The maximum length of audio that can be transcribed in a single request is currently 135 minutes. Attempts to transcribe longer audio files will result in an error. Direct YouTube links are limited to 120 minutes instead of 135 minutes.

Enterprise plans support audio up to 4 hours 15 minutes long.

File size: Audio files must not exceed 1000 MB in size. Larger files will not be accepted by the API.
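
The limits above can be checked on the client before uploading. The following is a minimal sketch, not part of the Gladia API: the function name `check_limits` and its parameters are hypothetical, and it assumes that 1 MB means 1,000,000 bytes, which this page does not specify. The enterprise limit is not modeled.

```python
# Limits quoted in this section; adjust them if the service changes.
MAX_DURATION_MIN = 135          # audio sent in a single request
MAX_YOUTUBE_DURATION_MIN = 120  # direct YouTube links
MAX_FILE_SIZE_MB = 1000

# Assumption: decimal megabytes. The page does not say whether MB is decimal or binary.
BYTES_PER_MB = 1_000_000


def check_limits(duration_seconds, size_bytes=None, is_youtube_link=False):
    """Return a list of limit violations; the list is empty if none are found."""
    problems = []
    max_minutes = MAX_YOUTUBE_DURATION_MIN if is_youtube_link else MAX_DURATION_MIN
    if duration_seconds > max_minutes * 60:
        problems.append(f"duration exceeds {max_minutes} minutes")
    # size_bytes is optional because a link has no local file size to check.
    if size_bytes is not None and size_bytes > MAX_FILE_SIZE_MB * BYTES_PER_MB:
        problems.append(f"file size exceeds {MAX_FILE_SIZE_MB} MB")
    return problems
```

An empty result means the request is within the limits listed here; any entry in the list suggests splitting the file first.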

Splitting oversize audio files

For audio files that approach or exceed the length and size limits, we recommend splitting them into smaller chunks of about 60 minutes each. This keeps each request within the API constraints and generally yields better transcription results.

Tools for Splitting Audio Files:

FFmpeg: a versatile command-line tool for manipulating audio and video files, and a popular choice for splitting long audio files.
ffmpeg-python: for Python users, a wrapper around FFmpeg that provides a more Pythonic interface.
prism-media for Node.js: Node.js users can use prism-media to manipulate media files, including splitting audio files.
fluent-ffmpeg for Node.js: another option for Node.js users, offering a simpler and more fluent API for handling media files.

Following these best practices will help you avoid issues due to limitations and maximize the quality of the transcriptions you obtain from the Audio Transcription API.

Supported audio formats

Source Format	Mime Type	Audio/Video
aac	audio/aac	Audio
ac3	audio/ac3	Audio
eac3	audio/eac3	Audio
flac	audio/flac	Audio
m4a	audio/mp4	Audio
mp2	audio/mpeg	Audio
mp3	audio/mpeg	Audio
ogg	application/ogg	Audio
opus	audio/opus	Audio
wav	audio/wav	Audio

Supported video formats

Source Format	Mime Type	Audio/Video
3g2	video/3gpp2	Video
3gp	video/3gpp	Video
avi	video/x-msvideo	Video
flv	video/x-flv	Video
m4v	video/x-m4v	Video
matroska	video/x-matroska	Audio/Video
mov	video/quicktime	Video
mp4	video/mp4	Audio/Video
wmv	video/x-ms-wmv	Video

Supported online video services

Platform	Audio/Video Support	Stage
YouTube	Video	Released
TikTok	Video	Released
Instagram	Video	Released
Facebook	Video	Released
Vimeo	Video	Released
Dailymotion	Video	Released
LinkedIn	Video	Released
Sharechat	Video	Released
Likee	Video	Released
TikTok (Beta)	Video	Beta
Twitter (Beta)	Video	Beta

Conversion time

Source Format	Mime Type	Audio/Video	Estimated File Size (1 Hour)	Estimated Conversion Time (1 Hour)
3g2	video/3gpp2	Video	~300 MB	~30 seconds
3gp	video/3gpp	Video	~300 MB	~40 seconds
aac	audio/aac	Audio	~60 MB	~36 seconds
ac3	audio/ac3	Audio	~215 MB	~42 seconds
avi	video/x-msvideo	Video	~800 MB	~1 minute
eac3	audio/eac3	Audio	~215 MB	~32 seconds
flac	audio/flac	Audio	~260 MB	~46 seconds
flv	video/x-flv	Video	~400 MB	~40 seconds
m4a	audio/m4a	Audio	~60 MB	~26 seconds
x-m4a	audio/x-m4a	Audio	~60 MB	~26 seconds
m4v	video/x-m4v	Video	~800 MB	~1 minute
matroska	video/x-matroska	Audio/Video	~800 MB	~1 minute
mov	video/quicktime	Video	~800 MB	~1 minute
mp2	audio/mpeg	Audio	~120 MB	~42 seconds
mp3	audio/mpeg	Audio	~120 MB	~37 seconds
mp4	video/mp4	Audio/Video	~800 MB	~1 minute
ogg	application/ogg	Audio	~60 MB	~1 minute
opus	audio/opus	Audio	~30 MB	~1 minute
wav	audio/wav	Audio	~510 MB	N/A
wmv	video/x-ms-wmv	Video	~800 MB	~1 minute
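
To make the tradeoff between transfer time and conversion time concrete, the sketch below combines a few rows of the table with an upload speed. It is a rough estimate only: the function name is hypothetical, the upload speed is something you supply, and it assumes that both file size and conversion time scale linearly with duration, which the table does not state.

```python
# (estimated size in MB, estimated conversion time in seconds) per hour of media,
# copied from a few rows of the table above. WAV is the target format, so its
# conversion time (N/A in the table) is treated as zero here.
ESTIMATES_PER_HOUR = {
    "mp3": (120, 37),
    "flac": (260, 46),
    "opus": (30, 60),
    "wav": (510, 0),
    "mp4": (800, 60),
}


def estimate_seconds(fmt, hours, upload_mbit_per_s):
    """Estimate upload time plus conversion time, in seconds."""
    size_mb, convert_s = ESTIMATES_PER_HOUR[fmt]
    # Megabytes to megabits (x8), divided by the link speed, scaled by duration.
    upload_s = size_mb * 8 / upload_mbit_per_s * hours
    return upload_s + convert_s * hours
```

With an 8 Mbit/s uplink, one hour of WAV comes to about 510 seconds (all of it upload), while one hour of Opus comes to about 90 seconds (30 seconds of upload plus 60 of conversion). On a slow connection, a compact format can therefore be faster overall despite its conversion cost.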
