Getting started
Get started with the Gladia Real-time Speech to Text (STT) API
Initiate your Real-time session
First, call the POST /v2/live endpoint and pass your configuration.
It’s important to correctly define the properties encoding, sample_rate, bit_depth and channels, as we cannot guess them and we use them to parse your audio chunks.
You’ll receive a WebSocket url to connect to. If you lose the connection, you can reconnect to that same url to resume where you left off.
Example response:
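As a minimal sketch of this step in JavaScript, the request could be built and sent as follows. The base URL `https://api.gladia.io`, the `x-gladia-key` header name and the helper names are assumptions that are not stated on this page. The response body is expected to carry the `id` and `url` mentioned here.

```javascript
// Build the session configuration from the four audio properties.
function buildLiveConfig({ encoding, sampleRate, bitDepth, channels }) {
  return {
    encoding,
    sample_rate: sampleRate,
    bit_depth: bitDepth,
    channels,
  };
}

// Describe the POST /v2/live request.
// Assumptions: the base URL and the "x-gladia-key" header name.
function buildInitRequest(apiKey, config, baseUrl = "https://api.gladia.io") {
  return {
    url: `${baseUrl}/v2/live`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-gladia-key": apiKey,
      },
      body: JSON.stringify(config),
    },
  };
}

// Send the request and return the parsed body, expected to hold `id` and `url`.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function initiateSession(apiKey, config, fetchImpl = fetch) {
  const { url, options } = buildInitRequest(apiKey, config);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Session initiation failed with status ${response.status}`);
  }
  return response.json();
}
```

A call could then look like `initiateSession(apiKey, buildLiveConfig({ encoding: "wav/pcm", sampleRate: 16000, bitDepth: 16, channels: 1 }))`, where the values are illustrative and must match your actual audio.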
Connect to the WebSocket
Now that you have initiated the session and have the url, you can connect to the WebSocket using your preferred language or framework. Here is an example in JavaScript.
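A minimal sketch of the connection step, assuming a runtime that provides the standard `WebSocket` class (browsers and recent Node.js versions). The `connectToSession` helper and its `handlers` object are hypothetical names used for illustration only.

```javascript
// Open the WebSocket returned by the session initiation and wire up
// basic lifecycle handlers. `WebSocketImpl` is injectable for testing.
function connectToSession(url, handlers = {}, WebSocketImpl = WebSocket) {
  const socket = new WebSocketImpl(url);

  socket.addEventListener("open", () => {
    handlers.onOpen?.();
  });

  socket.addEventListener("error", (event) => {
    handlers.onError?.(event);
  });

  // The close event exposes the close code (1000 means normal closure).
  socket.addEventListener("close", (event) => {
    handlers.onClose?.(event.code, event.reason);
  });

  return socket;
}
```

Because the same url can be reused after a lost connection, calling `connectToSession` again with that url is one way to resume the session.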
Send audio chunks
You can now start sending us your audio chunks through the WebSocket. You can send them directly as binary or in JSON by encoding your chunk in base64.
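Both options can be sketched as follows. The JSON message shape `{ type: "audio_chunk", data: { chunk } }` is an assumption, as are the helper names. `Buffer` is the Node.js way to produce base64 and would need a substitute in a browser.

```javascript
// Option 1: send a chunk directly as a binary WebSocket frame.
function sendBinaryChunk(socket, chunk) {
  socket.send(chunk);
}

// Option 2: wrap a chunk in a JSON message with the audio encoded in base64.
// Assumption: the { type: "audio_chunk", data: { chunk } } message shape.
function toJsonChunkMessage(chunk) {
  const base64 = Buffer.from(chunk).toString("base64"); // Node.js Buffer
  return JSON.stringify({ type: "audio_chunk", data: { chunk: base64 } });
}

function sendJsonChunk(socket, chunk) {
  socket.send(toJsonChunkMessage(chunk));
}
```

Binary frames avoid the size overhead of base64 (roughly one third), so the JSON form is mostly useful when your transport or tooling only handles text.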
Read messages
During the whole session, we will send various messages through the WebSocket, the callback url or webhooks.
You can specify which kinds of messages you want to receive in the initial configuration. See messages_config for WebSocket messages and callback_config for callback messages.
Here is how you would read a transcript message received through the WebSocket.
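A minimal sketch of reading such a message, assuming it arrives as a JSON string with a `type` field and a `data` object holding `is_final` and `utterance.text`. These field names and the `readTranscript` helper are assumptions, so check them against the message reference before relying on them.

```javascript
// Parse a raw WebSocket message and extract the transcript, if any.
// Returns null for every message whose type is not "transcript".
function readTranscript(rawMessage) {
  const message = JSON.parse(rawMessage);
  if (message.type !== "transcript") {
    return null;
  }
  const data = message.data ?? {};
  return {
    isFinal: Boolean(data.is_final),
    text: data.utterance?.text ?? "",
  };
}

// Attach the reader to a socket and forward transcripts to a callback.
function onTranscript(socket, callback) {
  socket.addEventListener("message", (event) => {
    const transcript = readTranscript(event.data.toString());
    if (transcript !== null) {
      callback(transcript);
    }
  });
}
```

Returning `null` for other message types keeps the handler safe to attach even when several kinds of messages are enabled in the configuration.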
Stop the recording
Once you are done, send us the stop_recording message.
We will process the remaining audio chunks and start the post-processing phase, where we build the final audio file and the results, including any add-ons you specified.
You’ll receive a message at every step of the process in the WebSocket or the callback if you configured any.
Once the post-processing is done, the WebSocket is closed with a code 1000.
Instead of sending the stop_recording message, you can also close the WebSocket with the code 1000.
We will still do the post-processing in the background and send you the messages through the callback you defined.
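The two ways of ending a session can be sketched as follows. The `{ type: "stop_recording" }` JSON shape is an assumption based on the message name, and the helper names are hypothetical.

```javascript
// Option 1: ask the server to stop, then keep the socket open to receive
// the post-processing messages until the server closes it.
// Assumption: the message is JSON with a `type` field.
function stopRecording(socket) {
  socket.send(JSON.stringify({ type: "stop_recording" }));
}

// Option 2: close the socket yourself with the normal-closure code.
// Post-processing messages then only arrive through a configured callback.
function closeSession(socket) {
  socket.close(1000);
}

// Tell a normal end of session apart from an unexpected disconnection.
function isNormalClosure(closeEvent) {
  return closeEvent.code === 1000;
}
```

Checking the close code with `isNormalClosure` is one way to decide whether to reconnect to the same url or to treat the session as finished.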
Get the final results
If you want to get the complete result, you can call the GET /v2/live/:id endpoint with the id you received from the initial request.
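A minimal sketch of that call, under the same assumptions as the session initiation (base URL `https://api.gladia.io` and an `x-gladia-key` header, neither of which is stated on this page). The helper names are hypothetical.

```javascript
// Describe the GET /v2/live/:id request for a given session id.
// Assumptions: the base URL and the "x-gladia-key" header name.
function buildResultRequest(apiKey, sessionId, baseUrl = "https://api.gladia.io") {
  return {
    url: `${baseUrl}/v2/live/${encodeURIComponent(sessionId)}`,
    options: {
      method: "GET",
      headers: { "x-gladia-key": apiKey },
    },
  };
}

// Fetch and parse the complete result of a session.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function getFinalResult(apiKey, sessionId, fetchImpl = fetch) {
  const { url, options } = buildResultRequest(apiKey, sessionId);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Fetching the result failed with status ${response.status}`);
  }
  return response.json();
}
```

Since post-processing runs after the recording stops, the complete result may not be available immediately, so calling this after the WebSocket has closed or after the final callback message is the safer order.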
Full sample
You can find a complete sample in our GitHub repository: