- Using our SDKs
- Using the API
The SDK simplifies real-time speech-to-text integration by abstracting the underlying API. Designed for developers, it offers:
- Effortless implementation with minimal code to write.
- Built-in resilience: automatic error handling (e.g., reconnection on network drops) keeps transcription running through interruptions, so you don't need to manually manage retries or state recovery.
Install the SDK
Initiate your real-time session
First, call the endpoint and pass your configuration. It’s important to correctly define the properties encoding, sample_rate, bit_depth and channels, as we need them to parse your audio chunks.
Why initiate with POST instead of connecting directly to the WebSocket?
- Security: Generate the WebSocket URL on your backend and keep your API key private. The init call returns a connectable URL and a session id that you can safely pass to web, iOS, or Android clients without exposing credentials in the app.
- Lower infrastructure load: Because the secure URL is generated on your backend, the client can connect directly to Gladia’s WebSocket server without a pass-through on your side, saving your own resources.
- Resilient reconnection and session continuity: If the WebSocket disconnects (which can happen on unreliable networks), the session created by the init call lets the client reconnect without losing context. Traditional flows that open a socket first typically force a brand‑new session on disconnect, dropping in‑progress state.
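The init call described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the official client: the base URL, the x-gladia-key header name, the wav/pcm encoding value, and the helper names buildInitRequest and initSession are illustrative guesses that should be checked against the API reference. Only the four audio properties and the use of POST come from this page.

```javascript
// Assumptions (not confirmed by this page): the base URL, the header name
// and the "wav/pcm" encoding value. Verify them in the API reference.
const INIT_URL = "https://api.gladia.io/v2/live";

// Build the request without sending it, so it can be inspected or tested.
function buildInitRequest(apiKey, config) {
  return {
    url: INIT_URL,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-gladia-key": apiKey, // keep this on your backend only
      },
      body: JSON.stringify(config),
    },
  };
}

// Send the init request and return the parsed JSON response, which is
// expected to carry the WebSocket URL and the session id.
async function initSession(apiKey, config) {
  const { url, options } = buildInitRequest(apiKey, config);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Init failed: ${response.status} ${await response.text()}`);
  }
  return response.json();
}

// The four properties the server needs to parse the audio chunks.
const audioConfig = {
  encoding: "wav/pcm",
  sample_rate: 16000,
  bit_depth: 16,
  channels: 1,
};
```

Running initSession on your backend and forwarding only the returned URL and id to the client keeps the API key out of the app, which is the point of the two-step flow.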
Connect to the WebSocket
Now that you’ve initiated the session and have the URL, you can connect to the WebSocket using your preferred language/framework. Here’s an example in JavaScript:
Send audio chunks
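Connecting and streaming can be sketched together. The sketch below assumes the server accepts raw binary frames and that sessionUrl is the URL returned by the init call; the helper names (bytesPerChunk, splitIntoChunks, sendAudio, connectAndSend) and the 100 ms chunk duration are hypothetical choices, not values from this page. It also shows how the audio properties from the configuration determine the chunk size in bytes.

```javascript
// Number of bytes holding `durationMs` of raw PCM audio, aligned to whole
// frames (one sample per channel) so no sample is cut in half.
function bytesPerChunk({ sample_rate, bit_depth, channels }, durationMs) {
  const frameSize = (bit_depth / 8) * channels;
  const frames = Math.floor((sample_rate * durationMs) / 1000);
  return frames * frameSize;
}

// Split a Uint8Array into consecutive chunks of `size` bytes.
// The last chunk may be shorter.
function splitIntoChunks(audio, size) {
  if (size <= 0) throw new RangeError("chunk size must be positive");
  const chunks = [];
  for (let offset = 0; offset < audio.length; offset += size) {
    chunks.push(audio.subarray(offset, offset + size));
  }
  return chunks;
}

// Send every chunk over an already-open socket and return the chunk count.
// `socket` only needs a `send` method, so a test double works too.
function sendAudio(socket, audio, format, durationMs = 100) {
  const chunks = splitIntoChunks(audio, bytesPerChunk(format, durationMs));
  for (const chunk of chunks) socket.send(chunk);
  return chunks.length;
}

// Assumption: binary frames are accepted as-is by the server.
function connectAndSend(sessionUrl, audio, format) {
  const socket = new WebSocket(sessionUrl);
  socket.binaryType = "arraybuffer";
  socket.addEventListener("open", () => sendAudio(socket, audio, format));
  return socket;
}
```

For 16 kHz, 16-bit mono audio, 100 ms corresponds to 1600 frames of 2 bytes, so 3200 bytes per chunk. With live capture you would call socket.send as each chunk arrives rather than looping over a pre-recorded buffer.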
You can now start sending us your audio chunks through the WebSocket:
Read messages
During the whole session, we will send various messages through the WebSocket, the callback URL or webhooks. You can specify which kind of messages you want to receive in the initial configuration. See messages_config for WebSocket messages and callback_config for callback messages.
Here’s an example of how to read a transcript message received through a WebSocket:
Need low-latency partial results?
Enable partial transcripts by setting messages_config.receive_partial_transcripts: true. Use the is_final property to distinguish between partial and final transcript messages.
Stop the recording
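Reading transcript messages, telling partial from final results, and then ending the session can be sketched as below. The JSON layout of the transcript message (a data object holding is_final and an utterance with text) and the wire format of the stop message are assumptions made for illustration; only the transcript type, the is_final property, the stop_recording name, and the receive_partial_transcripts setting come from this page. The helper names are hypothetical.

```javascript
// Opt in to partial transcripts in the initial configuration.
const messagesConfig = {
  messages_config: { receive_partial_transcripts: true },
};

// Assumed message shape, for illustration only:
// { "type": "transcript",
//   "data": { "is_final": true, "utterance": { "text": "..." } } }
function readTranscript(raw) {
  const message = JSON.parse(raw);
  if (message.type !== "transcript") return null; // ignore other message kinds
  return {
    text: message.data?.utterance?.text ?? "",
    isFinal: Boolean(message.data?.is_final),
  };
}

// Assumed wire format of the stop message.
function stopRecordingMessage() {
  return JSON.stringify({ type: "stop_recording" });
}

// Forward each transcript to `onTranscript`; other messages are skipped.
function attachHandlers(socket, onTranscript) {
  socket.addEventListener("message", (event) => {
    const transcript = readTranscript(String(event.data));
    if (transcript) onTranscript(transcript);
  });
}
```

A typical caller would render partial results as provisional text, replace them when a message with isFinal set to true arrives, and call socket.send(stopRecordingMessage()) when the user ends the recording.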
Once you’re done, send us the stop_recording message. We will process remaining audio chunks and start the post-processing phase, in which we put together the final audio file and results with the add-ons you requested.
You’ll receive a message at every step of the process in the WebSocket, or in the callback if configured. Once the post-processing is done, the WebSocket is closed with a code 1000.
Get the final results
If you want to get the complete result, you can call the GET /v2/live/:id endpoint with the id you received from the initial request.
Want to know more about a specific feature? Check out our Features chapter for more details.
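Fetching the complete result after the socket closes can be sketched as follows. The base URL and the x-gladia-key header name are assumptions, and the helper names are hypothetical; the /v2/live/:id path and the close code 1000 are the parts taken from this page.

```javascript
// Assumed base URL and header name; only the /v2/live/:id path is from this page.
const API_BASE = "https://api.gladia.io";

// Build the GET request for a finished session without sending it.
function buildResultRequest(apiKey, sessionId) {
  return {
    url: `${API_BASE}/v2/live/${encodeURIComponent(sessionId)}`,
    options: { method: "GET", headers: { "x-gladia-key": apiKey } },
  };
}

// 1000 is the standard WebSocket "normal closure" code.
function isNormalClosure(closeEvent) {
  return closeEvent.code === 1000;
}

// Fetch and parse the final result for the given session id.
async function fetchFinalResult(apiKey, sessionId) {
  const { url, options } = buildResultRequest(apiKey, sessionId);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Result request failed: ${response.status}`);
  }
  return response.json();
}
```

One way to wire this up is to listen for the socket's close event, check isNormalClosure, and only then ask your backend to call fetchFinalResult, since the request needs the API key.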