# Quickstart

> How to transcribe live audio with Gladia's Real-time speech-to-text (STT) API

<Tabs>
  <Tab title="Using our SDKs" icon="code">
    The SDK simplifies real-time speech-to-text integration by abstracting the underlying API. Designed for developers, it offers:

    * Effortless implementation with minimal code to write.
    * Built-in resilience: automatic error handling (e.g., reconnection on network drops) keeps transcription running, so you don't need to manage retries or state recovery yourself.

    ## Install the SDK

    <CodeGroup>
      ```sh JavaScript theme={"system"}
      npm install @gladiaio/sdk
      ```

      ```sh Python theme={"system"}
      # Using pip
      pip install gladiaio-sdk

      # Using uv
      uv add gladiaio-sdk
      ```
    </CodeGroup>

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      import { GladiaClient } from "@gladiaio/sdk";
      ```

      ```python Python theme={"system"}
      from gladiaio_sdk import (
          GladiaClient,
          LiveV2InitRequest,
          LiveV2LanguageConfig,
          LiveV2MessagesConfig,
          LiveV2WebSocketMessage,
          LiveV2InitResponse,
          LiveV2EndedMessage,
      )
      ```
    </CodeGroup>

    ## Initiate your real-time session

    First, call the [`POST /v2/live` endpoint](/api-reference/v2/live/init) and pass your configuration.
    It's important to correctly define the properties `encoding`, `sample_rate`, `bit_depth` and `channels` as we need them to parse your audio chunks.

    <CodeGroup>
      ```typescript JavaScript theme={"system"}
      const gladiaClient = new GladiaClient({
        apiKey: "<YOUR_GLADIA_API_KEY>",
      });

      const gladiaConfig = {
        model: "solaria-1",
        encoding: 'wav/pcm',
        sample_rate: 16000,
        bit_depth: 16,
        channels: 1,
        language_config: {
          languages: ["fr"],
          code_switching: false,
        },
      };

      const liveSession = gladiaClient.liveV2().startSession(gladiaConfig);
      ```

      ```python Python theme={"system"}
      # Our Python SDK supports sync/threaded and asyncio versions.
      gladia_client = GladiaClient(api_key="<YOUR_GLADIA_API_KEY>")

      # sync/threaded version
      live_client = gladia_client.live_v2()
      # asyncio version (use this instead of the line above):
      # live_client = gladia_client.live_v2_async()

      init_request = LiveV2InitRequest(
          model="solaria-1",
          encoding="wav/pcm",
          sample_rate=16000,
          bit_depth=16,
          channels=1,
          language_config=LiveV2LanguageConfig(languages=["fr"], code_switching=False),
          messages_config=LiveV2MessagesConfig(receive_partial_transcripts=True),
      )

      live_session = live_client.start_session(init_request)
      ```
    </CodeGroup>
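
    The `sample_rate`, `bit_depth` and `channels` values above also determine how many bytes of raw PCM audio make up one second of the stream, which is useful when sizing the chunks you send. A minimal sketch of that arithmetic, assuming uncompressed PCM; `pcmChunkBytes` is a hypothetical helper and not part of the SDK:

    ```javascript
    // Hypothetical helper: size in bytes of a raw PCM chunk of a given duration.
    function pcmChunkBytes({ sample_rate, bit_depth, channels }, durationMs) {
      const bytesPerSample = bit_depth / 8;
      const bytesPerSecond = sample_rate * bytesPerSample * channels;
      return Math.round((bytesPerSecond * durationMs) / 1000);
    }

    // 16 kHz, 16-bit, mono: 32,000 bytes per second, so 100 ms is 3,200 bytes.
    console.log(pcmChunkBytes({ sample_rate: 16000, bit_depth: 16, channels: 1 }, 100)); // 3200
    ```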

    <Accordion title="Why initiate with POST instead of connecting directly to the WebSocket?">
      * **Security**: Generate the WebSocket URL on your backend and keep your API key private. The init call returns a connectable URL and a session `id` that you can safely pass to web, iOS, or Android clients without exposing credentials in the app.
      * **Lower infrastructure load**: Because the secure URL is generated on your backend, the client can connect directly to Gladia's WebSocket server without a pass-through on your side, which saves your own resources.
      * **Resilient reconnection and session continuity**: If the WebSocket disconnects (which can happen on unreliable networks), the session created by the init call lets the client reconnect without losing context. Traditional flows that open a socket first typically force a brand‑new session on disconnect, dropping in‑progress state.
    </Accordion>

    ## Connect to the WebSocket

    Now that you've initiated the session, register handlers for the events it emits. The SDK manages the WebSocket connection for you:

    <CodeGroup>
      ```typescript JavaScript theme={"system"}
      liveSession.on("message", (message) => {
        // Handle messages from the API
      });
      liveSession.on("started", (message) => {
        // Handle start session message
      });
      liveSession.on("ended", (message) => {
        // Handle end session message
      });
      liveSession.on("error", (message) => {
        // Handle error message
      });
      ```

      ```python Python theme={"system"}
      from gladiaio_sdk import (
          LiveV2WebSocketMessage,
          LiveV2InitResponse,
          LiveV2EndedMessage,
      )

      @live_session.on("message")
      def on_message(message: LiveV2WebSocketMessage) -> None:
          # Handle messages from the API
          pass

      @live_session.on("error")
      def on_error(error: Exception) -> None:
          # Handle error message
          print(f"Live session error: {error}")

      @live_session.once("started")
      def on_started(_response: LiveV2InitResponse):
          # Handle start session
          print("Session started. Listening…")

      @live_session.once("ended")
      def on_ended(_ended: LiveV2EndedMessage):
          # Handle end session
          print("Session ended.")
      ```
    </CodeGroup>

    ## Send audio chunks

    You can now start sending us your audio chunks through the WebSocket:

    <CodeGroup>
      ```typescript JavaScript theme={"system"}
      liveSession.sendAudio(audioChunk)
      ```

      ```python Python theme={"system"}
      live_session.send_audio(audio_chunk)
      ```
    </CodeGroup>

    <Note>
      A single real-time transcription session cannot exceed **3 hours**. For longer events, start a new session before reaching the limit. See [Concurrency and rate limits](/chapters/limits-and-specifications/concurrency) and [Supported files & duration](/chapters/limits-and-specifications/supported-formats) for details.
    </Note>
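
    For long-running events, one way to act on this limit is to check how long the current session has been open and start a new one shortly before the 3 hours are up. A minimal sketch, assuming the limit is counted in wall-clock time from session start; the 5-minute safety margin and the helper name are illustrative assumptions:

    ```javascript
    // 3-hour session limit stated in the note above, in milliseconds.
    const MAX_SESSION_MS = 3 * 60 * 60 * 1000;

    // Hypothetical helper: true once the session is within `marginMs` of the limit.
    function shouldStartNewSession(startedAtMs, nowMs = Date.now(), marginMs = 5 * 60 * 1000) {
      return nowMs - startedAtMs >= MAX_SESSION_MS - marginMs;
    }

    const startedAt = 0;
    console.log(shouldStartNewSession(startedAt, 60 * 60 * 1000)); // false (1 hour in)
    console.log(shouldStartNewSession(startedAt, 175 * 60 * 1000)); // true (2 h 55 min in)
    ```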

    ## Read messages

    During the whole session, we will send various messages through the WebSocket, the callback URL or webhooks. You can specify which kind of messages you want to receive in the initial configuration. See [`messages_config`](/api-reference/v2/live/init) for WebSocket messages and [`callback_config`](/api-reference/v2/live/init) for callback messages.

    Here's an example of how to read a [`transcript`](/api-reference/v2/live/message/transcript) message received through a WebSocket:

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      liveSession.on("message", (message) => {
        if (message.type === 'transcript' && message.data.is_final) {
          console.log(`${message.data.id}: ${message.data.utterance.text}`)
        }
      });
      ```

      ```python Python theme={"system"}
      @live_session.on("message")
      def on_message(message: LiveV2WebSocketMessage) -> None:
          if getattr(message, "type", None) == "transcript":
              data = getattr(message, "data", None)
              if not data:
                  return
              is_final = bool(getattr(data, "is_final", False))
              utterance = getattr(data, "utterance", None)
              text = getattr(utterance, "text", "") if utterance else ""
              if is_final and text:
                  print(text.strip())
      ```
    </CodeGroup>

    <Info>
      **Need low-latency partial results?**

      Enable [partial transcripts](/chapters/live-stt/features/partial-transcripts) by setting `messages_config.receive_partial_transcripts: true`.

      Use the `is_final` property to distinguish between partial and final transcript messages.
    </Info>

    ## Stop the recording

    Once you're done, send us the `stop_recording` message. We will process remaining audio chunks and start the post-processing phase, in which we put together the final audio file and results with the add-ons you requested.

    You'll receive a message at every step of the process over the WebSocket, or in the callback if configured. Once the post-processing is done, the WebSocket is closed with code 1000.

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      liveSession.stopRecording()
      ```

      ```python Python theme={"system"}
      live_session.stop_recording()
      ```
    </CodeGroup>

    ## Get the final results

    If you want to get the complete result, you can call the [`GET /v2/live/:id` endpoint](/api-reference/v2/live/get) with the `id` you received from the initial request.

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      // `sessionId` is the `id` of the session you started
      const response = await fetch(`https://api.gladia.io/v2/live/${sessionId}`, {
        method: 'GET',
        headers: {
          'x-gladia-key': '<YOUR_GLADIA_API_KEY>',
        },
      });
      if (!response.ok) {
        // Look at the error message
        // It might be a configuration issue
        console.error(`${response.status}: ${(await response.text()) || response.statusText}`)
      } else {
        const result = await response.json();
        console.log(result)
      }
      ```

      ```python Python theme={"system"}
      import os
      import requests

      session_id = "<SESSION_ID>"
      api_key = os.environ.get("GLADIA_API_KEY") or "<YOUR_GLADIA_API_KEY>"

      response = requests.get(
          f"https://api.gladia.io/v2/live/{session_id}",
          headers={"x-gladia-key": api_key},
      )

      if not response.ok:
          print(f"{response.status_code}: {response.text or response.reason}")
      else:
          print(response.json())
      ```

      ```bash cURL theme={"system"}
      curl --request GET \
        --url https://api.gladia.io/v2/live/ID_OF_THE_SESSION \
        --header 'x-gladia-key: <YOUR_GLADIA_API_KEY>'
      ```
    </CodeGroup>
  </Tab>

  <Tab title="Using the API" icon="brackets-curly">
    ## Initiate your real-time session

    First, call the [`POST /v2/live` endpoint](/api-reference/v2/live/init) and pass your configuration.
    It's important to correctly define the properties `encoding`, `sample_rate`, `bit_depth` and `channels` as we need them to parse your audio chunks.

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      const response = await fetch("https://api.gladia.io/v2/live", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "x-gladia-key": "<YOUR_GLADIA_API_KEY>",
        },
        body: JSON.stringify({
          encoding: "wav/pcm",
          sample_rate: 16000,
          bit_depth: 16,
          channels: 1,
        }),
      });
      if (!response.ok) {
        // Look at the error message
        // It might be a configuration issue
        console.error(
          `${response.status}: ${(await response.text()) || response.statusText}`
        );
        process.exit(response.status);
      }

      const { id, url } = await response.json();
      ```

      ```bash cURL theme={"system"}
      curl --request POST \
        --url https://api.gladia.io/v2/live \
        --header 'Content-Type: application/json' \
        --header 'x-gladia-key: YOUR_GLADIA_API_KEY' \
        --data '{
          "encoding": "wav/pcm",
          "sample_rate": 16000,
          "bit_depth": 16,
          "channels": 1
        }'
      ```
    </CodeGroup>

    You'll receive a response with a WebSocket URL to connect to. If you lose the connection, you can reconnect to that same URL and resume where you left off. Here's an example of a response:

    ```json theme={"system"}
    {
      "id": "636c70f6-92c1-4026-a8b6-0dfe3ecf826f",
      "url": "wss://api.gladia.io/v2/live?token=636c70f6-92c1-4026-a8b6-0dfe3ecf826f"
    }
    ```

    <Accordion title="Why initiate with POST instead of connecting directly to the WebSocket?">
      * **Security**: Generate the WebSocket URL on your backend and keep your API key private. The init call returns a connectable URL and a session `id` that you can safely pass to web, iOS, or Android clients without exposing credentials in the app.
      * **Lower infrastructure load**: Because the secure URL is generated on your backend, the client can connect directly to Gladia's WebSocket server without a pass-through on your side, which saves your own resources.
      * **Resilient reconnection and session continuity**: If the WebSocket disconnects (which can happen on unreliable networks), the session created by the init call lets the client reconnect without losing context. Traditional flows that open a socket first typically force a brand‑new session on disconnect, dropping in‑progress state.
    </Accordion>

    ## Connect to the WebSocket

    Now that you've initiated the session and have the URL, you can connect to the WebSocket using your preferred language/framework. Here's an example in JavaScript:

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      import WebSocket from "ws";

      const socket = new WebSocket(url);

      socket.addEventListener("open", function () {
        // Connection is opened. You can start sending audio chunks.
      });

      socket.addEventListener("error", function (error) {
        // An error occurred during the connection.
        // Check the error to understand why
      });

      socket.addEventListener("close", function ({ code, reason }) {
        // The connection has been closed
        // If the "code" is equal to 1000, we closed the connection intentionally (after the end of the session, for example).
        // Otherwise, you can reconnect to the same URL.
      });

      socket.addEventListener("message", function (event) {
        // All the messages we are sending are in JSON format
        const message = JSON.parse(event.data.toString());
        console.log(message);
      });
      ```
    </CodeGroup>
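
    The `close` handler above notes that any code other than 1000 means you can reconnect to the same URL. A minimal sketch of that decision with a capped exponential backoff; the helper names and delay values are assumptions for illustration, not Gladia recommendations:

    ```javascript
    // Hypothetical helpers, not part of any Gladia API.
    // Code 1000 means the session was closed on purpose; anything else is worth a retry.
    function shouldReconnect(closeCode) {
      return closeCode !== 1000;
    }

    // Exponential backoff: 500 ms, 1 s, 2 s, ... capped at 10 s (arbitrary example values).
    function reconnectDelayMs(attempt, baseMs = 500, maxMs = 10000) {
      return Math.min(maxMs, baseMs * 2 ** attempt);
    }

    // `createSocket` returns a WebSocket-like object for the session URL.
    // `onSocket` receives every new socket so you can attach handlers and keep a reference for sending.
    function connectWithRetry(url, createSocket, onSocket, attempt = 0) {
      const socket = createSocket(url);
      onSocket(socket);
      socket.addEventListener("open", () => {
        attempt = 0; // reset the backoff after a successful connection
      });
      socket.addEventListener("close", (event) => {
        if (shouldReconnect(event.code)) {
          setTimeout(
            () => connectWithRetry(url, createSocket, onSocket, attempt + 1),
            reconnectDelayMs(attempt)
          );
        }
      });
    }
    ```

    With the `ws` package used above, a call could look like `connectWithRetry(url, (u) => new WebSocket(u), (socket) => { /* attach "message" handler */ })`.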

    ## Send audio chunks

    You can now start sending us your audio chunks through the WebSocket. You can send them directly as binary, or in JSON by encoding your chunk in base64, like this:

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      // as binary
      socket.send(buffer);

      // as json
      socket.send(
        JSON.stringify({
          type: "audio_chunk",
          data: {
            chunk: buffer.toString("base64"),
          },
        })
      );
      ```
    </CodeGroup>
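
    If your audio arrives as one large buffer, you can slice it into fixed-size chunks before sending. A minimal sketch, assuming the 16 kHz, 16-bit mono PCM configuration used above; the 100 ms chunk size and the helper names are illustrative assumptions:

    ```javascript
    // Hypothetical helper: split raw PCM bytes into chunks of `chunkBytes` bytes.
    // The last chunk may be shorter than the others.
    function splitIntoChunks(buffer, chunkBytes) {
      const chunks = [];
      for (let offset = 0; offset < buffer.length; offset += chunkBytes) {
        chunks.push(buffer.subarray(offset, offset + chunkBytes));
      }
      return chunks;
    }

    // Build the JSON form of an audio chunk message, as in the example above.
    function toAudioChunkMessage(chunk) {
      return JSON.stringify({
        type: "audio_chunk",
        data: { chunk: chunk.toString("base64") },
      });
    }

    // 16000 samples/s * 2 bytes * 1 channel * 0.1 s = 3200 bytes per 100 ms chunk.
    const chunkBytes = 3200;
    const oneSecondOfSilence = Buffer.alloc(32000);
    const chunks = splitIntoChunks(oneSecondOfSilence, chunkBytes);
    console.log(chunks.length); // 10
    ```

    Each chunk can then be sent either as binary with `socket.send(chunk)` or as JSON with `socket.send(toAudioChunkMessage(chunk))`.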

    <Note>
      A single real-time transcription session cannot exceed **3 hours**. For longer events, start a new session before reaching the limit. See [Concurrency and rate limits](/chapters/limits-and-specifications/concurrency) and [Supported files & duration](/chapters/limits-and-specifications/supported-formats) for details.
    </Note>

    ## Read messages

    During the whole session, we will send various messages through the WebSocket, the callback URL or webhooks. You can specify which kind of messages you want to receive in the initial configuration. See [`messages_config`](/api-reference/v2/live/init) for WebSocket messages and [`callback_config`](/api-reference/v2/live/init) for callback messages.

    Here's an example of how to read a [`transcript`](/api-reference/v2/live/message/transcript) message received through a WebSocket:

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      socket.addEventListener("message", function(event) {
        // All the messages we are sending are in JSON format
        const message = JSON.parse(event.data.toString());
        if (message.type === 'transcript' && message.data.is_final) {
          console.log(`${message.data.id}: ${message.data.utterance.text}`)
        }
      });
      ```
    </CodeGroup>

    <Info>
      **Need low-latency partial results?**

      Enable [partial transcripts](/chapters/live-stt/features/partial-transcripts) by setting `messages_config.receive_partial_transcripts: true`.

      Use the `is_final` property to distinguish between partial and final transcript messages.
    </Info>
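
    One way to use `is_final` is to show the latest partial as a live preview and commit text only when a final message arrives. A minimal sketch, assuming partial and final `transcript` messages share the shape used above and that a final message supersedes the partials before it; `createTranscriptBuffer` is a hypothetical helper:

    ```javascript
    // Hypothetical helper: accumulates final utterances and tracks the latest partial.
    function createTranscriptBuffer() {
      const finals = [];
      let partial = "";
      return {
        handle(message) {
          if (message.type !== "transcript") return;
          const text = message.data.utterance.text.trim();
          if (message.data.is_final) {
            finals.push(text); // commit the utterance
            partial = ""; // the partial it replaces is no longer needed
          } else {
            partial = text; // overwrite the previous preview
          }
        },
        // Committed text followed by the in-progress partial, if any.
        render() {
          return [...finals, partial].filter(Boolean).join(" ");
        },
      };
    }

    const transcript = createTranscriptBuffer();
    transcript.handle({ type: "transcript", data: { is_final: false, utterance: { text: "Hello wor" } } });
    transcript.handle({ type: "transcript", data: { is_final: true, utterance: { text: "Hello world." } } });
    transcript.handle({ type: "transcript", data: { is_final: false, utterance: { text: "How are" } } });
    console.log(transcript.render()); // "Hello world. How are"
    ```

    With multi-channel audio, you would likely keep one such buffer per channel.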

    ## Sending multiple audio tracks in real-time

    If you have multiple audio sources (like different participants in a conversation) that you need to transcribe simultaneously, you can merge these separate audio tracks into a single multi-channel audio stream and send it over one WebSocket connection.

    ### Merging multiple audio tracks into one multi-channel WebSocket

    This approach allows you to consolidate multiple audio tracks from different participants into a single WebSocket connection while maintaining the ability to identify each speaker through their dedicated channel.

    Benefits:

    * Reduce the number of WebSocket connections from multiple to just one
    * Maintain speaker identity through channel mapping
    * Simplify synchronization of audio streams from multiple participants
    * Reduce network overhead and connection management complexity

    #### Creating a multi-channel audio stream

    To combine multiple audio tracks into a single multi-channel stream, you need to interleave the audio samples. Here's a TypeScript function that merges multiple audio buffers into a single multi-channel buffer:

    <CodeGroup>
      ```typescript TypeScript theme={"system"}
      export function interleaveAudio(channelsData: Buffer[], bitDepth = 16): Buffer {
        const nbChannels = channelsData.length;
        if (nbChannels === 1) {
          return channelsData[0];
        }

        const bytesPerSample = bitDepth / 8;
        const samplesPerChannel = channelsData[0].byteLength / bytesPerSample;
        const audio = Buffer.alloc(nbChannels * samplesPerChannel * bytesPerSample);

        for (let i = 0; i < samplesPerChannel; i++) {
          for (let j = 0; j < nbChannels; j++) {
            const sample = channelsData[j].subarray(
              i * bytesPerSample,
              (i + 1) * bytesPerSample
            );
            audio.set(sample, (i * nbChannels + j) * bytesPerSample);
          }
        }

        return audio;
      }
      ```
    </CodeGroup>
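
    To make the resulting layout concrete, here is a small worked example with two channels of two 16-bit samples each. It uses a plain-JavaScript copy of the loop above (types and the single-channel shortcut removed) so the snippet runs on its own; the sample values are arbitrary:

    ```javascript
    // Plain-JavaScript copy of the interleaving loop above.
    function interleaveAudio(channelsData, bitDepth = 16) {
      const nbChannels = channelsData.length;
      const bytesPerSample = bitDepth / 8;
      const samplesPerChannel = channelsData[0].byteLength / bytesPerSample;
      const audio = Buffer.alloc(nbChannels * samplesPerChannel * bytesPerSample);
      for (let i = 0; i < samplesPerChannel; i++) {
        for (let j = 0; j < nbChannels; j++) {
          const sample = channelsData[j].subarray(i * bytesPerSample, (i + 1) * bytesPerSample);
          audio.set(sample, (i * nbChannels + j) * bytesPerSample);
        }
      }
      return audio;
    }

    // Two 16-bit samples per channel, written as little-endian bytes.
    const first = Buffer.from([0x01, 0x00, 0x02, 0x00]); // samples 1, 2
    const second = Buffer.from([0x0a, 0x00, 0x0b, 0x00]); // samples 10, 11

    // Samples alternate between channels: first[0], second[0], first[1], second[1].
    const merged = interleaveAudio([first, second]);
    console.log([0, 2, 4, 6].map((offset) => merged.readInt16LE(offset))); // [ 1, 10, 2, 11 ]
    ```

    Note that the function reads the sample count from the first buffer, so all channel buffers are expected to have the same length.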

    #### Example use case

    Consider a scenario with three participants in a room: Sami, Maxime, and Mark. Instead of opening three separate WebSocket connections (one for each participant), you can merge their audio tracks and send them over a single WebSocket:

    1. Collect audio buffers from each participant
    2. Merge them into a single multi-channel audio stream using the `interleaveAudio` function
    3. Specify the number of channels in your API configuration (3 in this case)
    4. Send the combined audio over a single WebSocket

    <CodeGroup>
      ```typescript TypeScript theme={"system"}
      // Collect audio buffers from each participant
      const samiAudio = getSamiAudioBuffer();
      const maximeAudio = getMaximeAudioBuffer();
      const markAudio = getMarkAudioBuffer();

      // Merge into a multi-channel audio
      // Channel ordering: [0]=Sami, [1]=Maxime, [2]=Mark
      const channelsData = [samiAudio, maximeAudio, markAudio];
      const mergedAudio = interleaveAudio(channelsData, 16); // 16-bit depth

      // Initialize a single WebSocket session with multi-channel config
      const response = await fetch("https://api.gladia.io/v2/live", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "x-gladia-key": "<YOUR_GLADIA_API_KEY>",
        },
        body: JSON.stringify({
          encoding: "wav/pcm",
          sample_rate: 16000,
          bit_depth: 16,
          channels: 3, // Specify the number of channels
        }),
      });

      const { url } = await response.json();
      const socket = new WebSocket(url);

      // Send the merged audio over a single WebSocket
      socket.addEventListener("open", function () {
        socket.send(mergedAudio);
      });
      ```
    </CodeGroup>

    #### Understanding the response

    When you send a multi-channel audio stream to Gladia, the channel order is preserved in the transcription results. Each transcription message will include a `channel` field that indicates which audio channel (and thus which participant) the transcription belongs to:

    ```json theme={"system"}
    {
      "type": "transcript",
      "session_id": "de70f43f-3041-46e0-892c-8e7f53800a22",
      "created_at": "2025-04-09T08:44:16.471Z",
      "data": {
        "id": "00_00000000",
        "utterance": {
          "text": "Hello, I'm Sami. I'm the first speaker",
          "start": 0.188,
          "end": 2.852,
          "language": "en",
          "channel": 0 // This indicates the first channel (Sami)
        }
      }
    }
    ```

    ```json theme={"system"}
    {
      "type": "transcript",
      "session_id": "de70f43f-3041-46e0-892c-8e7f53800a22",
      "created_at": "2025-04-09T08:44:19.693Z",
      "data": {
        "id": "01_00000000",
        "utterance": {
          "text": "And this is Maxime, nice to meet you, I am the second speaker.",
          "start": 3.468,
          "end": 6.132,
          "language": "en",
          "channel": 1 // This indicates the second channel (Maxime)
        }
      }
    }
    ```

    ```json theme={"system"}
    {
      "type": "transcript",
      "session_id": "a587386c-8755-4c67-ad67-d2c304eb8a49",
      "created_at": "2025-04-09T08:56:16.370Z",
      "data": {
        "id": "00_00000002",
        "utterance": {
          "text": "And this is Mark",
          "start": 8.614,
          "end": 10.574,
          "language": "en",
          "channel": 2 // This indicates the third channel (Mark)
        }
      }
    }
    ```

    The channel numbers directly correspond to the order in which you added the audio tracks to the `channelsData` array:

    * Channel 0 → Sami (first in the array)
    * Channel 1 → Maxime (second in the array)
    * Channel 2 → Mark (third in the array)

    <Warning>
      Remember to keep track of channel assignments in your application to properly
      attribute transcriptions to the correct participants.
    </Warning>
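
    One simple way to keep track of these assignments is an array that mirrors the order of `channelsData`. A minimal sketch, assuming the message shape shown above; `attributeTranscript` is a hypothetical helper:

    ```javascript
    // Same order as the `channelsData` array in the example above.
    const participants = ["Sami", "Maxime", "Mark"];

    // Hypothetical helper: prefix a transcript with the name of its speaker.
    function attributeTranscript(message) {
      const { channel, text } = message.data.utterance;
      const speaker = participants[channel] ?? `channel ${channel}`;
      return `${speaker}: ${text}`;
    }

    console.log(
      attributeTranscript({
        type: "transcript",
        data: { utterance: { text: "And this is Mark", channel: 2 } },
      })
    ); // "Mark: And this is Mark"
    ```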

    <Warning>
      As mentioned in the [Multiple
      channels](/chapters/limits-and-specifications/multiple-channels) section, transcribing
      a multi-channel audio stream will be billed based on the total duration
      multiplied by the number of channels.
    </Warning>
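
    As a quick illustration of that billing rule, a 10-minute stream with three channels counts as 30 minutes. A minimal sketch of the arithmetic; `billedSeconds` is a hypothetical helper, and actual invoicing details are not covered here:

    ```javascript
    // Billed duration = stream duration multiplied by the number of channels.
    function billedSeconds(durationSeconds, channels) {
      return durationSeconds * channels;
    }

    console.log(billedSeconds(600, 3)); // 1800
    ```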

    ## Stop the recording

    Once you're done, send us the `stop_recording` message. We will process remaining audio chunks and start the post-processing phase, in which we put together the final audio file and results with the add-ons you requested.

    You'll receive a message at every step of the process over the WebSocket, or in the callback if configured. Once the post-processing is done, the WebSocket is closed with code 1000.

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      socket.send(
        JSON.stringify({
          type: "stop_recording",
        })
      );
      ```
    </CodeGroup>

    <Note>
      Instead of sending the `stop_recording` message, you can also close the WebSocket with the code 1000.
      We will still do the post-processing in the background and send you the messages through the callback you defined.

      <CodeGroup>
        ```javascript JavaScript theme={"system"}
        socket.close(1000);
        ```
      </CodeGroup>
    </Note>
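
    If you send `stop_recording` and want to know when post-processing has finished, you can wait for the socket to close with code 1000, as described above. A minimal sketch that wraps this in a promise; `stopAndWaitForClose` is a hypothetical helper that works with any WebSocket-like object exposing `send` and `addEventListener`:

    ```javascript
    // Hypothetical helper: ask the server to stop, then wait for the socket to close.
    function stopAndWaitForClose(socket) {
      return new Promise((resolve, reject) => {
        socket.addEventListener("close", (event) => {
          if (event.code === 1000) {
            resolve(); // normal closure: post-processing is done
          } else {
            reject(new Error(`WebSocket closed with unexpected code ${event.code}`));
          }
        });
        socket.send(JSON.stringify({ type: "stop_recording" }));
      });
    }
    ```

    After the promise resolves, you can fetch the complete result as shown in the next section.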

    ## Get the final results

    If you want to get the complete result, you can call the [`GET /v2/live/:id` endpoint](/api-reference/v2/live/get) with the `id` you received from the initial request.

    <CodeGroup>
      ```javascript JavaScript theme={"system"}
      const response = await fetch(`https://api.gladia.io/v2/live/${id}`, {
        method: "GET",
        headers: {
          "x-gladia-key": "<YOUR_GLADIA_API_KEY>",
        },
      });
      if (!response.ok) {
        // Look at the error message
        // It might be a configuration issue
        console.error(
          `${response.status}: ${(await response.text()) || response.statusText}`
        );
      } else {
        const result = await response.json();
        console.log(result);
      }
      ```

      ```bash cURL theme={"system"}
      curl --request GET \
        --url https://api.gladia.io/v2/live/ID_OF_THE_SESSION \
        --header 'x-gladia-key: YOUR_GLADIA_API_KEY'
      ```
    </CodeGroup>
  </Tab>
</Tabs>

<Tip>
  Want to know more about a specific feature? Check out our [Features chapter](/chapters/live-stt/features) for more details.
</Tip>

## Full code sample

You can find complete code samples in our GitHub repository:

<CardGroup cols={3}>
  <Card title="TypeScript/JavaScript" icon="js" href="https://github.com/gladiaio/gladia-samples/tree/main/typescript">
    {}
  </Card>

  <Card title="Python" icon="python" href="https://github.com/gladiaio/gladia-samples/tree/main/python">
    {}
  </Card>

  <Card title="Browser" icon="browser" href="https://github.com/gladiaio/gladia-samples/tree/main/javascript-browser">
    {}
  </Card>
</CardGroup>
