# Initiate a session

> Initiate a live transcription WebSocket session.

Use the `url` returned by this endpoint to connect to the WebSocket and start sending audio chunks. Use the returned `id` with the [GET /v2/live/:id](/api-reference/v2/live/get) endpoint to retrieve the session's status and results.
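
A minimal sketch of the init call using only the Python standard library. The endpoint, the `x-gladia-key` header, the optional `region` query parameter, and the response and error fields come from the OpenAPI spec on this page. The helper names `build_init_request`, `format_api_error`, and `init_session` are hypothetical, and the sketch assumes error responses carry a JSON body.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# Endpoint taken from the OpenAPI spec on this page.
GLADIA_LIVE_URL = "https://api.gladia.io/v2/live"


def build_init_request(
    api_key: str, config: Dict[str, Any], region: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/live request carrying a StreamingRequest body."""
    url = GLADIA_LIVE_URL
    if region is not None:
        # Optional query parameter; the spec lists "us-west" and "eu-west".
        url += "?" + urllib.parse.urlencode({"region": region})
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-gladia-key": api_key},
        method="POST",
    )


def format_api_error(body: Dict[str, Any]) -> str:
    """Summarize a 400/401/422 error body (see the *ErrorResponse schemas) on one line."""
    text = f"{body.get('statusCode')} {body.get('message')} (request_id={body.get('request_id')})"
    errors = body.get("validation_errors") or []
    if errors:
        text += ": " + "; ".join(errors)
    return text


def init_session(
    api_key: str, config: Dict[str, Any], region: Optional[str] = None
) -> Dict[str, Any]:
    """Send the init request and return the parsed 201 body: {"id", "created_at", "url"}."""
    request = build_init_request(api_key, config, region)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        # Assumes the error body is JSON shaped like the documented error schemas.
        raise RuntimeError(format_api_error(json.load(exc))) from exc
```

On your backend, a call such as `init_session(api_key, {"encoding": "wav/pcm", "bit_depth": 16, "sample_rate": 16000, "channels": 1})` returns the session `id` and the WebSocket `url`; the API key (for example, read from an environment variable) never leaves the server.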

<Accordion title="Why initiate with POST instead of connecting directly to the WebSocket?">
  * **Security**: Generate the WebSocket URL on your backend and keep your API key private. The init call returns a connectable URL and a session `id` that you can safely pass to web, iOS, or Android clients without exposing credentials in the app.
  * **Lower infrastructure load**: Because the secure URL is generated on your backend, the client can connect directly to Gladia's WebSocket server without routing audio through your servers, which saves your own resources.
  * **Resilient reconnection and session continuity**: If the WebSocket disconnects (which can happen on unreliable networks), the session created by the init call lets the client reconnect without losing context. Traditional flows that open a socket first typically force a brand‑new session on disconnect, dropping in‑progress state.
</Accordion>
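
To illustrate the security point above: only the session `id` and the WebSocket `url` from the init response need to reach a web or mobile client. A minimal sketch, assuming the 201 response has already been parsed into a dict; the function name `client_session_payload` is hypothetical, and the `wss://` check is a sanity check based on the documented example URL rather than a documented guarantee.

```python
from typing import Any, Dict


def client_session_payload(init_response: Dict[str, Any]) -> Dict[str, str]:
    """Extract the fields a client needs from a POST /v2/live response.

    The InitStreamingResponse schema marks `id`, `created_at` and `url` as required.
    Only `id` (to query status and results later) and `url` (to open the WebSocket)
    are forwarded; the API key is never part of the response.
    """
    missing = [field for field in ("id", "created_at", "url") if field not in init_response]
    if missing:
        raise ValueError(f"init response is missing required fields: {missing}")
    # Assumption: the documented example URL uses a secure WebSocket scheme.
    if not init_response["url"].startswith("wss://"):
        raise ValueError("expected a secure WebSocket (wss://) URL")
    return {"id": init_response["id"], "url": init_response["url"]}
```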


## OpenAPI

````yaml POST /v2/live
openapi: 3.1.0
info:
  title: Gladia Control API
  description: ''
  version: '1.0'
  contact: {}
servers:
  - url: https://api.gladia.io/
    description: Gladia API production URL
security: []
tags: []
paths:
  /v2/live:
    post:
      tags:
        - Live V2
      summary: Initiate a new live job
      operationId: StreamingController_initStreamingSession_v2
      parameters:
        - name: region
          required: false
          in: query
          description: The region used to process the audio.
          schema:
            $ref: '#/components/schemas/StreamingSupportedRegions'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/StreamingRequest'
      responses:
        '201':
          description: The live job has been initiated
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/InitStreamingResponse'
        '400':
          description: Something is wrong with the request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BadRequestErrorResponse'
        '401':
          description: You don't have permission to initiate a new live job
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UnauthorizedErrorResponse'
        '422':
          description: The parameters you provided are invalid
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/UnprocessableEntityErrorResponse'
      security:
        - x_gladia_key: []
components:
  schemas:
    StreamingSupportedRegions:
      type: string
      enum:
        - us-west
        - eu-west
    StreamingRequest:
      type: object
      properties:
        encoding:
          description: >-
            The encoding format of the audio stream. Supported formats: 

            - PCM: 8, 16, 24, and 32 bits 

            - A-law: 8 bits 

            - μ-law: 8 bits 


            Note: WAV headers are optional; the API accepts audio with or
            without them.
          default: wav/pcm
          allOf:
            - $ref: '#/components/schemas/StreamingSupportedEncodingEnum'
        bit_depth:
          description: The bit depth of the audio stream
          default: 16
          allOf:
            - $ref: '#/components/schemas/StreamingSupportedBitDepthEnum'
        sample_rate:
          description: The sample rate of the audio stream
          default: 16000
          allOf:
            - $ref: '#/components/schemas/StreamingSupportedSampleRateEnum'
        channels:
          type: integer
          description: The number of channels of the audio stream
          default: 1
          minimum: 1
          maximum: 8
        custom_metadata:
          type: object
          description: Custom metadata you can attach to this live transcription
          example:
            user: John Doe
          additionalProperties: true
        model:
          description: The model used to process the audio. "solaria-1" is used by default.
          default: solaria-1
          allOf:
            - $ref: '#/components/schemas/StreamingSupportedModels'
        endpointing:
          type: number
          description: >-
            The endpointing duration in seconds: the duration of silence after
            which the current utterance is considered finished.
          default: 0.05
          minimum: 0.01
          maximum: 10
        maximum_duration_without_endpointing:
          type: number
          description: >-
            The maximum duration in seconds without endpointing. If no
            endpointing is detected within this duration, the current utterance
            is considered finished.
          default: 5
          minimum: 5
          maximum: 60
        language_config:
          description: Specify the language configuration
          allOf:
            - $ref: '#/components/schemas/LanguageConfig'
        pre_processing:
          description: Specify the pre-processing configuration
          allOf:
            - $ref: '#/components/schemas/PreProcessingConfig'
        realtime_processing:
          description: Specify the realtime processing configuration
          allOf:
            - $ref: '#/components/schemas/RealtimeProcessingConfig'
        post_processing:
          description: Specify the post-processing configuration
          allOf:
            - $ref: '#/components/schemas/PostProcessingConfig'
        messages_config:
          description: Specify the websocket messages configuration
          allOf:
            - $ref: '#/components/schemas/MessagesConfig'
        callback:
          type: boolean
          description: If true, messages are sent to the URL configured in `callback_config`.
          default: false
        callback_config:
          description: Specify the callback configuration
          allOf:
            - $ref: '#/components/schemas/CallbackConfig'
    InitStreamingResponse:
      type: object
      properties:
        id:
          type: string
          description: ID of the live job
          format: uuid
          example: 45463597-20b7-4af7-b3b3-f5fb778203ab
        created_at:
          type: string
          description: Creation date
          format: date-time
          example: '2023-12-28T09:04:17.210Z'
        url:
          type: string
          description: >-
            The WebSocket URL to connect to for sending audio data. The URL
            contains a temporary token that authenticates the session.
          example: >-
            wss://api.gladia.io/v2/live?token=4a39145c-2844-4557-8f34-34883f7be7d9
          format: uri
      required:
        - id
        - created_at
        - url
    BadRequestErrorResponse:
      type: object
      properties:
        timestamp:
          type: string
          description: Date and time at which the error occurred
          example: '2023-12-28T09:04:17.210Z'
        path:
          type: string
          description: Path to the API endpoint
          example: /v2/transcription/45463597-20b7-4af7-b3b3-f5fb778203ab
        request_id:
          type: string
          description: Debug id
          example: G-821fe9df
        statusCode:
          type: number
          description: HTTP status code of the error
          example: 400
        message:
          type: string
          description: Error message
          example: Content-Type is missing Multipart Boundary.
        validation_errors:
          description: List of validation errors, if any
          example:
            - Field "language" must be a string
            - Field "min_speakers" must be a number
          type: array
          items:
            type: string
      required:
        - timestamp
        - path
        - request_id
        - statusCode
        - message
    UnauthorizedErrorResponse:
      type: object
      properties:
        timestamp:
          type: string
          description: Date and time at which the error occurred
          example: '2023-12-28T09:04:17.210Z'
        path:
          type: string
          description: Path to the API endpoint
          example: /v2/transcription/45463597-20b7-4af7-b3b3-f5fb778203ab
        request_id:
          type: string
          description: Debug id
          example: G-821fe9df
        statusCode:
          type: number
          description: HTTP status code of the error
          example: 401
        message:
          type: string
          description: Error message
          example: gladia key not found
      required:
        - timestamp
        - path
        - request_id
        - statusCode
        - message
    UnprocessableEntityErrorResponse:
      type: object
      properties:
        timestamp:
          type: string
          description: Date and time at which the error occurred
          example: '2023-12-28T09:04:17.210Z'
        path:
          type: string
          description: Path to the API endpoint
          example: /v2/transcription/45463597-20b7-4af7-b3b3-f5fb778203ab
        request_id:
          type: string
          description: Debug id
          example: G-821fe9df
        statusCode:
          type: number
          description: HTTP status code of the error
          example: 422
        message:
          type: string
          description: Error message
          example: Invalid parameter
      required:
        - timestamp
        - path
        - request_id
        - statusCode
        - message
    StreamingSupportedEncodingEnum:
      type: string
      enum:
        - wav/pcm
        - wav/alaw
        - wav/ulaw
      description: >-
        The encoding format of the audio stream. Supported formats: 

        - PCM: 8, 16, 24, and 32 bits 

        - A-law: 8 bits 

        - μ-law: 8 bits 


        Note: WAV headers are optional; the API accepts audio with or without
        them.
    StreamingSupportedBitDepthEnum:
      type: number
      enum:
        - 8
        - 16
        - 24
        - 32
      description: The bit depth of the audio stream
    StreamingSupportedSampleRateEnum:
      type: number
      enum:
        - 8000
        - 16000
        - 32000
        - 44100
        - 48000
      description: The sample rate of the audio stream
    StreamingSupportedModels:
      type: string
      enum:
        - solaria-1
      description: The model used to process the audio. "solaria-1" is used by default.
    LanguageConfig:
      type: object
      properties:
        languages:
          type: array
          description: >-
            If a single language is set, it is used for the transcription.
            Otherwise, the language is auto-detected by the model.
          default: []
          items:
            $ref: '#/components/schemas/TranscriptionLanguageCodeEnum'
        code_switching:
          type: boolean
          description: >-
            If true, the language is auto-detected on each utterance. Otherwise,
            the language is auto-detected on the first utterance and then used
            for the rest of the transcription. This option is ignored if a
            single language is set.
          default: false
    PreProcessingConfig:
      type: object
      properties:
        audio_enhancer:
          type: boolean
          description: >-
            If true, apply pre-processing to the audio stream to enhance the
            quality.
          default: false
        speech_threshold:
          type: number
          description: >-
            Sensitivity of the speech detection threshold. A value close to 1
            applies a stricter threshold, making background sounds less likely
            to be detected as speech.
          default: 0.6
          minimum: 0
          maximum: 1
    RealtimeProcessingConfig:
      type: object
      properties:
        custom_vocabulary:
          type: boolean
          description: If true, enable custom vocabulary for the transcription.
          default: false
        custom_vocabulary_config:
          description: Custom vocabulary configuration, if `custom_vocabulary` is enabled
          allOf:
            - $ref: '#/components/schemas/CustomVocabularyConfigDTO'
        custom_spelling:
          type: boolean
          description: If true, enable custom spelling for the transcription.
          default: false
        custom_spelling_config:
          description: Custom spelling configuration, if `custom_spelling` is enabled
          allOf:
            - $ref: '#/components/schemas/CustomSpellingConfigDTO'
        translation:
          type: boolean
          description: If true, enable translation for the transcription
          default: false
        translation_config:
          description: Translation configuration, if `translation` is enabled
          allOf:
            - $ref: '#/components/schemas/TranslationConfigDTO'
        named_entity_recognition:
          type: boolean
          description: If true, enable named entity recognition for the transcription.
          default: false
        sentiment_analysis:
          type: boolean
          description: If true, enable sentiment analysis for the transcription.
          default: false
    PostProcessingConfig:
      type: object
      properties:
        summarization:
          type: boolean
          description: If true, generates a summary of the whole transcription.
          default: false
        summarization_config:
          description: Summarization configuration, if `summarization` is enabled
          allOf:
            - $ref: '#/components/schemas/SummarizationConfigDTO'
        chapterization:
          type: boolean
          description: If true, generates chapters for the whole transcription.
          default: false
    MessagesConfig:
      type: object
      properties:
        receive_partial_transcripts:
          type: boolean
          description: If true, partial transcripts are sent over the WebSocket.
          default: false
        receive_final_transcripts:
          type: boolean
          description: If true, final transcripts are sent over the WebSocket.
          default: true
        receive_speech_events:
          type: boolean
          description: If true, begin and end speech events are sent over the WebSocket.
          default: true
        receive_pre_processing_events:
          type: boolean
          description: If true, pre-processing events are sent over the WebSocket.
          default: true
        receive_realtime_processing_events:
          type: boolean
          description: If true, realtime processing events are sent over the WebSocket.
          default: true
        receive_post_processing_events:
          type: boolean
          description: If true, post-processing events are sent over the WebSocket.
          default: true
        receive_acknowledgments:
          type: boolean
          description: If true, acknowledgments are sent over the WebSocket.
          default: true
        receive_errors:
          type: boolean
          description: If true, errors are sent over the WebSocket.
          default: true
        receive_lifecycle_events:
          type: boolean
          description: If true, lifecycle events are sent over the WebSocket.
          default: false
    CallbackConfig:
      type: object
      properties:
        url:
          type: string
          description: URL to which the configured messages are sent as `POST` requests
          example: https://callback.example
          format: uri
        receive_partial_transcripts:
          type: boolean
          description: If true, partial transcripts are sent to the callback URL.
          default: false
        receive_final_transcripts:
          type: boolean
          description: If true, final transcripts are sent to the callback URL.
          default: true
        receive_speech_events:
          type: boolean
          description: If true, begin and end speech events are sent to the callback URL.
          default: false
        receive_pre_processing_events:
          type: boolean
          description: If true, pre-processing events are sent to the callback URL.
          default: true
        receive_realtime_processing_events:
          type: boolean
          description: If true, realtime processing events are sent to the callback URL.
          default: true
        receive_post_processing_events:
          type: boolean
          description: If true, post-processing events are sent to the callback URL.
          default: true
        receive_acknowledgments:
          type: boolean
          description: If true, acknowledgments are sent to the callback URL.
          default: false
        receive_errors:
          type: boolean
          description: If true, errors are sent to the callback URL.
          default: false
        receive_lifecycle_events:
          type: boolean
          description: If true, lifecycle events are sent to the callback URL.
          default: true
    TranscriptionLanguageCodeEnum:
      type: string
      enum:
        - af
        - am
        - ar
        - as
        - az
        - ba
        - be
        - bg
        - bn
        - bo
        - br
        - bs
        - ca
        - cs
        - cy
        - da
        - de
        - el
        - en
        - es
        - et
        - eu
        - fa
        - fi
        - fo
        - fr
        - gl
        - gu
        - ha
        - haw
        - he
        - hi
        - hr
        - ht
        - hu
        - hy
        - id
        - is
        - it
        - ja
        - jw
        - ka
        - kk
        - km
        - kn
        - ko
        - la
        - lb
        - ln
        - lo
        - lt
        - lv
        - mg
        - mi
        - mk
        - ml
        - mn
        - mr
        - ms
        - mt
        - my
        - ne
        - nl
        - nn
        - 'no'
        - oc
        - pa
        - pl
        - ps
        - pt
        - ro
        - ru
        - sa
        - sd
        - si
        - sk
        - sl
        - sn
        - so
        - sq
        - sr
        - su
        - sv
        - sw
        - ta
        - te
        - tg
        - th
        - tk
        - tl
        - tr
        - tt
        - uk
        - ur
        - uz
        - vi
        - yi
        - yo
        - zh
      description: A supported transcription language code.
    CustomVocabularyConfigDTO:
      type: object
      properties:
        vocabulary:
          type: array
          description: >-
            Specific vocabulary list to feed the transcription model with. Each
            item can be a string or an object with the following properties:
            value, intensity, pronunciations, language.
          example:
            - Westeros
            - value: Stark
            - value: Night's Watch
              pronunciations:
                - Nightz Watch
              intensity: 0.4
              language: en
          items:
            oneOf:
              - $ref: '#/components/schemas/CustomVocabularyEntryDTO'
              - type: string
        default_intensity:
          type: number
          description: Default intensity for the custom vocabulary
          example: 0.5
          minimum: 0
          maximum: 1
      required:
        - vocabulary
    CustomSpellingConfigDTO:
      type: object
      properties:
        spelling_dictionary:
          type: object
          description: >-
            Spelling rules applied to the transcription. Each key is the desired
            spelling and its value lists the variants to replace with it.
          example:
            Gettleman:
              - gettleman
            SQL:
              - Sequel
          additionalProperties:
            type: array
            items:
              type: string
      required:
        - spelling_dictionary
    TranslationConfigDTO:
      type: object
      properties:
        target_languages:
          type: array
          description: >-
            Target languages, in `iso639-1` format, to translate the
            transcription into
          example:
            - en
          minItems: 1
          items:
            $ref: '#/components/schemas/TranslationLanguageCodeEnum'
        model:
          description: The model to use for translation
          default: base
          allOf:
            - $ref: '#/components/schemas/TranslationModelEnum'
        match_original_utterances:
          type: boolean
          description: Align translated utterances with the original ones
          default: true
        lipsync:
          type: boolean
          description: Whether to apply lipsync to the translated transcription.
          default: true
        context_adaptation:
          type: boolean
          description: >-
            Enables or disables context-aware translation features that allow
            the model to adapt translations based on provided context.
          default: true
        context:
          type: string
          description: Context information to improve translation accuracy
        informal:
          type: boolean
          description: >-
            Forces the translation to use informal language forms when available
            in the target language.
          default: false
      required:
        - target_languages
    SummarizationConfigDTO:
      type: object
      properties:
        type:
          description: The type of summarization to apply
          default: general
          allOf:
            - $ref: '#/components/schemas/SummaryTypesEnum'
    CustomVocabularyEntryDTO:
      type: object
      properties:
        value:
          type: string
          description: The vocabulary term as it should appear in the transcription.
          example: Gladia
        intensity:
          type: number
          description: The intensity applied to this vocabulary entry.
          example: 0.5
          minimum: 0
          maximum: 1
        pronunciations:
          description: Alternative pronunciations of the term.
          type: array
          items:
            type: string
        language:
          description: >-
            The language in which the term is pronounced, used when comparing
            sounds. Defaults to the transcription language.
          example: en
          allOf:
            - $ref: '#/components/schemas/TranscriptionLanguageCodeEnum'
      required:
        - value
    TranslationLanguageCodeEnum:
      type: string
      enum:
        - af
        - am
        - ar
        - as
        - az
        - ba
        - be
        - bg
        - bn
        - bo
        - br
        - bs
        - ca
        - cs
        - cy
        - da
        - de
        - el
        - en
        - es
        - et
        - eu
        - fa
        - fi
        - fo
        - fr
        - gl
        - gu
        - ha
        - haw
        - he
        - hi
        - hr
        - ht
        - hu
        - hy
        - id
        - is
        - it
        - ja
        - jw
        - ka
        - kk
        - km
        - kn
        - ko
        - la
        - lb
        - ln
        - lo
        - lt
        - lv
        - mg
        - mi
        - mk
        - ml
        - mn
        - mr
        - ms
        - mt
        - my
        - ne
        - nl
        - nn
        - 'no'
        - oc
        - pa
        - pl
        - ps
        - pt
        - ro
        - ru
        - sa
        - sd
        - si
        - sk
        - sl
        - sn
        - so
        - sq
        - sr
        - su
        - sv
        - sw
        - ta
        - te
        - tg
        - th
        - tk
        - tl
        - tr
        - tt
        - uk
        - ur
        - uz
        - vi
        - wo
        - yi
        - yo
        - zh
      description: >-
        Target languages, in `iso639-1` format, to translate the transcription
        into
    TranslationModelEnum:
      type: string
      enum:
        - base
        - enhanced
      description: The model to use for translation
    SummaryTypesEnum:
      type: string
      enum:
        - general
        - bullet_points
        - concise
      description: The type of summarization to apply
  securitySchemes:
    x_gladia_key:
      type: apiKey
      in: header
      name: x-gladia-key
      description: Your personal Gladia API key

````
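
The audio-format fields of `StreamingRequest` (`encoding`, `bit_depth`, `sample_rate`, `channels`) must describe the raw audio you actually send over the WebSocket. The sketch below builds a request body that enforces the enums and ranges listed in the schema above, and applies the usual uncompressed-audio arithmetic to size chunks: bytes per second = sample rate × (bit depth ÷ 8) × channels. The helper names `build_streaming_request` and `bytes_per_chunk`, and the chunk durations used in the note below, are illustrative assumptions, not part of the Gladia API; the 8-bit check for A-law and μ-law follows the `encoding` description.

```python
from typing import Any, Dict, List, Optional

# Allowed values copied from the StreamingRequest schema above.
ENCODINGS = {"wav/pcm", "wav/alaw", "wav/ulaw"}
BIT_DEPTHS = {8, 16, 24, 32}
SAMPLE_RATES = {8000, 16000, 32000, 44100, 48000}


def build_streaming_request(
    encoding: str = "wav/pcm",
    bit_depth: int = 16,
    sample_rate: int = 16000,
    channels: int = 1,
    endpointing: float = 0.05,
    maximum_duration_without_endpointing: float = 5,
    languages: Optional[List[str]] = None,
    receive_partial_transcripts: bool = False,
) -> Dict[str, Any]:
    """Return a StreamingRequest body, rejecting values outside the documented enums and ranges."""
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if bit_depth not in BIT_DEPTHS:
        raise ValueError(f"unsupported bit_depth: {bit_depth}")
    # The encoding description lists A-law and μ-law as 8-bit only.
    if encoding in {"wav/alaw", "wav/ulaw"} and bit_depth != 8:
        raise ValueError("A-law and μ-law audio must use bit_depth=8")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if not 1 <= channels <= 8:
        raise ValueError("channels must be between 1 and 8")
    if not 0.01 <= endpointing <= 10:
        raise ValueError("endpointing must be between 0.01 and 10 seconds")
    if not 5 <= maximum_duration_without_endpointing <= 60:
        raise ValueError("maximum_duration_without_endpointing must be between 5 and 60 seconds")
    return {
        "encoding": encoding,
        "bit_depth": bit_depth,
        "sample_rate": sample_rate,
        "channels": channels,
        "endpointing": endpointing,
        "maximum_duration_without_endpointing": maximum_duration_without_endpointing,
        # An empty list lets the model auto-detect the language.
        "language_config": {"languages": list(languages or [])},
        "messages_config": {"receive_partial_transcripts": receive_partial_transcripts},
    }


def bytes_per_chunk(config: Dict[str, Any], chunk_ms: int) -> int:
    """Bytes in one chunk of raw audio: sample_rate * (bit_depth / 8) * channels * seconds."""
    bytes_per_second = config["sample_rate"] * (config["bit_depth"] // 8) * config["channels"]
    return bytes_per_second * chunk_ms // 1000
```

With the defaults (16 kHz, 16-bit, mono), `bytes_per_chunk(build_streaming_request(), 100)` returns 3200, so each 100 ms chunk is 3,200 bytes. For 8 kHz μ-law telephony audio (`encoding="wav/ulaw"`, `bit_depth=8`, `sample_rate=8000`), a 20 ms chunk is 160 bytes.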