> ## Documentation Index > Fetch the complete documentation index at: https://docs.gladia.io/llms.txt > Use this file to discover all available pages before exploring further. # Live transcription - Migration guide from V1 to V2 > Migrate to the latest version of Gladia's real-time speech-to-text API Live transcription V2 is the latest real-time speech-to-text API from Gladia. It offers more features and has significant improvements in latency compared to V1. Here is a guide on how to migrate to V2, so you can start enjoying all the benefits. Please make sure you migrate sooner rather than later as we're looking to remove support for V1 sometime in the future. Before we do so however, we'll of course reach out to those of you who are still on V1. ## Initiating the connection to the WebSocket In V1, you always connect to the same WebSocket URL (wss\://api.gladia.io/audio/text/audio-transcription) and send your configuration through the WebSocket connection. In V2, you first generate a unique WebSocket URL with a call to our [POST /v2/live endpoint](/api-reference/v2/live/init), and then connect to it. This URL contains a token that is unique to your live session. You'll be able to resume your session in case of a lost connection, or give the URL to a web client without exposing your Gladia API key. ```javascript JavaScript theme={"system"} import WebSocket from 'ws'; const socket = new WebSocket('wss://api.gladia.io/audio/text/audio-transcription'); socket.addEventListener("open", function() { // Send configuration socket.send(JSON.stringify({ 'x_gladia_key': 'YOUR_GLADIA_API_KEY', // ...config properties })) // Start sending audio chunks }); ``` ```javascript JavaScript theme={"system"} import WebSocket from 'ws'; const response = await fetch('https://api.gladia.io/v2/live', { method: 'POST', headers: { 'Content-Type': 'application/json', 'x-gladia-key': '', }, body: JSON.stringify({ // ...config properties }), }); if (!response.ok) { // Look at the error message // It might be a configuration issue console.error(`${response.status}: ${(await response.text()) || response.statusText}`); process.exit(response.status); } const {url} = await response.json(); const socket = new WebSocket(url); socket.addEventListener("open", function() { // Start sending audio chunks }); ``` ## Configuration With V2 offering more features, the configuration comes with some changes. You'll find the full configuration definition in the [POST /v2/live API reference page](/api-reference/v2/live/init). Here, we'll show you how to migrate your V1 configuration object to the V2 one. ### Audio encoding `encoding`, `bit_depth` and `sample_rate` are still present in V2, but with less options for now. As `wav` is the same `encoding` as `wav/pcm`, V2 has dropped support for `wav` and defaults to `wav/pcm`. `amb`, `mp3`, `flac`, `ogg/vorbis`, `opus`, `sphere` and `amr-nb` are no longer supported. `bit_depth` option `64` is no longer supported. If you're using an unsupported `encoding` or `bit_depth`, please contact us with your use case. In the mean time, keep using V1. ### Model Only one model is supported in V2 for now, so omit the property `model`. ### End-pointing and maximum audio duration `endpointing` is now declared in seconds instead of milliseconds. `maximum_audio_duration` has been renamed to `maximum_duration_without_endpointing`. ```json theme={"system"} { "endpointing": 800, "maximum_audio_duration": 10 } ``` ```json theme={"system"} { "endpointing": 0.8, "maximum_duration_without_endpointing": 10 } ``` ### Language #### Automatic single language Automatic single language behavior is the default in both V1 and V2, so you can just omit those parameters from your configuration. ```json theme={"system"} { "language_behaviour": "automatic single language" } ``` ```json theme={"system"} { "language_config": { "languages": [], "code_switching": false } } ``` #### Automatic multiple languages ```json theme={"system"} { "language_behaviour": "automatic multiple languages" } ``` ```json theme={"system"} { "language_config": { "languages": [], // You can now specify the expected languages in V2 as guidance to improve accuracy and latency "code_switching": true } } ``` #### Manual Languages are now specified with a 2-letter code, as in the API for asynchronous speech-to-text.
See [this page](/chapters/language/supported-languages) for a complete list of codes. ```json theme={"system"} { "language_behaviour": "manual", "language": "english" } ``` ```json theme={"system"} { "language_config": { "languages": ["en"], "code_switching": false } } ``` ### Frames format You can send audio chunk as bytes or base64 and we'll detect the format automatically. The parameter `frames_format` is no longer present. ### Audio enhancer `audio_enhancer` has been moved into the `pre_processing` object. ```json theme={"system"} { "audio_enhancer": true } ``` ```json theme={"system"} { "pre_processing": { "audio_enhancer": true } } ``` ### Word timestamps `word_timestamps` has been renamed to `words_accurate_timestamps` and moved into the `realtime_processing` object. ```json theme={"system"} { "word_timestamps": true } ``` ```json theme={"system"} { "realtime_processing": { "words_accurate_timestamps": true } } ``` ### Other properties `prosody`, `reinject_context` and `transcription_hint` are not supported for now. They may return in another form in the future. ### Full config migration sample ```json theme={"system"} { "encoding": "wav", "bit_depth": 8, "sample_rate": 48000, "model": "accurate", "endpointing": 800, "maximum_audio_duration": 10, "language_behaviour": "manual", "language": "english", "audio_enhancer": true, "word_timestamps": true } ``` ```json theme={"system"} { "encoding": "wav/pcm", "bit_depth": 8, "sample_rate": 48000, "endpointing": 0.8, "maximum_duration_without_endpointing": 10, "language_config": { "languages": ["en"] } "pre_processing": { "audio_enhancer": true }, "realtime_processing": { "words_accurate_timestamps": true } } ``` ## Send audio chunks If you were sending chunks as bytes, nothing has changed. If you were sending them as base64, the format of the JSON messages changed in V2. See the [API reference](/api-reference/v2/live/action/audio-chunk) for the full format. ```json theme={"system"} { "frames": "" } ``` ```json theme={"system"} { "type": "audio_chunk", "data": { "chunk": "" } } ``` ## Transcription message In V1, we only send two kinds of messages through WebSocket: * the "connected" message * the "transcript" messages In V2, we send more: * lifecycle event messages * acknowledgment messages * add-on messages * post-processing messages * ... To read a transcription message in V1, you verify that the `type` field is `"final"` and/or the `transcription` field is not empty.
In V2, you should confirm that the `type` field is `transcript` and that `data.is_final` is `true`. Below are examples of transcript messages in V1 and V2, so you can see the differences. See the [API reference](/api-reference/v2/live/message/transcript) for the full format. ```json theme={"system"} { "event": "transcript", "request_id": "G-3abade39", "type": "final", "transcription": " Hello world", "time_begin": 1.4376875, "time_end": 2.4696875, "confidence": 0.65, "language": "en", "utterances": [ { "transcription": " Hello world", "time_begin": 1.4376875, "time_end": 2.4696875, "language": "en", "confidence": 0.65, "stable": true, "id": 0 } ], "inference_time": 0.843909502029419, "duration": 2.5845000000000002 } ``` ```json theme={"system"} { "type": "transcript", "session_id": "de0a341d-c69f-4e15-a649-7b3f49e211f0", "created_at": "2024-10-10T14:35:28.387Z", "data": { "id": "00_00000000", "is_final": true, "utterance": { "text": " Hello world", "start": 0.188, "end": 1.284, "language": "en", "confidence": 1, "channel": 0, "words": [ { "word": " Hello", "start": 0.188, "end": 0.735, "confidence": 1 }, { "word": " world", "start": 0.736, "end": 1.284, "confidence": 1 } ] } } } ``` If you're not interested in new messages and simply want the ones from the V1 API, you can always configure what kind of messages you want when calling the [POST /v2/live endpoint](/api-reference/v2/live/init) to initiate the session. With the following configuration, you will only receive final transcript messages: ```json theme={"system"} { "messages_config": { "receive_partial_transcripts": false, "receive_final_transcripts": true, "receive_speech_events": false, "receive_pre_processing_events": false, "receive_realtime_processing_events": false, "receive_post_processing_events": false, "receive_acknowledgments": false, "receive_lifecycle_events": false } } ``` ## End the live session The format of this message also changed. See the [API reference](/api-reference/v2/live/action/stop-recording) for the full format. ```json theme={"system"} { "event": "terminate" } ``` ```json theme={"system"} { "type": "stop_recording" } ```