Partial transcripts provide low-latency streaming transcription as words are spoken, giving you immediate results before the final, high-accuracy transcript is ready.
To enable partial transcripts, add the receive_partial_transcripts property to the messages_config object:
{
  "encoding": "wav/pcm",
  "sample_rate": 16000,
  "bit_depth": 16,
  "channels": 1,
  "language_config": {
    "languages": ["en"],
    "code_switching": false
  },
  "messages_config": {
    "receive_partial_transcripts": true,
    "receive_final_transcripts": true
  }
}
With this configuration, you will receive both partial transcripts as they are generated and the final, most accurate version of each utterance.
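As a rough illustration, the same configuration could be assembled in Python before being sent to the API. This is only a sketch: the helper name build_realtime_config and its parameters are hypothetical, and only the keys shown in the JSON example above come from the documentation.

```python
import json


def build_realtime_config(languages, partials=True):
    """Return a session configuration mirroring the JSON example.

    `build_realtime_config` is a hypothetical helper, not part of any SDK.
    """
    return {
        "encoding": "wav/pcm",
        "sample_rate": 16000,
        "bit_depth": 16,
        "channels": 1,
        "language_config": {
            "languages": list(languages),
            "code_switching": False,
        },
        "messages_config": {
            # Ask for intermediate results in addition to final ones.
            "receive_partial_transcripts": partials,
            "receive_final_transcripts": True,
        },
    }


config = build_realtime_config(["en"])
payload = json.dumps(config)  # JSON string ready to send as the request body
```

Keeping the language list as an explicit argument makes it easy to restrict the session to a single language when partial transcript accuracy matters.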
To reduce total response time and create a more fluid user experience, partial transcripts use a faster, smaller model than the one used for final transcripts, trading a small amount of accuracy for much lower latency (< 100 ms).
Partial transcript accuracy deteriorates when multiple languages and/or code switching are enabled. For best results, limit the number of languages.
When receive_partial_transcripts is true, the real-time API sends transcript messages for both intermediate and final results.
To distinguish between them, the message payload includes the is_final boolean field.
"is_final": false: The message contains a partial transcript, which is subject to change.
"is_final": true: The message contains the final, most accurate transcript for an utterance. This transcript will not change.
Within the same utterance, the partial and final transcripts share the same data.id.
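The following Python sketch shows one way a client might use is_final and data.id together: partial text is overwritten as new versions arrive, then replaced by the final text for the same id. The message layout is an assumption made for illustration: it supposes a top-level "type" field equal to "transcript", that is_final sits next to id inside data, and that the text lives at data.utterance.text. Check the actual message schema before relying on these paths. The class name TranscriptTracker is hypothetical.

```python
import json


class TranscriptTracker:
    """Keep the latest text per utterance, keyed by data.id."""

    def __init__(self):
        self.partials = {}  # utterance id -> latest partial text
        self.finals = {}    # utterance id -> final text

    def handle(self, raw_message):
        message = json.loads(raw_message)
        # Assumed field: "type" identifies transcript messages.
        if message.get("type") != "transcript":
            return None
        data = message["data"]
        utterance_id = data["id"]
        # Assumed path for the transcribed text.
        text = data["utterance"]["text"]
        if data["is_final"]:
            # The final transcript supersedes any partial with the same id.
            self.partials.pop(utterance_id, None)
            self.finals[utterance_id] = text
        elif utterance_id not in self.finals:
            # Partials may change, so always keep only the newest one.
            self.partials[utterance_id] = text
        return utterance_id, data["is_final"], text


tracker = TranscriptTracker()
tracker.handle(json.dumps({
    "type": "transcript",
    "data": {"id": "utt-1", "is_final": False, "utterance": {"text": "hello wor"}},
}))
tracker.handle(json.dumps({
    "type": "transcript",
    "data": {"id": "utt-1", "is_final": True, "utterance": {"text": "Hello world."}},
}))
```

In a user interface, the partials dictionary would typically drive a provisional display that is replaced once the matching entry appears in finals.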