Endpointing is the mechanism Gladia uses in live transcription to decide when a speaker has “finished” an utterance, so the API can close that utterance and emit a final transcript segment. In practice, endpointing answers the question: “How much silence should we wait before we consider the sentence (or turn) complete?”
Why endpointing matters
Endpointing is one of the main knobs that controls the tradeoff between:
- Latency (speed): how quickly you get final utterances
- Completeness: whether you avoid cutting someone off mid-thought
- Chunking quality: whether utterances align well with natural turns or sentences
How it works conceptually
During a live session, Gladia continuously analyzes the incoming audio stream and:
- Detects speech activity on each channel (voice activity detection)
- Groups ongoing speech into an “utterance” while speech continues
- When it observes silence lasting at least endpointing seconds, considers the utterance finished and closes (finalizes) it
- Transcribes the closed utterance with the AI model and emits the final result
- If speech never pauses long enough, falls back to a safety mechanism that still closes the utterance (maximum_duration_without_endpointing, see next section)
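The steps above can be sketched as a small state machine over voice-activity frames. This is a simulation of the concept, not Gladia's implementation; the frame size, and the way timestamps are computed, are illustrative assumptions:

```python
# Simulated endpointing: walk VAD frames (True = speech, False = silence),
# close an utterance once trailing silence reaches `endpointing` seconds,
# or force-close it once it exceeds `max_duration` (the safety net).

def segment(frames, frame_s=0.02, endpointing=0.05, max_duration=5.0):
    utterances = []   # list of (start_s, end_s) tuples
    start = None      # start time of the current utterance, if any
    silence = 0.0     # accumulated trailing silence
    t = 0.0
    for is_speech in frames:
        if is_speech:
            if start is None:
                start = t          # speech opens a new utterance
            silence = 0.0          # any speech resets the silence timer
        elif start is not None:
            silence += frame_s
            if silence >= endpointing:
                # close the utterance at the point speech actually stopped
                utterances.append((start, t + frame_s - silence))
                start, silence = None, 0.0
        # safety net: force-close an utterance that never pauses long enough
        if start is not None and (t + frame_s - start) >= max_duration:
            utterances.append((start, t + frame_s))
            start, silence = None, 0.0
        t += frame_s
    if start is not None:          # flush whatever is open at end of stream
        utterances.append((start, t))
    return utterances
```

For example, 0.2 s of speech, a 0.1 s pause, and 0.2 s more speech yields two utterances at the default endpointing of 0.05, because the pause exceeds the threshold.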
The 2 key parameters
endpointing (seconds)
Definition: the duration of silence that closes the current utterance.
- Default: 0.05
- Range: 0.01 to 10
- Smaller value = closes utterances faster, but can split sentences if the speaker hesitates briefly.
- Larger value = waits longer before finalizing, which improves segment completeness but increases latency.
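The split-versus-wait tradeoff can be made concrete with a toy count: treat each pause between words as a gap, and every gap at or above the endpointing value as an utterance boundary. The gap values below are invented for illustration:

```python
# Illustrative only: how many utterances does a pause pattern produce
# for a given `endpointing` value?

def utterance_count(gaps, endpointing):
    # each pause >= endpointing closes the current utterance and opens a new one
    return 1 + sum(1 for g in gaps if g >= endpointing)

# "I think... we should, um, ship it" with 0.3 s and 0.15 s hesitations
gaps = [0.3, 0.15]
print(utterance_count(gaps, endpointing=0.05))  # → 3: splits on both pauses
print(utterance_count(gaps, endpointing=0.5))   # → 1: one complete utterance
```

A small value finalizes quickly but fragments hesitant speech; a large value keeps the sentence whole at the cost of waiting longer before emitting it.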
maximum_duration_without_endpointing (seconds)
Definition: the maximum duration an utterance may reach before Gladia force-closes it, even if no qualifying silence has occurred.
- Default: 5
- Range: 5 to 60
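Both parameters are supplied when configuring a live session. The sketch below builds such a configuration and validates the documented ranges before anything is sent; the surrounding fields (encoding, sample rate) and the top-level placement of the two parameters are assumptions to verify against Gladia's API reference:

```python
# Sketch of a live-session configuration carrying both endpointing parameters.
# Field names match the parameters documented above; their exact placement in
# the request body is an assumption, shown top-level here for illustration.

def live_config(endpointing=0.05, max_no_endpointing=5):
    # enforce the documented ranges before building the request body
    if not 0.01 <= endpointing <= 10:
        raise ValueError("endpointing must be between 0.01 and 10 seconds")
    if not 5 <= max_no_endpointing <= 60:
        raise ValueError(
            "maximum_duration_without_endpointing must be between 5 and 60 seconds"
        )
    return {
        "encoding": "wav/pcm",    # assumed audio settings, illustrative only
        "sample_rate": 16000,
        "endpointing": endpointing,
        "maximum_duration_without_endpointing": max_no_endpointing,
    }
```

For a voice agent that should feel responsive, you might lower endpointing toward the minimum; for dictation, raising it keeps sentences intact.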