These recommendations apply to the Realtime API. The parameters below can be
passed during session initialization. They are starting points, so feel free to
fine-tune them to your needs.
Voice Agents
For callbots, customer service assistants, or voice-driven chatbots, the top priority is low latency. The agent should respond quickly, even if sentence boundaries are not perfect. Recommended parameters:
endpointing: 0.05 - 0.1
Keeps conversations snappy by closing utterances quickly.
maximum_duration_without_endpointing: 15s
Prevents very long utterances from staying open, without cutting off the conversation.
messages_config.partial_transcripts: true
Enables interim results for early reactions. For that, you can use the speech_stop event to know when the user has stopped speaking.
language_config.language: fixed if known
Skips auto-detection for faster response.
This setup is best when fast turn-taking is essential!
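As a rough illustration, the voice-agent recommendations could be collected into a single configuration object before the session is opened. This is only a sketch under stated assumptions: the dotted parameter names are taken to map onto nested JSON objects, durations are taken to be in seconds, `"en"` is a placeholder language code, and `to_request_body` is a hypothetical helper rather than part of any SDK. Check the API reference for the exact payload shape.

```python
import json

# Voice-agent preset built from the recommendations above.
# Assumptions (not confirmed by this page): dotted names such as
# "messages_config.partial_transcripts" are nested objects, durations are
# expressed in seconds, and "en" stands in for whatever language you know
# the caller will speak.
VOICE_AGENT_CONFIG = {
    "endpointing": 0.05,  # low end of the 0.05 - 0.1 range
    "maximum_duration_without_endpointing": 15,
    "messages_config": {"partial_transcripts": True},
    "language_config": {"language": "en"},
}


def to_request_body(config: dict) -> str:
    """Serialize a preset to a JSON string (hypothetical helper)."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(to_request_body(VOICE_AGENT_CONFIG))
```

Keeping the preset as plain data makes it easy to adjust a single value, for example raising `endpointing` toward 0.1 if the agent interrupts callers too often.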
Meeting Recorders
For meetings, lectures, and conferences, the focus shifts to readability and completeness. Latency is less critical than producing well-punctuated, accurate transcripts. Recommended parameters:
endpointing: 0.3 - 0.5
Captures sentences fully before closing.
maximum_duration_without_endpointing: 60s
Allows for longer interventions or presentations.
messages_config.partial_transcripts: true
Shows live transcripts while waiting for finals.
realtime_processing.custom_vocabulary: add company-specific terms
Ensures correct spelling of jargon, acronyms, or product names.
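The meeting-recorder settings might be assembled as in the sketch below, which also tidies the vocabulary list before it is sent. Several details are assumptions made for illustration: `build_meeting_config` is a hypothetical helper, `custom_vocabulary` is treated as a plain list of strings (the real field may expect a different shape), durations are taken to be in seconds, and 0.4 is simply a value picked from the middle of the recommended 0.3 - 0.5 range.

```python
def build_meeting_config(vocabulary: list[str]) -> dict:
    """Build a meeting-recorder preset (hypothetical helper).

    Blank entries and case-insensitive duplicates are dropped from the
    vocabulary, and the original order of the remaining terms is kept.
    """
    seen = set()
    terms = []
    for term in vocabulary:
        cleaned = term.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            terms.append(cleaned)

    return {
        "endpointing": 0.4,  # assumed midpoint of the 0.3 - 0.5 range
        "maximum_duration_without_endpointing": 60,
        "messages_config": {"partial_transcripts": True},
        # Assumption: the field accepts a list of strings.
        "realtime_processing": {"custom_vocabulary": terms},
    }


if __name__ == "__main__":
    config = build_meeting_config(["Kubernetes", " kubernetes ", "", "OKR"])
    print(config["realtime_processing"]["custom_vocabulary"])
```

Cleaning the list up front keeps the vocabulary short and focused on the terms that actually need help, such as product names and acronyms.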
Subtitles / Captioning
When providing live subtitles, the goal is to sync text with the speaker. For post-production subtitles, readability and sentence integrity matter more. Recommended parameters:
endpointing:
0.3 → for live captions (minimal lag)
0.8 → for clean subtitles (post-production or recordings)
maximum_duration_without_endpointing: 30s
Prevents excessively long subtitle blocks.
messages_config.partial_transcripts: true
Shows words as they’re spoken, then refines them.
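Since the only setting that differs between live captions and post-production subtitles is `endpointing`, one way to express the choice is a small function that switches on the use case. This is a sketch, not an official helper: `build_caption_config` and its `live` flag are hypothetical names, the nested layout of `messages_config` is assumed from the dotted parameter name, and durations are taken to be in seconds.

```python
def build_caption_config(live: bool) -> dict:
    """Return a captioning preset (hypothetical helper).

    live=True  -> shorter endpointing for minimal lag
    live=False -> longer endpointing for cleaner sentence boundaries
    """
    return {
        "endpointing": 0.3 if live else 0.8,
        "maximum_duration_without_endpointing": 30,
        "messages_config": {"partial_transcripts": True},
    }


if __name__ == "__main__":
    print(build_caption_config(live=True))
    print(build_caption_config(live=False))
```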