The challenge in virtual meetings
A typical meeting scenario for companies with international or multilingual employees, or for sales and customer support calls, will often start with a few people speaking in one language, usually their native tongue. At some point, one or more additional speakers will join the call, and everyone will switch to a common language from that point onwards. These scenarios pose a few challenges: correctly detecting the language, understanding who the different speakers are and what each one said, and delivering results within the required latency.
Recommended configuration
Multilingual support
Tl;dr: The best mode for asynchronously transcribing meeting recordings is ‘automatic single language’. It detects the main language of the meeting, transcribes in that language, and automatically translates any parts spoken in secondary languages.
- Manual: Specify a single language for the transcript. The resulting transcription will be in the specified language, regardless of how many languages were spoken in the audio. Any speech in languages other than the one configured will be translated automatically.
⚙️ Use for: Cases where a specific pre-known language is needed for the transcript.
⚠️ Limitations: Other languages will be automatically translated to the specified language.
- Automatic single language: For asynchronous audio, this mode automatically detects the most prominent language in the audio and uses it for the transcript. Segments of speech in other languages are automatically translated. For live streaming, the language is detected from the first chunk of audio and used for the remainder of the stream.
⚙️ Use for: Most common use cases, including meetings that start in one language and switch to another.
⚠️ Limitations: Secondary languages will be automatically translated to the main language (the one most spoken). Real-time transcription will detect the language based on the first utterance only.
- Automatic multiple languages (code-switching): This mode continuously detects the language spoken and supports multiple language changes in a single audio file. The resulting transcription will be in multiple languages.
⚙️ Use for: Use cases where the language changes multiple times in the audio, e.g. a journalist interviewing someone in one language and getting responses in another.
⚠️ Limitations: Currently sensitive to strong accents, which may cause the wrong language to be detected.
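The choice between the three modes can be sketched as a small decision helper. This is only an illustration of the guidance above: `LanguageMode` and `choose_language_mode` are hypothetical names, not part of the Gladia API, and the actual request parameters should be taken from the API documentation.

```python
from enum import Enum
from typing import Optional


class LanguageMode(Enum):
    # Hypothetical labels mirroring the three modes described above.
    MANUAL = "manual"
    AUTO_SINGLE = "automatic single language"
    AUTO_MULTI = "automatic multiple languages (code-switching)"


def choose_language_mode(known_language: Optional[str] = None,
                         frequent_switching: bool = False) -> LanguageMode:
    """Pick a transcription mode following the guidance above (assumed logic)."""
    if known_language:
        # The transcript must be in one pre-known language.
        return LanguageMode.MANUAL
    if frequent_switching:
        # Languages alternate many times, e.g. a bilingual interview.
        return LanguageMode.AUTO_MULTI
    # Default for meetings: one dominant language, the rest translated.
    return LanguageMode.AUTO_SINGLE
```

For a typical meeting recording, where the language is not known in advance and only switches once, `choose_language_mode()` falls through to the automatic single language mode.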
When to use Real-time vs asynchronous transcription
Tl;dr: For most needs, using async transcription will provide the best balance between price, accuracy, and speed. For cases where real-time or very low latency transcription is needed, use live transcription.
Speaker diarization
Tl;dr: When using the API of a virtual meeting service or recording bot, it’s better to rely on their built-in mechanical diarization API, based on separate audio sources. If no such API is available, Gladia provides a high-quality audio-based diarization.
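To illustrate what diarization based on separate audio sources looks like in practice, here is a minimal sketch that merges per-participant transcripts into one speaker-labelled timeline. It assumes the meeting service delivers one audio stream per participant, that each stream has been transcribed separately, and that all timestamps share the same clock. The `Utterance` type, the `merge_participant_transcripts` function, and the segment dictionary keys are hypothetical and do not come from any specific API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    speaker: str
    start: float  # seconds from the start of the meeting
    end: float
    text: str


def merge_participant_transcripts(
    per_participant: Dict[str, List[dict]]
) -> List[Utterance]:
    """Merge separately transcribed streams into one chronological transcript.

    `per_participant` maps a participant name to that participant's segments,
    each a dict with assumed keys "start", "end" and "text".
    """
    merged = [
        Utterance(speaker=name, start=seg["start"], end=seg["end"], text=seg["text"])
        for name, segments in per_participant.items()
        for seg in segments
    ]
    # The speaker is known from the audio source, so ordering by time is enough.
    merged.sort(key=lambda u: (u.start, u.end))
    return merged
```

Because the speaker identity comes from the audio source itself, no audio-based speaker separation is needed in this setup; the only requirement is that the timestamps of the different streams are synchronized.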
Importance of sync and accurate timestamps
Tl;dr: Many advanced features, such as aligning speaker separation and transcription, audio-based diarization, and sending transcriptions to LLMs, require extremely accurate word-level timestamps. Luckily, Gladia provides one of the most accurate timestamp alignment solutions in the market, including for live transcription.
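The following sketch shows why timestamp accuracy matters when aligning speaker separation with a transcript. It assigns each word to the speaker segment it overlaps most. The tuple layouts and the `assign_speakers` function are assumptions made for illustration, not the format returned by any particular API.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]            # (text, start, end) in seconds
SpeakerSegment = Tuple[str, float, float]  # (speaker, start, end) in seconds


def assign_speakers(
    words: List[Word], segments: List[SpeakerSegment]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose segment overlaps it the most."""
    labelled = []
    for text, w_start, w_end in words:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for speaker, s_start, s_end in segments:
            # Length of the time interval shared by the word and the segment.
            overlap = min(w_end, s_end) - max(w_start, s_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        # Words that overlap no segment keep None as their speaker.
        labelled.append((text, best_speaker))
    return labelled
```

With speaker segments `[("A", 0.0, 0.45), ("B", 0.45, 1.0)]`, a word timed at 0.0 to 0.4 seconds is attributed to speaker A. If the same word's timestamps drift by 0.3 seconds (0.3 to 0.7), most of it now falls into B's segment and it is attributed to the wrong speaker.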
Summary
Navigating the various transcription features and options and finding the optimal configuration for your meeting recording product can be tricky. We’ve gathered the experience from multiple companies with similar needs and challenges and combined the knowledge into the guidelines in this post. We hope it will help you build a better product while significantly shortening the development time.
Partners
Gladia is supported by multiple meeting recording partners:
- Recall.ai: check our integration guide (https://docs.gladia.io/chapters/partners-integrations/pages/meeting-recorders/recall-ai)
- Meeting BAAS: check our integration guide (https://docs.gladia.io/chapters/partners-integrations/pages/meeting-recorders/meeting-baas)