Audio Transcription:
Args:
- model (str): The AI model used for audio transcription (default: large-v2).
- audio (audio): The audio file to transcribe.
- audio_url (url): The URL of the file to transcribe; ignored if an audio file is provided. This can be a public file URL or a link from any of the supported social platforms listed in the documentation.
- language_behaviour (enum): Defines how the speaker's language is detected (default: automatic single language).
- language (enum): If language_behaviour is set to manual, defines the language to use for the transcription.
- toggle_noise_reduction (boolean): Activates noise reduction to improve transcription quality (default: False).
- transcription_hint (string): Textual context fed to the Whisper model during inference. Skipped if empty (default: empty).
- toggle_diarization (boolean): Activates diarization of the audio (default: False).
- toggle_direct_translate (boolean): Activates direct translation of the audio transcription (default: False).
- target_translation_language (enum): If toggle_direct_translate is set to true, defines the language to use for the translation of the transcription.
- toggle_text_emotion_recognition (boolean): Activates emotion recognition on the audio transcription (default: False).
- toggle_summarization (boolean): Activates summarization of the audio transcription (default: False).
- toggle_chapterization (boolean): Activates chapterization of the audio transcription (default: False).
- webhook_url (string): Webhook URL to send the result to. Make sure the URL is reachable over the network (default: False).
- diarization_max_speakers (integer): Guiding maximum number of speakers, 10 at most (default: 2).
- output_format (enum): Defines the output format: Plain Text, Text, JSON, SRT, or VTT (subtitle file formats for video content) (default: json).
Returns:
Dict: transcription
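As a minimal sketch of how these arguments fit together, the helper below assembles a request payload with the documented defaults and dependency rules (language only applies in manual mode, target_translation_language only when direct translation is on, diarization_max_speakers capped at 10). The function name and the assumption that parameters are sent as a flat JSON body are illustrative, not part of the documented API; consult the actual endpoint reference before use.

```python
# Hypothetical payload builder for the Audio Transcription endpoint
# described above. Field names mirror the Args list; the flat-dict
# payload shape is an assumption for illustration.

def build_transcription_payload(
    audio_url=None,
    model="large-v2",                                    # default per Args
    language_behaviour="automatic single language",      # default per Args
    language=None,
    toggle_diarization=False,
    diarization_max_speakers=2,                          # default per Args
    toggle_direct_translate=False,
    target_translation_language=None,
    output_format="json",                                # default per Args
):
    # The docs cap the guiding speaker count at 10.
    if diarization_max_speakers > 10:
        raise ValueError("diarization_max_speakers may be at most 10")

    payload = {
        "model": model,
        "language_behaviour": language_behaviour,
        "toggle_diarization": toggle_diarization,
        "diarization_max_speakers": diarization_max_speakers,
        "toggle_direct_translate": toggle_direct_translate,
        "output_format": output_format,
    }
    # audio_url is ignored when an audio file is provided, so it is
    # only included here when explicitly given.
    if audio_url:
        payload["audio_url"] = audio_url
    # language is only meaningful when language_behaviour is "manual".
    if language_behaviour == "manual" and language:
        payload["language"] = language
    # target_translation_language only applies with direct translation on.
    if toggle_direct_translate and target_translation_language:
        payload["target_translation_language"] = target_translation_language
    return payload
```

For example, `build_transcription_payload(audio_url="https://example.com/a.mp3", toggle_diarization=True)` yields a payload with the defaults filled in and diarization enabled; the returned dict would then be posted to the transcription endpoint.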