What is Emotion Recognition?
Psychologists have studied how to classify emotions for decades, and that research has produced a model of distinct emotion categories connected by continuous gradients. Our emotion recognition system builds on research from New York University and is designed to identify the emotional state of speakers in 99 languages. Each detected emotion is reported as one of the labels in the table below. This documentation explains how to enable emotion recognition in your applications and where its results appear in the API response.
Description | Label |
---|---|
Admiration | admiration |
Adoration | adoration |
Aesthetic appreciation | aesthetic_appreciation |
Amusement | amusement |
Anger | anger |
Anxiety | anxiety |
Awe | awe |
Awkwardness | awkwardness |
Boredom | boredom |
Calmness | calmness |
Confusion | confusion |
Craving | craving |
Disgust | disgust |
Empathic pain | empathic_pain |
Entrancement | entrancement |
Excitement | excitement |
Fear | fear |
Horror | horror |
Happiness | happiness |
Interest | interest |
Joy | joy |
Nostalgia | nostalgia |
Relief | relief |
Romance | romance |
Sadness | sadness |
Satisfaction | satisfaction |
Sexual desire | sexual_desire |
Surprise | surprise |
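If your application validates or displays the labels returned by the API, it can mirror the table above as a constant. The following is a minimal sketch, assuming the API returns labels spelled exactly as in the Label column; EMOTION_LABELS and label_to_description are hypothetical names, not part of any Gladia SDK.

```python
# Hypothetical constant mirroring the Label column of the table (28 labels).
EMOTION_LABELS = frozenset({
    "admiration", "adoration", "aesthetic_appreciation", "amusement", "anger",
    "anxiety", "awe", "awkwardness", "boredom", "calmness", "confusion",
    "craving", "disgust", "empathic_pain", "entrancement", "excitement",
    "fear", "horror", "happiness", "interest", "joy", "nostalgia", "relief",
    "romance", "sadness", "satisfaction", "sexual_desire", "surprise",
})


def label_to_description(label: str) -> str:
    """Map a snake_case label to its human-readable description.

    Example: 'empathic_pain' -> 'Empathic pain'.
    Raises ValueError for labels that are not in the table.
    """
    if label not in EMOTION_LABELS:
        raise ValueError(f"unknown emotion label: {label!r}")
    return label.replace("_", " ").capitalize()
```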
Activate Emotion Recognition
In the app
Toggle emotion recognition on using the checkbox.

Using the API
Set "toggle_text_emotion_recognition" to true in the transcription request, replacing XXXXXXXXXXXXXXX with your Gladia API key:
curl -X 'POST' \
  'https://api.gladia.io/audio/text/audio-transcription/' \
  -H 'accept: application/json' \
  -H 'x-gladia-key: XXXXXXXXXXXXXXX' \
  -H 'Content-Type: multipart/form-data' \
  -F "audio_url=http://files.gladia.io/example/audio-transcription/split_infinity.wav" \
  -F "output_format=json" \
  -F "toggle_text_emotion_recognition=true"
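The same request can be built without curl using only Python's standard library. This is a sketch, not an official client: it encodes the three form fields from the curl example as multipart/form-data and builds a urllib Request. The helper names build_multipart, build_request and transcribe are hypothetical.

```python
import json
import urllib.request
import uuid

# Endpoint copied from the curl example.
API_URL = "https://api.gladia.io/audio/text/audio-transcription/"


def build_multipart(fields: dict[str, str]) -> tuple[bytes, str]:
    """Encode plain text fields as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing boundary
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(api_key: str, audio_url: str, output_format: str = "json") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body, content_type = build_multipart({
        "audio_url": audio_url,
        "output_format": output_format,
        "toggle_text_emotion_recognition": "true",
    })
    headers = {
        "accept": "application/json",
        "x-gladia-key": api_key,
        "Content-Type": content_type,  # includes the boundary parameter
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def transcribe(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())
```

For example, transcribe(build_request(api_key, audio_url)) performs the call; keeping request construction separate from sending makes the request easy to inspect or test offline.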
Output Format | Expected behavior | Example |
---|---|---|
json (default) | transcript is in "prediction"; emotion for the full transcription is in "prediction_raw.emotion"; per-utterance emotion is in "prediction[].emotion" and in "prediction_raw.transcription[].emotion" | { "prediction": [ { "words": [{...}], "language": "en", "transcription": " There is always hope for the future. The future can be read from the past.", "time_begin": 8.618, "time_end": 14.138, "speaker": "not_activated", "channel": "channel_0", "emotion": "calmness" }, ... ], "prediction_raw": { "metadata": {...}, "transcription": [ { "words": [{...}], "language": "en", "transcription": " There is always hope for the future. The future can be read from the past.", "time_begin": 8.618, "time_end": 14.138, "speaker": "not_activated", "channel": "channel_0", "emotion": "calmness" }, ... ], "emotion": "awe" } } |
plain | Only the transcript is returned, as plain text; no emotion recognition is performed | "There is always hope for the future. The future can be read from the past." |
txt | transcript as a single string is in "prediction"; emotion for the full transcription is in "prediction_raw.emotion"; per-utterance emotion is in "prediction_raw.transcription[].emotion" | { "prediction": "Split infinity in a time when less is more, where too much is never enough. There is always hope for the future. The future can be read from the past. The past foreshadows the present and the present hasn't been written yet.", "prediction_raw": { "transcription": [ { "words": [{...}], "language": "en", "transcription": " There is always hope for the future. The future can be read from the past.", "time_begin": 8.618, "time_end": 14.138, "speaker": "not_activated", "channel": "channel_0", "emotion": "calmness" }, ... ], "metadata": {...}, "emotion": "awe" } } |
vtt | transcript in VTT format is in "prediction"; transcript in original format is in "prediction_raw.transcription"; emotion for the full transcription is in "prediction_raw.emotion"; per-utterance emotion is in "prediction_raw.transcription[].emotion" | { "prediction": "VTT Transcript", "prediction_raw": { "transcription": [ { "words": [{...}], "language": "en", "transcription": " There is always hope for the future. The future can be read from the past.", "time_begin": 8.618, "time_end": 14.138, "speaker": "not_activated", "channel": "channel_0", "emotion": "calmness" }, ... ], "metadata": {...}, "emotion": "awe" } } |
srt | transcript in SRT format is in "prediction"; transcript in original format is in "prediction_raw.transcription"; emotion for the full transcription is in "prediction_raw.emotion"; per-utterance emotion is in "prediction_raw.transcription[].emotion" | { "prediction": "SRT Transcript", "prediction_raw": { "transcription": [ { "words": [{...}], "language": "en", "transcription": " There is always hope for the future. The future can be read from the past.", "time_begin": 8.618, "time_end": 14.138, "speaker": "not_activated", "channel": "channel_0", "emotion": "calmness" }, ... ], "metadata": {...}, "emotion": "awe" } } |
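Because the table above places the emotion results at slightly different paths depending on output_format, a small helper can read them uniformly. This is a sketch under stated assumptions: it reads the global emotion from "prediction_raw.emotion" and per-utterance results from "prediction_raw.transcription" only when that field is a list of utterance objects. In the full txt, srt and vtt examples on this page, "prediction_raw.transcription" is a plain string, so the helper tolerates both shapes. extract_emotions is a hypothetical name.

```python
from typing import Any, Optional


def extract_emotions(response: Any) -> tuple[Optional[str], list[dict]]:
    """Return (global_emotion, utterances) from a transcription response.

    Plain output is a bare string with no emotion data, so (None, []) is returned.
    """
    if not isinstance(response, dict):
        return None, []
    raw = response.get("prediction_raw", {})
    global_emotion = raw.get("emotion")
    transcription = raw.get("transcription")
    utterances = []
    # Only a list of utterance objects carries per-utterance emotions.
    if isinstance(transcription, list):
        for utt in transcription:
            utterances.append({
                "text": utt.get("transcription", "").strip(),
                "time_begin": utt.get("time_begin"),
                "time_end": utt.get("time_end"),
                "emotion": utt.get("emotion"),
            })
    return global_emotion, utterances
```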
Full Examples
Plain output
"Split infinity in a time when less is more, where too much is never enough. There is always hope for the future. The future can be read from the past. The past foreshadows the present and the present hasn't been written yet."
JSON output
{
"prediction": [
{
"words": [
{
"word": " Split",
"time_begin": 1.1780000000000002,
"time_end": 1.8980000000000001,
"confidence": 0.49
},
{
"word": " infinity",
"time_begin": 1.8980000000000001,
"time_end": 1.538,
"confidence": 0.72
},
{
"word": " in",
"time_begin": 1.538,
"time_end": 2.618,
"confidence": 0.34
},
{
"word": " a",
"time_begin": 2.618,
"time_end": 2.8779999999999997,
"confidence": 1
},
{
"word": " time",
"time_begin": 2.8779999999999997,
"time_end": 3.318,
"confidence": 0.81
},
{
"word": " when",
"time_begin": 3.318,
"time_end": 3.778,
"confidence": 0.86
},
{
"word": " less",
"time_begin": 3.778,
"time_end": 4.058,
"confidence": 0.87
},
{
"word": " is",
"time_begin": 4.058,
"time_end": 4.378,
"confidence": 0.9
},
{
"word": " more,",
"time_begin": 4.378,
"time_end": 4.938,
"confidence": 0.88
},
{
"word": " where",
"time_begin": 5.638,
"time_end": 5.718,
"confidence": 0.89
},
{
"word": " too",
"time_begin": 5.718,
"time_end": 6.138,
"confidence": 0.8
},
{
"word": " much",
"time_begin": 6.138,
"time_end": 6.478,
"confidence": 0.81
},
{
"word": " is",
"time_begin": 6.478,
"time_end": 6.918,
"confidence": 0.9
},
{
"word": " never",
"time_begin": 6.918,
"time_end": 7.258,
"confidence": 0.88
},
{
"word": " enough.",
"time_begin": 7.258,
"time_end": 7.798,
"confidence": 0.78
}
],
"language": "en",
"transcription": " Split infinity in a time when less is more, where too much is never enough.",
"time_begin": 1.1780000000000002,
"time_end": 7.798,
"speaker": "not_activated",
"channel": "channel_0",
"emotion": "romance"
},
{
"words": [
{
"word": " There",
"time_begin": 8.618,
"time_end": 8.678,
"confidence": 0.8
},
{
"word": " is",
"time_begin": 8.678,
"time_end": 8.958,
"confidence": 0.89
},
{
"word": " always",
"time_begin": 8.958,
"time_end": 9.478000000000002,
"confidence": 0.76
},
{
"word": " hope",
"time_begin": 9.478000000000002,
"time_end": 9.778,
"confidence": 0.83
},
{
"word": " for",
"time_begin": 9.778,
"time_end": 10.118,
"confidence": 0.9
},
{
"word": " the",
"time_begin": 10.118,
"time_end": 10.358,
"confidence": 0.82
},
{
"word": " future.",
"time_begin": 10.358,
"time_end": 10.738000000000001,
"confidence": 0.94
},
{
"word": " The",
"time_begin": 11.738000000000001,
"time_end": 11.898000000000001,
"confidence": 0.81
},
{
"word": " future",
"time_begin": 11.898000000000001,
"time_end": 12.218,
"confidence": 0.94
},
{
"word": " can",
"time_begin": 12.218,
"time_end": 12.578000000000001,
"confidence": 0.9
},
{
"word": " be",
"time_begin": 12.578000000000001,
"time_end": 12.838000000000001,
"confidence": 0.91
},
{
"word": " read",
"time_begin": 12.838000000000001,
"time_end": 13.038,
"confidence": 0.9
},
{
"word": " from",
"time_begin": 13.038,
"time_end": 13.338000000000001,
"confidence": 0.82
},
{
"word": " the",
"time_begin": 13.338000000000001,
"time_end": 13.558000000000002,
"confidence": 0.82
},
{
"word": " past.",
"time_begin": 13.558000000000002,
"time_end": 14.138,
"confidence": 0.81
}
],
"language": "en",
"transcription": " There is always hope for the future. The future can be read from the past.",
"time_begin": 8.618,
"time_end": 14.138,
"speaker": "not_activated",
"channel": "channel_0",
"emotion": "calmness"
},
{
"words": [
{
"word": " The",
"time_begin": 14.658000000000001,
"time_end": 14.778,
"confidence": 0.81
},
{
"word": " past",
"time_begin": 14.778,
"time_end": 15.358,
"confidence": 0.82
},
{
"word": " foreshadows",
"time_begin": 15.358,
"time_end": 16.098,
"confidence": 0.89
},
{
"word": " the",
"time_begin": 16.098,
"time_end": 16.458,
"confidence": 0.81
},
{
"word": " present",
"time_begin": 16.458,
"time_end": 17.018,
"confidence": 0.79
},
{
"word": " and",
"time_begin": 17.018,
"time_end": 17.698,
"confidence": 0.33
},
{
"word": " the",
"time_begin": 17.698,
"time_end": 17.918,
"confidence": 0.81
},
{
"word": " present",
"time_begin": 17.918,
"time_end": 18.378,
"confidence": 0.79
},
{
"word": " hasn't",
"time_begin": 18.378,
"time_end": 18.918,
"confidence": 0.93
},
{
"word": " been",
"time_begin": 18.918,
"time_end": 19.218,
"confidence": 0.82
},
{
"word": " written",
"time_begin": 19.218,
"time_end": 19.458,
"confidence": 0.86
},
{
"word": " yet.",
"time_begin": 19.458,
"time_end": 19.977999999999998,
"confidence": 0.91
}
],
"language": "en",
"transcription": " The past foreshadows the present and the present hasn't been written yet.",
"time_begin": 14.658000000000001,
"time_end": 19.977999999999998,
"speaker": "not_activated",
"channel": "channel_0",
"emotion": "awe"
}
],
"prediction_raw": {
"transcription": [
{
"words": [
{
"word": " Split",
"time_begin": 1.1780000000000002,
"time_end": 1.8980000000000001,
"confidence": 0.49
},
{
"word": " infinity",
"time_begin": 1.8980000000000001,
"time_end": 1.538,
"confidence": 0.72
},
{
"word": " in",
"time_begin": 1.538,
"time_end": 2.618,
"confidence": 0.34
},
{
"word": " a",
"time_begin": 2.618,
"time_end": 2.8779999999999997,
"confidence": 1
},
{
"word": " time",
"time_begin": 2.8779999999999997,
"time_end": 3.318,
"confidence": 0.81
},
{
"word": " when",
"time_begin": 3.318,
"time_end": 3.778,
"confidence": 0.86
},
{
"word": " less",
"time_begin": 3.778,
"time_end": 4.058,
"confidence": 0.87
},
{
"word": " is",
"time_begin": 4.058,
"time_end": 4.378,
"confidence": 0.9
},
{
"word": " more,",
"time_begin": 4.378,
"time_end": 4.938,
"confidence": 0.88
},
{
"word": " where",
"time_begin": 5.638,
"time_end": 5.718,
"confidence": 0.89
},
{
"word": " too",
"time_begin": 5.718,
"time_end": 6.138,
"confidence": 0.8
},
{
"word": " much",
"time_begin": 6.138,
"time_end": 6.478,
"confidence": 0.81
},
{
"word": " is",
"time_begin": 6.478,
"time_end": 6.918,
"confidence": 0.9
},
{
"word": " never",
"time_begin": 6.918,
"time_end": 7.258,
"confidence": 0.88
},
{
"word": " enough.",
"time_begin": 7.258,
"time_end": 7.798,
"confidence": 0.78
}
],
"language": "en",
"transcription": " Split infinity in a time when less is more, where too much is never enough.",
"time_begin": 1.1780000000000002,
"time_end": 7.798,
"speaker": "not_activated",
"channel": "channel_0",
"emotion": "romance"
},
{
"words": [
{
"word": " There",
"time_begin": 8.618,
"time_end": 8.678,
"confidence": 0.8
},
{
"word": " is",
"time_begin": 8.678,
"time_end": 8.958,
"confidence": 0.89
},
{
"word": " always",
"time_begin": 8.958,
"time_end": 9.478000000000002,
"confidence": 0.76
},
{
"word": " hope",
"time_begin": 9.478000000000002,
"time_end": 9.778,
"confidence": 0.83
},
{
"word": " for",
"time_begin": 9.778,
"time_end": 10.118,
"confidence": 0.9
},
{
"word": " the",
"time_begin": 10.118,
"time_end": 10.358,
"confidence": 0.82
},
{
"word": " future.",
"time_begin": 10.358,
"time_end": 10.738000000000001,
"confidence": 0.94
},
{
"word": " The",
"time_begin": 11.738000000000001,
"time_end": 11.898000000000001,
"confidence": 0.81
},
{
"word": " future",
"time_begin": 11.898000000000001,
"time_end": 12.218,
"confidence": 0.94
},
{
"word": " can",
"time_begin": 12.218,
"time_end": 12.578000000000001,
"confidence": 0.9
},
{
"word": " be",
"time_begin": 12.578000000000001,
"time_end": 12.838000000000001,
"confidence": 0.91
},
{
"word": " read",
"time_begin": 12.838000000000001,
"time_end": 13.038,
"confidence": 0.9
},
{
"word": " from",
"time_begin": 13.038,
"time_end": 13.338000000000001,
"confidence": 0.82
},
{
"word": " the",
"time_begin": 13.338000000000001,
"time_end": 13.558000000000002,
"confidence": 0.82
},
{
"word": " past.",
"time_begin": 13.558000000000002,
"time_end": 14.138,
"confidence": 0.81
}
],
"language": "en",
"transcription": " There is always hope for the future. The future can be read from the past.",
"time_begin": 8.618,
"time_end": 14.138,
"speaker": "not_activated",
"channel": "channel_0",
"emotion": "calmness"
},
{
"words": [
{
"word": " The",
"time_begin": 14.658000000000001,
"time_end": 14.778,
"confidence": 0.81
},
{
"word": " past",
"time_begin": 14.778,
"time_end": 15.358,
"confidence": 0.82
},
{
"word": " foreshadows",
"time_begin": 15.358,
"time_end": 16.098,
"confidence": 0.89
},
{
"word": " the",
"time_begin": 16.098,
"time_end": 16.458,
"confidence": 0.81
},
{
"word": " present",
"time_begin": 16.458,
"time_end": 17.018,
"confidence": 0.79
},
{
"word": " and",
"time_begin": 17.018,
"time_end": 17.698,
"confidence": 0.33
},
{
"word": " the",
"time_begin": 17.698,
"time_end": 17.918,
"confidence": 0.81
},
{
"word": " present",
"time_begin": 17.918,
"time_end": 18.378,
"confidence": 0.79
},
{
"word": " hasn't",
"time_begin": 18.378,
"time_end": 18.918,
"confidence": 0.93
},
{
"word": " been",
"time_begin": 18.918,
"time_end": 19.218,
"confidence": 0.82
},
{
"word": " written",
"time_begin": 19.218,
"time_end": 19.458,
"confidence": 0.86
},
{
"word": " yet.",
"time_begin": 19.458,
"time_end": 19.977999999999998,
"confidence": 0.91
}
],
"language": "en",
"transcription": " The past foreshadows the present and the present hasn't been written yet.",
"time_begin": 14.658000000000001,
"time_end": 19.977999999999998,
"speaker": "not_activated",
"channel": "channel_0",
"emotion": "awe"
}
],
"metadata": {
"total_speech_duration": 17.459999999999997,
"providedFileMetadata": {
"nb channels": 1,
"sample rate": 44100,
"sample width": 16,
"original file type": "audio"
},
"audioConversionTime": 0.8856689929962158,
"nbSilentChannels": -1,
"nbSimilarChannels": 0,
"vadTime": 0.020719051361083984,
"inferenceTime": 2.690950393676758,
"diarizationTime": 0.0000050067901611328125,
"totalTranscriptionTime": 3.5973434448242188,
"emotionTime": 2.5873782634735107
},
"emotion": "awe"
}
}
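With the json output, each utterance in "prediction" carries its own "time_begin", "time_end" and "emotion", so you can, for instance, total how long each emotion was expressed. The sketch below is one possible post-processing step, not an API feature; emotion_durations is a hypothetical name. On the response above it gives roughly 6.62 s of romance, 5.52 s of calmness and 5.32 s of awe, which is a different measure from "prediction_raw.emotion" ("awe"), the emotion recognized on the full transcription.

```python
from collections import defaultdict


def emotion_durations(response: dict) -> dict[str, float]:
    """Sum utterance durations (time_end - time_begin, in seconds) per emotion label."""
    totals = defaultdict(float)
    for utt in response.get("prediction", []):
        totals[utt["emotion"]] += utt["time_end"] - utt["time_begin"]
    # Round to milliseconds to hide floating-point noise such as 1.1780000000000002.
    return {label: round(seconds, 3) for label, seconds in totals.items()}
```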
Txt output
{
"prediction": "Split infinity in a time when less is more, where too much is never enough. There is always hope for the future. The future can be read from the past. The past foreshadows the present and the present hasn't been written yet.",
"prediction_raw": {
"transcription": "Split infinity in a time when less is more, where too much is never enough. There is always hope for the future. The future can be read from the past. The past foreshadows the present and the present hasn't been written yet.",
"metadata": {
"total_speech_duration": 17.459999999999997,
"providedFileMetadata": {
"nb channels": 1,
"sample rate": 44100,
"sample width": 16,
"original file type": "audio"
},
"audioConversionTime": 0.7798864841461182,
"nbSilentChannels": -1,
"nbSimilarChannels": 0,
"vadTime": 0.013231039047241211,
"inferenceTime": 2.7497172355651855,
"diarizationTime": 0.000004291534423828125,
"totalTranscriptionTime": 3.5428390502929688,
"emotionTime": 2.545997142791748
},
"emotion": "awe"
}
}
SRT output
{
"prediction": "1\n00:00:01,170 --> 00:00:07,790\n Split infinity in a time when less is more, where too much is never enough.\n\n2\n00:00:08,610 --> 00:00:14,130\n There is always hope for the future. The future can be read from the past.\n\n3\n00:00:14,650 --> 00:00:19,970\n The past foreshadows the present and the present hasn't been written yet.\n",
"prediction_raw": {
"transcription": "1\n00:00:01,170 --> 00:00:07,790\n Split infinity in a time when less is more, where too much is never enough.\n\n2\n00:00:08,610 --> 00:00:14,130\n There is always hope for the future. The future can be read from the past.\n\n3\n00:00:14,650 --> 00:00:19,970\n The past foreshadows the present and the present hasn't been written yet.\n",
"metadata": {
"total_speech_duration": 17.459999999999997,
"providedFileMetadata": {
"nb channels": 1,
"sample rate": 44100,
"sample width": 16,
"original file type": "audio"
},
"audioConversionTime": 0.7368574142456055,
"nbSilentChannels": -1,
"nbSimilarChannels": 0,
"vadTime": 0.035033226013183594,
"inferenceTime": 2.704582452774048,
"diarizationTime": 0.0000054836273193359375,
"totalTranscriptionTime": 3.4764785766601562,
"emotionTime": 2.5046708583831787
},
"emotion": "awe"
}
}
VTT output
{
"prediction": "WEBVTT\n\n1\n00:00:01.178 --> 00:00:07.798\n Split infinity in a time when less is more, where too much is never enough.\n\n2\n00:00:08.618 --> 00:00:14.137\n There is always hope for the future. The future can be read from the past.\n\n3\n00:00:14.658 --> 00:00:19.977\n The past foreshadows the present and the present hasn't been written yet.\n",
"prediction_raw": {
"transcription": "WEBVTT\n\n1\n00:00:01.178 --> 00:00:07.798\n Split infinity in a time when less is more, where too much is never enough.\n\n2\n00:00:08.618 --> 00:00:14.137\n There is always hope for the future. The future can be read from the past.\n\n3\n00:00:14.658 --> 00:00:19.977\n The past foreshadows the present and the present hasn't been written yet.\n",
"metadata": {
"total_speech_duration": 17.459999999999997,
"providedFileMetadata": {
"nb channels": 1,
"sample rate": 44100,
"sample width": 16,
"original file type": "audio"
},
"audioConversionTime": 0.8466713428497314,
"nbSilentChannels": -1,
"nbSimilarChannels": 0,
"vadTime": 0.036614179611206055,
"inferenceTime": 2.712864398956299,
"diarizationTime": 0.0000045299530029296875,
"totalTranscriptionTime": 3.5961544513702393,
"emotionTime": 2.6435365676879883
},
"emotion": "awe"
}
}
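In the SRT and VTT examples, "prediction" holds the subtitles as a single string, while the emotion results sit in "prediction_raw". To line the subtitle cues up with utterance time ranges, you can parse the cue timings. The following is a minimal sketch that handles only the cue shape seen in these examples (an optional index line, a timing line, then text); it is not a complete SRT or WebVTT parser, and parse_cues is a hypothetical name.

```python
import re

# Cue timing line in SRT (comma) or WebVTT (dot) style, e.g.
# "00:00:01,170 --> 00:00:07,790" or "00:00:01.178 --> 00:00:07.798".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3}) --> (\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(subtitles: str) -> list[tuple[float, float, str]]:
    """Return (start, end, text) for each cue in an SRT or WebVTT string."""
    cues = []
    # Cues are separated by blank lines; the "WEBVTT" header block has no timing and is skipped.
    for block in subtitles.strip().split("\n\n"):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                groups = match.groups()
                start = _to_seconds(*groups[:4])
                end = _to_seconds(*groups[4:])
                text = " ".join(t.strip() for t in lines[i + 1:])
                cues.append((start, end, text))
                break
    return cues
```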