What is Emotion Recognition

The classification of emotions has been a topic of interest for psychologists for decades, and this research has provided valuable insights into defining distinct categories of emotion bridged by continuous gradients. Our emotion recognition system is built upon research from New York University, and aims to accurately identify the emotional state of individuals across 99 languages. This documentation explains how to integrate our system into your applications.

Description             Label
Admiration              admiration
Adoration               adoration
Aesthetic appreciation  aesthetic_appreciation
Amusement               amusement
Anger                   anger
Anxiety                 anxiety
Awe                     awe
Awkwardness             awkwardness
Boredom                 boredom
Calmness                calmness
Confusion               confusion
Craving                 craving
Disgust                 disgust
Empathic pain           empathic_pain
Entrancement            entrancement
Excitement              excitement
Fear                    fear
Horror                  horror
Happiness               happiness
Interest                interest
Joy                     joy
Nostalgia               nostalgia
Relief                  relief
Romance                 romance
Sadness                 sadness
Satisfaction            satisfaction
Sexual desire           sexual_desire
Surprise                surprise
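
When consuming these labels in code, it can help to keep them in one place and reject anything unexpected. The sketch below is illustrative only: the names EMOTION_LABELS and is_known_emotion are hypothetical, and the label spellings are assumed to be the lowercase snake_case form of each description (as "calmness" is in the sample responses).

```python
# Hypothetical helper: the labels from the table above.
# Assumption: each label is the snake_case form of its description.
EMOTION_LABELS = frozenset({
    "admiration", "adoration", "aesthetic_appreciation", "amusement",
    "anger", "anxiety", "awe", "awkwardness", "boredom", "calmness",
    "confusion", "craving", "disgust", "empathic_pain", "entrancement",
    "excitement", "fear", "horror", "happiness", "interest", "joy",
    "nostalgia", "relief", "romance", "sadness", "satisfaction",
    "sexual_desire", "surprise",
})


def is_known_emotion(label: str) -> bool:
    """Return True if `label` is one of the documented emotion labels."""
    return label in EMOTION_LABELS
```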

Activate Emotion Recognition

In the app

Toggle emotion recognition using the checkbox.

Using the API

curl -X 'POST' \
    'https://api.gladia.io/audio/text/audio-transcription/' \
    -H 'accept: application/json' \
    -H 'x-gladia-key: XXXXXXXXXXXXXXX' \
    -H 'Content-Type: multipart/form-data' \
    -F "audio_url=http://files.gladia.io/example/audio-transcription/split_infinity.wav" \
    -F "output_format=json" \
    -F "toggle_text_emotion_recognition=true"
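
The same request can be prepared from Python. The following is a minimal sketch using only the standard library, not an official client: build_emotion_request is a hypothetical helper name, and the endpoint, headers, and form fields are copied from the curl command above. The function only builds the request; sending it is left to the caller.

```python
import urllib.request
import uuid

# Endpoint taken from the curl example above.
API_URL = "https://api.gladia.io/audio/text/audio-transcription/"


def build_emotion_request(api_key: str, audio_url: str,
                          output_format: str = "json") -> urllib.request.Request:
    """Build (but do not send) a multipart POST mirroring the curl call."""
    fields = {
        "audio_url": audio_url,
        "output_format": output_format,
        "toggle_text_emotion_recognition": "true",
    }
    # Encode the form fields by hand as multipart/form-data.
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "x-gladia-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )

# To send the request (requires network access and a valid key):
#   import json
#   with urllib.request.urlopen(build_emotion_request(key, url)) as resp:
#       payload = json.load(resp)
```
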

Output format: json (default)

Expected behavior:
The transcript is in "prediction".
General emotion recognition based on the full transcription is in "prediction_raw.emotion".
Utterance-based emotion recognition is in "prediction.utterance.emotion" and in "prediction_raw.utterance.emotion".

Example:
{
  "prediction": [
    {
      "words": [{...}],
      "language": "en",
      "transcription": " There is always hope for the future. The future can be read from the past.",
      "time_begin": 8.618,
      "time_end": 14.138,
      "speaker": "not_activated",
      "channel": "channel_0",
      "emotion": "calmness"
    }
  ],
  "prediction_raw": {
    "metadata": {...},
    "transcription": [
      {
        "words": [{...}],
        "language": "en",
        "transcription": " There is always hope for the future. The future can be read from the past.",
        "time_begin": 8.618,
        "time_end": 14.138,
        "speaker": "not_activated",
        "channel": "channel_0",
        "emotion": "calmness"
      }
    ],
    "emotion": "awe"
  }
}

Output format: plain

Expected behavior:
Only the transcript is returned, as plain text. No emotion detection is performed.

Example:
"There is always hope for the future. The future can be read from the past."

Output format: txt

Expected behavior:
The transcript is in "prediction".
General emotion recognition based on the full transcription is in "prediction_raw.emotion".
Utterance-based emotion recognition is in "prediction_raw.utterance.emotion".

Example:
{
  "prediction": "Split infinity in a time when less is more, where too much is never enough. There is always hope for the future. The future can be read from the past. The past foreshadows the present and the present hasn't been written yet.",
  "prediction_raw": {
    "transcription": [
      {
        "words": [{...}],
        "language": "en",
        "transcription": " There is always hope for the future. The future can be read from the past.",
        "time_begin": 8.618,
        "time_end": 14.138,
        "speaker": "not_activated",
        "channel": "channel_0",
        "emotion": "calmness"
      }
    ],
    "metadata": {...},
    "emotion": "awe"
  }
}

Output format: vtt

Expected behavior:
The transcript in VTT format is in "prediction".
The transcript in its original format is in "prediction_raw.transcription".
General emotion recognition based on the full transcription is in "prediction_raw.emotion".
Utterance-based emotion recognition is in "prediction_raw.utterance.emotion".

Example:
{
  "prediction": "VTT Transcript",
  "prediction_raw": {
    "transcription": [
      {
        "words": [{...}],
        "language": "en",
        "transcription": " There is always hope for the future. The future can be read from the past.",
        "time_begin": 8.618,
        "time_end": 14.138,
        "speaker": "not_activated",
        "channel": "channel_0",
        "emotion": "calmness"
      }
    ],
    "metadata": {...},
    "emotion": "awe"
  }
}

Output format: srt

Expected behavior:
The transcript in SRT format is in "prediction".
The transcript in its original format is in "prediction_raw.transcription".
General emotion recognition based on the full transcription is in "prediction_raw.emotion".
Utterance-based emotion recognition is in "prediction_raw.utterance.emotion".

Example:
{
  "prediction": "SRT Transcript",
  "prediction_raw": {
    "transcription": [
      {
        "words": [{...}],
        "language": "en",
        "transcription": " There is always hope for the future. The future can be read from the past.",
        "time_begin": 8.618,
        "time_end": 14.138,
        "speaker": "not_activated",
        "channel": "channel_0",
        "emotion": "calmness"
      }
    ],
    "metadata": {...},
    "emotion": "awe"
  }
}
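
For the json output format, reading the results out of a parsed response might look like the sketch below. It is an illustration under assumptions rather than a reference implementation: extract_emotions is a hypothetical name, the response is assumed to be already parsed into a dict (for example with json.loads), and the per-utterance "emotion" field is read from each item of "prediction" as in the json example above.

```python
from typing import Any, Dict, List, Optional, Tuple


def extract_emotions(
    response: Dict[str, Any],
) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return (overall_emotion, [(utterance_text, utterance_emotion), ...])."""
    # Overall emotion for the full transcription.
    overall = response.get("prediction_raw", {}).get("emotion")

    # Per-utterance emotions; "prediction" is a list only for the json format.
    utterances = response.get("prediction", [])
    if not isinstance(utterances, list):
        utterances = []

    per_utterance = [
        (u.get("transcription", "").strip(), u["emotion"])
        for u in utterances
        if isinstance(u, dict) and "emotion" in u
    ]
    return overall, per_utterance
```

When "prediction" is a string (as in the txt, vtt, and srt formats), the function returns an empty per-utterance list and only the overall emotion.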

Full Examples

Plain output

"Hope remains despite the current minimalist atmosphere, as the past can guide us towards the future. Ultimately, it is up to us to shape the present."

JSON output

{
  "prediction": [
    {
      "words": [
        {
          "word": " Split",
          "time_begin": 1.1780000000000002,
          "time_end": 1.8980000000000001,
          "confidence": 0.49
        },
        {
          "word": " infinity",
          "time_begin": 1.8980000000000001,
          "time_end": 1.538,
          "confidence": 0.72
        },
        {
          "word": " in",
          "time_begin": 1.538,
          "time_end": 2.618,
          "confidence": 0.34
        },
        {
          "word": " a",
          "time_begin": 2.618,
          "time_end": 2.8779999999999997,
          "confidence": 1
        },
        {
          "word": " time",
          "time_begin": 2.8779999999999997,
          "time_end": 3.318,
          "confidence": 0.81
        },
        {
          "word": " when",
          "time_begin": 3.318,
          "time_end": 3.778,
          "confidence": 0.86
        },
        {
          "word": " less",
          "time_begin": 3.778,
          "time_end": 4.058,
          "confidence": 0.87
        },
        {
          "word": " is",
          "time_begin": 4.058,
          "time_end": 4.378,
          "confidence": 0.9
        },
        {
          "word": " more,",
          "time_begin": 4.378,
          "time_end": 4.938,
          "confidence": 0.88
        },
        {
          "word": " where",
          "time_begin": 5.638,
          "time_end": 5.718,
          "confidence": 0.89
        },
        {
          "word": " too",
          "time_begin": 5.718,
          "time_end": 6.138,
          "confidence": 0.8
        },
        {
          "word": " much",
          "time_begin": 6.138,
          "time_end": 6.478,
          "confidence": 0.81
        },
        {
          "word": " is",
          "time_begin": 6.478,
          "time_end": 6.918,
          "confidence": 0.9
        },
        {
          "word": " never",
          "time_begin": 6.918,
          "time_end": 7.258,
          "confidence": 0.88
        },
        {
          "word": " enough.",
          "time_begin": 7.258,
          "time_end": 7.798,
          "confidence": 0.78
        }
      ],
      "language": "en",
      "transcription": " Split infinity in a time when less is more, where too much is never enough.",
      "time_begin": 1.1780000000000002,
      "time_end": 7.798,
      "speaker": "not_activated",
      "channel": "channel_0",
      "emotion": "romance"
    },
    {
      "words": [
        {
          "word": " There",
          "time_begin": 8.618,
          "time_end": 8.678,
          "confidence": 0.8
        },
        {
          "word": " is",
          "time_begin": 8.678,
          "time_end": 8.958,
          "confidence": 0.89
        },
        {
          "word": " always",
          "time_begin": 8.958,
          "time_end": 9.478000000000002,
          "confidence": 0.76
        },
        {
          "word": " hope",
          "time_begin": 9.478000000000002,
          "time_end": 9.778,
          "confidence": 0.83
        },
        {
          "word": " for",
          "time_begin": 9.778,
          "time_end": 10.118,
          "confidence": 0.9
        },
        {
          "word": " the",
          "time_begin": 10.118,
          "time_end": 10.358,
          "confidence": 0.82
        },
        {
          "word": " future.",
          "time_begin": 10.358,
          "time_end": 10.738000000000001,
          "confidence": 0.94
        },
        {
          "word": " The",
          "time_begin": 11.738000000000001,
          "time_end": 11.898000000000001,
          "confidence": 0.81
        },
        {
          "word": " future",
          "time_begin": 11.898000000000001,
          "time_end": 12.218,
          "confidence": 0.94
        },
        {
          "word": " can",
          "time_begin": 12.218,
          "time_end": 12.578000000000001,
          "confidence": 0.9
        },
        {
          "word": " be",
          "time_begin": 12.578000000000001,
          "time_end": 12.838000000000001,
          "confidence": 0.91
        },
        {
          "word": " read",
          "time_begin": 12.838000000000001,
          "time_end": 13.038,
          "confidence": 0.9
        },
        {
          "word": " from",
          "time_begin": 13.038,
          "time_end": 13.338000000000001,
          "confidence": 0.82
        },
        {
          "word": " the",
          "time_begin": 13.338000000000001,
          "time_end": 13.558000000000002,
          "confidence": 0.82
        },
        {
          "word": " past.",
          "time_begin": 13.558000000000002,
          "time_end": 14.138,
          "confidence": 0.81
        }
      ],
      "language": "en",
      "transcription": " There is always hope for the future. The future can be read from the past.",
      "time_begin": 8.618,
      "time_end": 14.138,
      "speaker": "not_activated",
      "channel": "channel_0",
      "emotion": "calmness"
    },
    {
      "words": [
        {
          "word": " The",
          "time_begin": 14.658000000000001,
          "time_end": 14.778,
          "confidence": 0.81
        },
        {
          "word": " past",
          "time_begin": 14.778,
          "time_end": 15.358,
          "confidence": 0.82
        },
        {
          "word": " foreshadows",
          "time_begin": 15.358,
          "time_end": 16.098,
          "confidence": 0.89
        },
        {
          "word": " the",
          "time_begin": 16.098,
          "time_end": 16.458,
          "confidence": 0.81
        },
        {
          "word": " present",
          "time_begin": 16.458,
          "time_end": 17.018,
          "confidence": 0.79
        },
        {
          "word": " and",
          "time_begin": 17.018,
          "time_end": 17.698,
          "confidence": 0.33
        },
        {
          "word": " the",
          "time_begin": 17.698,
          "time_end": 17.918,
          "confidence": 0.81
        },
        {
          "word": " present",
          "time_begin": 17.918,
          "time_end": 18.378,
          "confidence": 0.79
        },
        {
          "word": " hasn't",
          "time_begin": 18.378,
          "time_end": 18.918,
          "confidence": 0.93
        },
        {
          "word": " been",
          "time_begin": 18.918,
          "time_end": 19.218,
          "confidence": 0.82
        },
        {
          "word": " written",
          "time_begin": 19.218,
          "time_end": 19.458,
          "confidence": 0.86
        },
        {
          "word": " yet.",
          "time_begin": 19.458,
          "time_end": 19.977999999999998,
          "confidence": 0.91
        }
      ],
      "language": "en",
      "transcription": " The past foreshadows the present and the present hasn't been written yet.",
      "time_begin": 14.658000000000001,
      "time_end": 19.977999999999998,
      "speaker": "not_activated",
      "channel": "channel_0",
      "emotion": "awe"
    }
  ],
  "prediction_raw": {
    "transcription": [
      {
        "words": [
          {
            "word": " Split",
            "time_begin": 1.1780000000000002,
            "time_end": 1.8980000000000001,
            "confidence": 0.49
          },
          {
            "word": " infinity",
            "time_begin": 1.8980000000000001,
            "time_end": 1.538,
            "confidence": 0.72
          },
          {
            "word": " in",
            "time_begin": 1.538,
            "time_end": 2.618,
            "confidence": 0.34
          },
          {
            "word": " a",
            "time_begin": 2.618,
            "time_end": 2.8779999999999997,
            "confidence": 1
          },
          {
            "word": " time",
            "time_begin": 2.8779999999999997,
            "time_end": 3.318,
            "confidence": 0.81
          },
          {
            "word": " when",
            "time_begin": 3.318,
            "time_end": 3.778,
            "confidence": 0.86
          },
          {
            "word": " less",
            "time_begin": 3.778,
            "time_end": 4.058,
            "confidence": 0.87
          },
          {
            "word": " is",
            "time_begin": 4.058,
            "time_end": 4.378,
            "confidence": 0.9
          },
          {
            "word": " more,",
            "time_begin": 4.378,
            "time_end": 4.938,
            "confidence": 0.88
          },
          {
            "word": " where",
            "time_begin": 5.638,
            "time_end": 5.718,
            "confidence": 0.89
          },
          {
            "word": " too",
            "time_begin": 5.718,
            "time_end": 6.138,
            "confidence": 0.8
          },
          {
            "word": " much",
            "time_begin": 6.138,
            "time_end": 6.478,
            "confidence": 0.81
          },
          {
            "word": " is",
            "time_begin": 6.478,
            "time_end": 6.918,
            "confidence": 0.9
          },
          {
            "word": " never",
            "time_begin": 6.918,
            "time_end": 7.258,
            "confidence": 0.88
          },
          {
            "word": " enough.",
            "time_begin": 7.258,
            "time_end": 7.798,
            "confidence": 0.78
          }
        ],
        "language": "en",
        "transcription": " Split infinity in a time when less is more, where too much is never enough.",
        "time_begin": 1.1780000000000002,
        "time_end": 7.798,
        "speaker": "not_activated",
        "channel": "channel_0",
        "emotion": "romance"
      },
      {
        "words": [
          {
            "word": " There",
            "time_begin": 8.618,
            "time_end": 8.678,
            "confidence": 0.8
          },
          {
            "word": " is",
            "time_begin": 8.678,
            "time_end": 8.958,
            "confidence": 0.89
          },
          {
            "word": " always",
            "time_begin": 8.958,
            "time_end": 9.478000000000002,
            "confidence": 0.76
          },
          {
            "word": " hope",
            "time_begin": 9.478000000000002,
            "time_end": 9.778,
            "confidence": 0.83
          },
          {
            "word": " for",
            "time_begin": 9.778,
            "time_end": 10.118,
            "confidence": 0.9
          },
          {
            "word": " the",
            "time_begin": 10.118,
            "time_end": 10.358,
            "confidence": 0.82
          },
          {
            "word": " future.",
            "time_begin": 10.358,
            "time_end": 10.738000000000001,
            "confidence": 0.94
          },
          {
            "word": " The",
            "time_begin": 11.738000000000001,
            "time_end": 11.898000000000001,
            "confidence": 0.81
          },
          {
            "word": " future",
            "time_begin": 11.898000000000001,
            "time_end": 12.218,
            "confidence": 0.94
          },
          {
            "word": " can",
            "time_begin": 12.218,
            "time_end": 12.578000000000001,
            "confidence": 0.9
          },
          {
            "word": " be",
            "time_begin": 12.578000000000001,
            "time_end": 12.838000000000001,
            "confidence": 0.91
          },
          {
            "word": " read",
            "time_begin": 12.838000000000001,
            "time_end": 13.038,
            "confidence": 0.9
          },
          {
            "word": " from",
            "time_begin": 13.038,
            "time_end": 13.338000000000001,
            "confidence": 0.82
          },
          {
            "word": " the",
            "time_begin": 13.338000000000001,
            "time_end": 13.558000000000002,
            "confidence": 0.82
          },
          {
            "word": " past.",
            "time_begin": 13.558000000000002,
            "time_end": 14.138,
            "confidence": 0.81
          }
        ],
        "language": "en",
        "transcription": " There is always hope for the future. The future can be read from the past.",
        "time_begin": 8.618,
        "time_end": 14.138,
        "speaker": "not_activated",
        "channel": "channel_0",
        "emotion": "calmness"
      },
      {
        "words": [
          {
            "word": " The",
            "time_begin": 14.658000000000001,
            "time_end": 14.778,
            "confidence": 0.81
          },
          {
            "word": " past",
            "time_begin": 14.778,
            "time_end": 15.358,
            "confidence": 0.82
          },
          {
            "word": " foreshadows",
            "time_begin": 15.358,
            "time_end": 16.098,
            "confidence": 0.89
          },
          {
            "word": " the",
            "time_begin": 16.098,
            "time_end": 16.458,
            "confidence": 0.81
          },
          {
            "word": " present",
            "time_begin": 16.458,
            "time_end": 17.018,
            "confidence": 0.79
          },
          {
            "word": " and",
            "time_begin": 17.018,
            "time_end": 17.698,
            "confidence": 0.33
          },
          {
            "word": " the",
            "time_begin": 17.698,
            "time_end": 17.918,
            "confidence": 0.81
          },
          {
            "word": " present",
            "time_begin": 17.918,
            "time_end": 18.378,
            "confidence": 0.79
          },
          {
            "word": " hasn't",
            "time_begin": 18.378,
            "time_end": 18.918,
            "confidence": 0.93
          },
          {
            "word": " been",
            "time_begin": 18.918,
            "time_end": 19.218,
            "confidence": 0.82
          },
          {
            "word": " written",
            "time_begin": 19.218,
            "time_end": 19.458,
            "confidence": 0.86
          },
          {
            "word": " yet.",
            "time_begin": 19.458,
            "time_end": 19.977999999999998,
            "confidence": 0.91
          }
        ],
        "language": "en",
        "transcription": " The past foreshadows the present and the present hasn't been written yet.",
        "time_begin": 14.658000000000001,
        "time_end": 19.977999999999998,
        "speaker": "not_activated",
        "channel": "channel_0",
        "emotion": "awe"
      }
    ],
    "metadata": {
      "total_speech_duration": 17.459999999999997,
      "providedFileMetadata": {
        "nb channels": 1,
        "sample rate": 44100,
        "sample width": 16,
        "original file type": "audio"
      },
      "audioConversionTime": 0.8856689929962158,
      "nbSilentChannels": -1,
      "nbSimilarChannels": 0,
      "vadTime": 0.020719051361083984,
      "inferenceTime": 2.690950393676758,
      "diarizationTime": 0.0000050067901611328125,
      "totalTranscriptionTime": 3.5973434448242188,
      "emotionTime": 2.5873782634735107
    },
    "emotion": "awe"
  }
}
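
One way to summarise per-utterance results like those above is to total the speech time attributed to each emotion. This is a minimal sketch and not part of the API: seconds_per_emotion is a hypothetical helper, and the input below is the three utterances from the example, trimmed to the fields the function reads.

```python
from collections import defaultdict
from typing import Any, Dict, List


def seconds_per_emotion(utterances: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sum utterance durations (time_end - time_begin) per emotion label."""
    totals: Dict[str, float] = defaultdict(float)
    for utterance in utterances:
        duration = utterance["time_end"] - utterance["time_begin"]
        totals[utterance["emotion"]] += duration
    return dict(totals)


# Utterance timings and emotions taken from the JSON output above.
utterances = [
    {"time_begin": 1.1780000000000002, "time_end": 7.798, "emotion": "romance"},
    {"time_begin": 8.618, "time_end": 14.138, "emotion": "calmness"},
    {"time_begin": 14.658000000000001, "time_end": 19.977999999999998, "emotion": "awe"},
]

totals = seconds_per_emotion(utterances)
# The emotion covering the most speech time.
dominant = max(totals, key=totals.get)
```

Weighting by duration is only one possible design choice; counting utterances per label would be an equally simple alternative.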

Txt output

{
  "prediction": "Split infinity in a time when less is more, where too much is never enough.  There is always hope for the future. The future can be read from the past.  The past foreshadows the present and the present hasn't been written yet.",
  "prediction_raw": {
    "transcription": "Split infinity in a time when less is more, where too much is never enough.  There is always hope for the future. The future can be read from the past.  The past foreshadows the present and the present hasn't been written yet.",
    "metadata": {
      "total_speech_duration": 17.459999999999997,
      "providedFileMetadata": {
        "nb channels": 1,
        "sample rate": 44100,
        "sample width": 16,
        "original file type": "audio"
      },
      "audioConversionTime": 0.7798864841461182,
      "nbSilentChannels": -1,
      "nbSimilarChannels": 0,
      "vadTime": 0.013231039047241211,
      "inferenceTime": 2.7497172355651855,
      "diarizationTime": 0.000004291534423828125,
      "totalTranscriptionTime": 3.5428390502929688,
      "emotionTime": 2.545997142791748
    },
    "emotion": "awe"
  }
}

SRT output

{
  "prediction": "1\n00:00:01,170 --> 00:00:07,790\n Split infinity in a time when less is more, where too much is never enough.\n\n2\n00:00:08,610 --> 00:00:14,130\n There is always hope for the future. The future can be read from the past.\n\n3\n00:00:14,650 --> 00:00:19,970\n The past foreshadows the present and the present hasn't been written yet.\n",
  "prediction_raw": {
    "transcription": "1\n00:00:01,170 --> 00:00:07,790\n Split infinity in a time when less is more, where too much is never enough.\n\n2\n00:00:08,610 --> 00:00:14,130\n There is always hope for the future. The future can be read from the past.\n\n3\n00:00:14,650 --> 00:00:19,970\n The past foreshadows the present and the present hasn't been written yet.\n",
    "metadata": {
      "total_speech_duration": 17.459999999999997,
      "providedFileMetadata": {
        "nb channels": 1,
        "sample rate": 44100,
        "sample width": 16,
        "original file type": "audio"
      },
      "audioConversionTime": 0.7368574142456055,
      "nbSilentChannels": -1,
      "nbSimilarChannels": 0,
      "vadTime": 0.035033226013183594,
      "inferenceTime": 2.704582452774048,
      "diarizationTime": 0.0000054836273193359375,
      "totalTranscriptionTime": 3.4764785766601562,
      "emotionTime": 2.5046708583831787
    },
    "emotion": "awe"
  }
}
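
If the individual subtitle cues are needed, the SRT string in "prediction" can be split back into entries. The sketch below assumes the layout shown in the example above (cue number, a timing line containing " --> ", then the text, with cues separated by a blank line); Cue and parse_srt are hypothetical names, and the parser is deliberately minimal rather than a full SRT implementation.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: str  # e.g. "00:00:01,170"
    end: str    # e.g. "00:00:07,790"
    text: str


def parse_srt(srt: str) -> List[Cue]:
    """Split an SRT string into cues; blocks with fewer than 3 lines are skipped."""
    cues = []
    for block in srt.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # not a complete cue
        start, end = lines[1].split(" --> ")
        text = "\n".join(lines[2:]).strip()
        cues.append(Cue(int(lines[0]), start, end, text))
    return cues
```

The overall emotion for this format is still read from "prediction_raw.emotion"; the SRT text itself carries no emotion labels.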

VTT output

{
  "prediction": "WEBVTT\n\n1\n00:00:01.178 --> 00:00:07.798\n Split infinity in a time when less is more, where too much is never enough.\n\n2\n00:00:08.618 --> 00:00:14.137\n There is always hope for the future. The future can be read from the past.\n\n3\n00:00:14.658 --> 00:00:19.977\n The past foreshadows the present and the present hasn't been written yet.\n",
  "prediction_raw": {
    "transcription": "WEBVTT\n\n1\n00:00:01.178 --> 00:00:07.798\n Split infinity in a time when less is more, where too much is never enough.\n\n2\n00:00:08.618 --> 00:00:14.137\n There is always hope for the future. The future can be read from the past.\n\n3\n00:00:14.658 --> 00:00:19.977\n The past foreshadows the present and the present hasn't been written yet.\n",
    "metadata": {
      "total_speech_duration": 17.459999999999997,
      "providedFileMetadata": {
        "nb channels": 1,
        "sample rate": 44100,
        "sample width": 16,
        "original file type": "audio"
      },
      "audioConversionTime": 0.8466713428497314,
      "nbSilentChannels": -1,
      "nbSimilarChannels": 0,
      "vadTime": 0.036614179611206055,
      "inferenceTime": 2.712864398956299,
      "diarizationTime": 0.0000045299530029296875,
      "totalTranscriptionTime": 3.5961544513702393,
      "emotionTime": 2.6435365676879883
    },
    "emotion": "awe"
  }
}