Emotion Recognition in Audio and Video Using Deep Neural Networks
Humans can comprehend information from multiple modalities such as speech,
text, and vision. Advances in deep learning have significantly improved speech
recognition. Recognizing emotion from speech is an important part of this, and
deep learning has improved emotion recognition in both accuracy and latency.
Many challenges remain, however, in improving accuracy further. In this work,
we explore different neural network architectures to improve the accuracy of
emotion recognition. Among the architectures explored, we find that a
(CNN+RNN) + 3DCNN multi-model architecture, which processes audio spectrograms
and the corresponding video frames, gives an emotion prediction accuracy of
54.0% among 4 emotions and 71.75% among 3
emotions using the IEMOCAP [2] dataset.
Comment: 9 pages, 9 figures, 3 tables