
    EmoNets: Multimodal deep learning approaches for emotion recognition in video

    The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches that combine features from multiple modalities for label assignment. In this paper we present our approach to learning several specialist models using deep learning techniques, each focusing on one modality. Among these are a convolutional neural network capturing visual information in detected faces, a deep belief net focusing on the representation of the audio stream, a K-Means-based "bag-of-mouths" model that extracts visual features around the mouth region, and a relational autoencoder that addresses spatio-temporal aspects of videos. We explore multiple methods for combining cues from these modalities into one common classifier, which achieves considerably greater accuracy than predictions from our strongest single-modality classifier. Our method was the winning submission in the 2013 EmotiW challenge and achieved a test set accuracy of 47.67% on the 2014 dataset.
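    As a rough illustration of how cues from such specialist models can be combined, the sketch below averages per-modality class probabilities with tunable weights. This is only a minimal sketch of late fusion; the modality names, weights, and probability values are assumptions for illustration, not the authors' winning combination strategy.

```python
import numpy as np

# Hedged sketch: one simple way to fuse per-modality emotion predictions.
# The paper explores several combination strategies; this weighted average
# is an illustration, not the authors' exact method.

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def fuse_predictions(modality_probs, weights=None):
    """Combine per-modality class-probability vectors into one prediction.

    modality_probs: dict mapping modality name -> length-7 probability vector.
    weights: optional dict of per-modality weights (e.g. tuned on validation data).
    """
    names = list(modality_probs)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    fused = np.zeros(len(EMOTIONS))
    for name in names:
        fused += weights[name] * np.asarray(modality_probs[name])
    fused /= fused.sum()  # renormalise to a probability distribution
    return EMOTIONS[int(np.argmax(fused))], fused

# Example with made-up probabilities from three specialist models.
label, probs = fuse_predictions({
    "face_cnn":      [0.05, 0.02, 0.03, 0.70, 0.05, 0.10, 0.05],
    "audio_dbn":     [0.10, 0.05, 0.05, 0.40, 0.20, 0.10, 0.10],
    "bag_of_mouths": [0.05, 0.05, 0.05, 0.55, 0.10, 0.10, 0.10],
})
print(label, probs)
```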

    Visual and Lingual Emotion Recognition using Deep Learning Techniques

    Emotion recognition has been an integral part of many applications like video games, cognitive computing, and human-computer interaction. Emotion can be recognized from many sources, including speech, facial expressions, hand gestures, and textual attributes. We have developed a prototype emotion recognition system using computer vision and natural language processing techniques. Our hybrid system uses mobile camera frames and speech features known as Mel Frequency Cepstral Coefficients (MFCCs) to recognize the emotion of a person. To recognize emotions from facial expressions, we have developed a Convolutional Neural Network (CNN) model with an accuracy of 68%. To recognize emotions from speech MFCCs, we have developed a sequential model with an accuracy of 69%. Our Android application can access the front and back cameras simultaneously, which allows it to predict the emotion of the overall conversation between the people facing the two cameras. The application can also record the audio conversation between those people. The two predicted emotions (face and speech) are merged into a single emotion using a fusion algorithm. Our models are converted to TensorFlow Lite models to reduce model size and suit the limited processing power of mobile devices. Our system classifies emotions into seven classes: neutral, surprise, happy, fear, sad, disgust, and angry.
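    A minimal sketch of the speech branch described above: extracting MFCC features, feeding them to a small sequential network, and converting the trained model to TensorFlow Lite for on-device use. The layer sizes, the 40-coefficient setting, and the file names are assumptions for illustration, not the authors' actual configuration.

```python
import librosa
import tensorflow as tf

# Hedged sketch of an MFCC-based speech emotion classifier plus TensorFlow
# Lite conversion; hyperparameters and file names are illustrative only.

NUM_CLASSES = 7  # neutral, surprise, happy, fear, sad, disgust, angry

def extract_mfcc(wav_path, n_mfcc=40):
    """Load an audio file and return a fixed-length MFCC feature vector."""
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # average over time -> shape (n_mfcc,)

def build_speech_model(n_mfcc=40):
    """A small sequential network over MFCC features."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_mfcc,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def to_tflite(model, out_path="speech_emotion.tflite"):
    """Convert a trained Keras model to TensorFlow Lite for mobile inference."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    with open(out_path, "wb") as f:
        f.write(converter.convert())
```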

    Comparison of the Quality of Emotion Recognition Methods for Video Streams

    Master's thesis: 100 pages, 40 figures, 23 tables, 3 appendices, 67 references. Topic: "Comparison of the Quality of Emotion Recognition Methods for Video Streams." The object of study is human emotions in a video stream; the subject of study is the quality of emotion recognition methods. The goal of the work is to build models for recognizing human emotions from a video stream and to compare their quality. The theoretical and methodological basis of the study is the work of foreign researchers in the field of machine learning. The thesis considers the emotion recognition problem and its relevance, and builds facial emotion recognition models based on convolutional neural networks and on neural networks over facial landmarks. The quality and speed of the constructed models are evaluated and compared. The resulting models are of high quality, with accuracy in the range 0.7-0.9, and can be used for marketing, advertising, and social purposes. The thesis proposes using these models in a start-up product for working with audiences and assesses the prospects of implementing such a product.
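    One of the two model families compared in the thesis takes facial landmarks as input. The sketch below shows one plausible way to turn detected landmarks into a feature vector for such a classifier, assuming dlib's 68-point predictor; the predictor file name and the normalisation scheme are assumptions, not the thesis's exact pipeline.

```python
import numpy as np
import cv2
import dlib

# Hedged sketch: landmark features for an emotion classifier, contrasted
# with a CNN that classifies the raw face crop directly.

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmark_features(image_path):
    """Return a normalised 136-dim landmark vector for the first detected face."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = np.array([[shape.part(i).x, shape.part(i).y] for i in range(68)],
                   dtype=np.float32)
    pts -= pts.mean(axis=0)   # translate to the face centre
    pts /= np.abs(pts).max()  # scale to roughly [-1, 1]
    return pts.ravel()

# Such vectors can be fed to a small dense network or an SVM, and the
# resulting accuracy and inference time compared against a CNN baseline.
```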

    Video Synthesis from the StyleGAN Latent Space

    Generative models have shown impressive results in generating synthetic images. However, video synthesis is still difficult to achieve, even for these generative models. The best videos that generative models can currently create are a few seconds long, distorted, and low resolution. For this project, I propose and implement a model that synthesizes videos of human facial expressions at 1024x1024 resolution over 32 frames, using static images generated by a Generative Adversarial Network trained on human face images. To the best of my knowledge, this is the first work that generates realistic videos larger than 256x256 resolution from single starting images. The model improves video synthesis both quantitatively and qualitatively compared to two state-of-the-art models, TGAN and MoCoGAN. In a quantitative comparison, this project reaches a best Average Content Distance (ACD) score of 0.167, compared to 0.305 and 0.201 for TGAN and MoCoGAN, respectively.
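    The general idea behind synthesizing a clip from a single starting image is to move through the generator's latent space and decode a frame at each step. The sketch below illustrates this with simple linear interpolation between two latent codes; the `generator` callable, the latent dimensionality, and the interpolation scheme are placeholders, not the project's actual method.

```python
import numpy as np

# Hedged sketch: frame synthesis by walking through a GAN latent space.
# `generator` stands in for a pretrained StyleGAN generator (latent -> image).

LATENT_DIM = 512   # StyleGAN's usual z-space dimensionality (assumption)
NUM_FRAMES = 32    # the project targets 32-frame clips

def interpolate_latents(z_start, z_end, num_frames=NUM_FRAMES):
    """Linearly interpolate between two latent codes to get per-frame codes."""
    alphas = np.linspace(0.0, 1.0, num_frames)[:, None]
    return (1.0 - alphas) * z_start[None, :] + alphas * z_end[None, :]

def synthesize_clip(generator, rng=None):
    """Generate a sequence of frames from a start latent toward a target latent."""
    if rng is None:
        rng = np.random.default_rng(0)
    z_start = rng.standard_normal(LATENT_DIM).astype(np.float32)
    z_end = rng.standard_normal(LATENT_DIM).astype(np.float32)
    return [generator(z) for z in interpolate_latents(z_start, z_end)]
```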