Deep Learning for Sentiment Analysis: A Survey
Deep learning has emerged as a powerful machine learning technique that
learns multiple layers of representations or features of the data and produces
state-of-the-art prediction results. Alongside its success in many other
application domains, deep learning has also been widely adopted for sentiment
analysis in recent years. This paper first gives an overview of deep learning
and then provides a comprehensive survey of its current applications in
sentiment analysis. Comment: 34 pages, 9 figures, 2 tables
Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction
Continuous dimensional emotion prediction is a challenging task in which
fusing multiple modalities, whether by early fusion or late fusion, usually
achieves state-of-the-art performance. In this paper, we propose a novel
multi-modal fusion strategy named conditional attention fusion, which can
dynamically pay attention to different modalities at each time step. Long
short-term memory recurrent neural networks (LSTM-RNNs) are applied as the
basic uni-modality models to capture long-range temporal dependencies. The
weights assigned to different modalities are decided automatically by the
current input features and recent history information rather than being fixed
for all situations. Our experimental results on the benchmark AVEC 2015
dataset show the effectiveness of our method, which outperforms several common
fusion strategies for valence prediction. Comment: Appeared at ACM Multimedia 201
Feature Extraction via Recurrent Random Deep Ensembles and its Application in Group-level Happiness Estimation
This paper presents a novel ensemble framework for extracting highly
discriminative feature representations of images and its application to
group-level happiness intensity prediction in the wild. To generate enough
diversity of decisions, n convolutional neural networks are trained on
bootstrap resamples of the training set, and n features are extracted from
them for each image. A recurrent neural network (RNN) is then used to learn
which networks extract better features and to generate the final feature
representation for each individual image. Several group emotion models (GEMs)
are used to aggregate the face features within a group, and a
parameter-optimized support vector regressor (SVR) produces the final results.
Extensive experiments demonstrate the effectiveness of the proposed recurrent
random deep ensembles (RRDE) in both structural and decisional ways. The best
result yields a root-mean-square error (RMSE) of 0.55 on the validation set of
the HAPPEI dataset, significantly better than the baseline of 0.78.
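A rough sketch of the recurrent-ensemble idea follows, with assumed shapes rather than the paper's settings: n CNNs would be trained on bootstrap resamples of the data, and their n per-image feature vectors are treated as a short sequence that a GRU summarizes into one final representation for the SVR stage.

```python
import numpy as np
import torch
import torch.nn as nn

n_models, feat_dim, out_dim = 5, 128, 64      # assumed ensemble size and feature sizes

def bootstrap_indices(num_samples, rng=np.random.default_rng(0)):
    # sample with replacement: each CNN would be trained on a different resample
    return rng.integers(0, num_samples, size=num_samples)

class EnsembleAggregator(nn.Module):
    """GRU that summarizes the n per-image CNN features into one vector."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(feat_dim, out_dim, batch_first=True)

    def forward(self, feats):                 # feats: (batch, n_models, feat_dim)
        _, h = self.gru(feats)                # h: (1, batch, out_dim)
        return h.squeeze(0)                   # final per-image representation

resample = bootstrap_indices(1000)            # training indices for one ensemble member
feats = torch.randn(8, n_models, feat_dim)    # stand-in for the n extracted CNN features
final_feature = EnsembleAggregator()(feats)   # (8, 64), then fed to a tuned SVR
```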
Efficient Low-rank Multimodal Fusion with Modality-Specific Factors
Multimodal research is an emerging field of artificial intelligence, and one
of the main research problems in this field is multimodal fusion. The fusion of
multimodal data is the process of integrating multiple unimodal representations
into one compact multimodal representation. Previous research in this field has
exploited the expressiveness of tensors for multimodal representation. However,
these methods often suffer from an exponential increase in dimensionality and
computational complexity introduced by transforming the input into a tensor. In
this paper, we propose the Low-rank Multimodal Fusion method, which performs
multimodal fusion using low-rank tensors to improve efficiency. We evaluate our
model on three different tasks: multimodal sentiment analysis, speaker trait
analysis, and emotion recognition. Our model achieves competitive results on
all these tasks while drastically reducing computational complexity. Additional
experiments also show that our model can perform robustly for a wide range of
low-rank settings, and is indeed much more efficient in both training and
inference compared to other methods that utilize tensor representations. Comment: * Equal contribution. 10 pages. Accepted by ACL 201
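In the spirit of the method (the shapes, rank, and the plain sum over rank factors are illustrative assumptions rather than the released implementation), a low-rank fusion step can be sketched as follows: each modality vector is appended with a constant 1, projected by rank-many modality-specific factors, and the factor outputs are multiplied elementwise across modalities and summed over the rank, avoiding the full outer-product tensor.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dims=(32, 16, 8), rank=4, out_dim=64):
        super().__init__()
        # one (rank, d_m + 1, out_dim) factor per modality
        self.factors = nn.ParameterList(
            nn.Parameter(0.1 * torch.randn(rank, d + 1, out_dim)) for d in dims
        )

    def forward(self, xs):                              # xs: list of (batch, d_m) tensors
        fused = None
        for x, factor in zip(xs, self.factors):
            ones = torch.ones(x.size(0), 1, device=x.device)
            z = torch.cat([x, ones], dim=-1)            # append 1 to retain unimodal terms
            proj = torch.einsum('bd,rdo->bro', z, factor)    # (batch, rank, out_dim)
            fused = proj if fused is None else fused * proj  # elementwise across modalities
        return fused.sum(dim=1)                         # sum over rank -> (batch, out_dim)

a, v, l = torch.randn(4, 32), torch.randn(4, 16), torch.randn(4, 8)
print(LowRankFusion()([a, v, l]).shape)                 # torch.Size([4, 64])
```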
Tensor Fusion Network for Multimodal Sentiment Analysis
Multimodal sentiment analysis is an increasingly popular research area, which
extends the conventional language-based definition of sentiment analysis to a
multimodal setup where other relevant modalities accompany language. In this
paper, we pose the problem of multimodal sentiment analysis as modeling
intra-modality and inter-modality dynamics. We introduce a novel model, termed
Tensor Fusion Network, which learns both such dynamics end-to-end. The proposed
approach is tailored for the volatile nature of spoken language in online
videos as well as accompanying gestures and voice. In the experiments, our
model outperforms state-of-the-art approaches for both multimodal and unimodal
sentiment analysis. Comment: Accepted as full paper in EMNLP 201
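A minimal sketch of outer-product fusion in the spirit of the Tensor Fusion Network (dimensions and the small regressor are assumptions): each modality embedding is appended with a 1 so that the outer product of the three vectors contains unimodal, bimodal, and trimodal interaction terms, and the flattened tensor feeds a sentiment regressor.

```python
import torch
import torch.nn as nn

def tensor_fuse(za, zv, zl):
    # append a constant 1 to each modality embedding
    ones = lambda x: torch.cat([x, torch.ones(x.size(0), 1)], dim=-1)
    za, zv, zl = ones(za), ones(zv), ones(zl)                  # (batch, d_m + 1) each
    fused = torch.einsum('bi,bj,bk->bijk', za, zv, zl)         # 3-way outer product
    return fused.flatten(start_dim=1)                          # (batch, (da+1)(dv+1)(dl+1))

za, zv, zl = torch.randn(4, 8), torch.randn(4, 4), torch.randn(4, 16)
fused = tensor_fuse(za, zv, zl)                                # (4, 9 * 5 * 17)
regressor = nn.Sequential(nn.Linear(fused.size(1), 32), nn.ReLU(), nn.Linear(32, 1))
sentiment = regressor(fused)                                   # continuous sentiment score
```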
How Deep Neural Networks Can Improve Emotion Recognition on Video Data
We consider the task of dimensional emotion recognition on video data using
deep learning. While several previous methods have shown the benefits of
training temporal neural network models such as recurrent neural networks
(RNNs) on hand-crafted features, few works have considered combining
convolutional neural networks (CNNs) with RNNs. In this work, we present a
system that performs emotion recognition on video data using both CNNs and
RNNs, and we also analyze how much each neural network component contributes to
the system's overall performance. We present our findings on videos from the
Audio/Visual+Emotion Challenge (AV+EC2015). In our experiments, we analyze the
effects of several hyperparameters on overall performance while also achieving
superior performance to the baseline and other competing methods. Comment: Accepted at ICIP 2016. Fixed typo in Experiments section
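A minimal sketch of the CNN+RNN pipeline (layer sizes are assumptions, not the paper's architecture): a small CNN embeds each frame, an LSTM models the temporal context, and a linear head regresses the emotion dimension per frame.

```python
import torch
import torch.nn as nn

class CnnRnnRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),           # per-frame feature: (B*T, 32)
        )
        self.rnn = nn.LSTM(32, 64, batch_first=True)         # temporal context over frames
        self.head = nn.Linear(64, 1)                         # e.g. valence per frame

    def forward(self, frames):                               # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out).squeeze(-1)                    # (B, T) predictions

clip = torch.randn(2, 16, 3, 64, 64)
print(CnnRnnRegressor()(clip).shape)                         # torch.Size([2, 16])
```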
End2You -- The Imperial Toolkit for Multimodal Profiling by End-to-End Learning
We introduce End2You -- the Imperial College London toolkit for multimodal
profiling by end-to-end deep learning. End2You is an open-source toolkit
implemented in Python and is based on TensorFlow. It provides capabilities to
train and evaluate models in an end-to-end manner, i.e., using raw input. It
supports input from raw audio, visual, physiological or other types of
information or combination of those, and the output can be of an arbitrary
representation, for either classification or regression tasks. To our
knowledge, this is the first toolkit that provides generic end-to-end learning
capabilities for profiling in either unimodal or multimodal cases. To test our
toolkit, we utilise the RECOLA database as used in the AVEC 2016 challenge.
Experimental results indicate that End2You can provide results comparable to
state-of-the-art methods despite requiring no expert-crafted feature
representations, instead learning these from the raw data end-to-end.
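End2You's actual API is not reproduced here; the following is only a rough sketch, under assumed layer sizes, of the end-to-end idea the toolkit implements: the model consumes raw audio samples directly, with no hand-crafted features, learns a representation with 1-D convolutions, and an LSTM plus a linear head regresses arousal and valence.

```python
import torch
import torch.nn as nn

class RawAudioRegressor(nn.Module):
    def __init__(self, n_outputs=2):                          # e.g. arousal and valence
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=80, stride=16), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(50),                         # 50 learned "frames" per segment
        )
        self.rnn = nn.LSTM(64, 64, batch_first=True)
        self.head = nn.Linear(64, n_outputs)

    def forward(self, wave):                                  # wave: (batch, samples), raw audio
        z = self.encoder(wave.unsqueeze(1)).transpose(1, 2)   # (batch, 50, 64)
        out, _ = self.rnn(z)
        return self.head(out[:, -1])                          # (batch, 2) per segment

print(RawAudioRegressor()(torch.randn(4, 16000)).shape)       # torch.Size([4, 2])
```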
A Deep Neural Model Of Emotion Appraisal
Emotional concepts play a central role in our daily life since they take part
in many cognitive processes: from the perception of the environment around us
to different learning processes and natural communication. Social robots need
to communicate with humans, which has also increased the popularity of
affective embodied models that adopt different emotional concepts in many
everyday tasks. However, there is still a gap between the development of these
solutions and the integration and development of a complex emotion appraisal
system, which is necessary for truly social robots. In this paper, we propose a deep neural
model which is designed in the light of different aspects of developmental
learning of emotional concepts to provide an integrated solution for internal
and external emotion appraisal. We evaluate the performance of the proposed
model with different challenging corpora and compare it with state-of-the-art
models for external emotion appraisal. To extend the evaluation of the proposed
model, we designed and collected a novel dataset based on a Human-Robot
Interaction (HRI) scenario. We deployed the model in an iCub robot and
evaluated the capability of the robot to learn and describe the affective
behavior of different persons based on observation. The performed experiments
demonstrate that the proposed model is competitive with the state of the art
in describing emotional behavior in general. In addition, it is able to
generate internal emotional concepts that evolve through time: it continuously
forms and updates these concepts, which is a step towards creating an emotion
appraisal model grounded in the robot's experiences.
Continuous Multimodal Emotion Recognition Approach for AVEC 2017
This paper reports an analysis of audio and visual features for predicting the
continuous emotion dimensions in the seventh Audio/Visual Emotion Challenge
(AVEC 2017), carried out as part of a B.Tech. 2nd-year internship project. For
visual features we used HOG (Histogram of Oriented Gradients) features, Fisher
encodings of SIFT (Scale-Invariant Feature Transform) features based on a
Gaussian mixture model (GMM), and some pretrained convolutional neural network
layers as features, all extracted for each video clip. For audio features we
used the bag-of-audio-words (BoAW) representation generated by openXBOW from
the low-level descriptors (LLDs) provided by the organisers of the event. We
then trained a fully connected neural network regression model on the dataset
for each of these modalities. We applied multimodal fusion to the outputs of
these models and report the concordance correlation coefficient on both the
development and test sets. Comment: 4 pages, 3 figures, arXiv:1605.06778, arXiv:1512.0338
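For illustration, here is a minimal sketch (not the authors' code; feature dimensions are assumptions) of the per-modality setup described above: a small fully connected regressor maps extracted features to one continuous emotion dimension, and the concordance correlation coefficient (CCC) used by AVEC scores the prediction.

```python
import torch
import torch.nn as nn

# fully connected regressor for one modality's features (e.g. BoAW or HOG vectors)
regressor = nn.Sequential(
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 1),                        # one continuous dimension, e.g. arousal
)

def ccc(pred, gold):
    # CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
    pm, gm = pred.mean(), gold.mean()
    cov = ((pred - pm) * (gold - gm)).mean()
    return 2 * cov / (pred.var(unbiased=False) + gold.var(unbiased=False) + (pm - gm) ** 2)

features = torch.randn(100, 512)              # stand-in for extracted per-clip features
pred = regressor(features).squeeze(-1)
gold = torch.randn(100)
print(ccc(pred, gold))                        # agreement between prediction and label
```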
Multi-Modal Emotion Recognition on IEMOCAP Dataset using Deep Learning
Emotion recognition has become an important field of research in
human-computer interaction as we improve the techniques for modelling the
various aspects of behaviour. As technology advances and our understanding of
emotions deepens, there is a growing need for automatic emotion recognition
systems. One direction this research is heading is the use of neural networks,
which are adept at estimating complex functions that depend on a large number
of diverse sources of input data. In this paper we exploit this effectiveness
of neural networks to perform multimodal emotion recognition on the IEMOCAP
dataset using speech, text, and motion-capture data covering facial
expressions, rotation, and hand movements. Prior research has concentrated on
emotion detection from speech on the IEMOCAP dataset, but our approach is the
first that uses the multiple modes of data offered by IEMOCAP for more robust
and accurate emotion detection.
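As a sketch only (the simple concatenation fusion, layer sizes, and feature dimensions are assumptions rather than the authors' exact architecture), the three IEMOCAP modalities can be combined by encoding the speech, text, and motion-capture streams separately, concatenating the summaries, and classifying the emotion:

```python
import torch
import torch.nn as nn

class TriModalClassifier(nn.Module):
    def __init__(self, vocab=5000, n_emotions=4):
        super().__init__()
        self.speech = nn.GRU(40, 64, batch_first=True)        # e.g. 40-dim acoustic frames
        self.text_emb = nn.Embedding(vocab, 64)
        self.text = nn.GRU(64, 64, batch_first=True)
        self.mocap = nn.GRU(165, 64, batch_first=True)        # flattened marker coordinates (assumed)
        self.classifier = nn.Linear(64 * 3, n_emotions)

    def forward(self, audio, tokens, motion):
        _, ha = self.speech(audio)                            # (1, batch, 64) summaries
        _, ht = self.text(self.text_emb(tokens))
        _, hm = self.mocap(motion)
        fused = torch.cat([ha, ht, hm], dim=-1).squeeze(0)    # (batch, 192) concatenation fusion
        return self.classifier(fused)                         # emotion logits

audio = torch.randn(2, 120, 40)
tokens = torch.randint(0, 5000, (2, 30))
motion = torch.randn(2, 200, 165)
print(TriModalClassifier()(audio, tokens, motion).shape)      # torch.Size([2, 4])
```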
- …