Deep Learning for Sentiment Analysis: A Survey
Deep learning has emerged as a powerful machine learning technique that
learns multiple layers of representations or features of the data and produces
state-of-the-art prediction results. Alongside its success in many other
application domains, deep learning has also been widely adopted for sentiment
analysis in recent years. This paper first gives an overview of deep learning
and then provides a comprehensive survey of its current applications in
sentiment analysis. Comment: 34 pages, 9 figures, 2 tables
Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction
Continuous dimensional emotion prediction is a challenging task in which
fusing multiple modalities, whether by early fusion or late fusion, usually
achieves state-of-the-art performance. In this paper, we propose a novel
multi-modal fusion strategy named conditional attention fusion, which can
dynamically pay attention to different modalities at each time step. Long
short-term memory recurrent neural networks (LSTM-RNNs) are applied as the
basic uni-modality models to capture long-range temporal dependencies. The
weights assigned to different modalities are decided automatically by the
current input features and recent history information rather than being fixed
for all situations. Our experimental results on the benchmark AVEC 2015
dataset show the effectiveness of our method, which outperforms several common
fusion strategies for valence prediction. Comment: Appeared at ACM Multimedia 201
Feature Extraction via Recurrent Random Deep Ensembles and its Application in Group-level Happiness Estimation
This paper presents a novel ensemble framework for extracting highly
discriminative feature representations of images and its application to
group-level happiness intensity prediction in the wild. To generate enough
diversity of decisions, n convolutional neural networks are trained on
bootstrap resamples of the training set, and n features are extracted from
them for each image. A recurrent neural network (RNN) is then used to learn
which networks extract better features and to generate the final feature
representation for each individual image. Several group emotion models (GEMs)
are used to aggregate the face features within a group, and a
parameter-optimized support vector regressor (SVR) produces the final results.
Extensive experiments demonstrate the effectiveness of the proposed recurrent
random deep ensembles (RRDE) in both structural and decisional ways. The best
result yields a root-mean-square error (RMSE) of 0.55 on the validation set of
the HAPPEI dataset, significantly better than the baseline of 0.78.
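A rough sketch of the recurrent-ensemble idea follows, with assumed shapes rather than the paper's settings: n CNNs would be trained on bootstrap resamples of the data, and their n per-image feature vectors are treated as a short sequence that a GRU summarizes into one final representation for the SVR stage.

```python
import numpy as np
import torch
import torch.nn as nn

n_models, feat_dim, out_dim = 5, 128, 64      # assumed ensemble size and feature sizes

def bootstrap_indices(num_samples, rng=np.random.default_rng(0)):
    # sample with replacement: each CNN would be trained on a different resample
    return rng.integers(0, num_samples, size=num_samples)

class EnsembleAggregator(nn.Module):
    """GRU that summarizes the n per-image CNN features into one vector."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(feat_dim, out_dim, batch_first=True)

    def forward(self, feats):                 # feats: (batch, n_models, feat_dim)
        _, h = self.gru(feats)                # h: (1, batch, out_dim)
        return h.squeeze(0)                   # final per-image representation

resample = bootstrap_indices(1000)            # training indices for one ensemble member
feats = torch.randn(8, n_models, feat_dim)    # stand-in for the n extracted CNN features
final_feature = EnsembleAggregator()(feats)   # (8, 64), then fed to a tuned SVR
```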
Efficient Low-rank Multimodal Fusion with Modality-Specific Factors
Multimodal research is an emerging field of artificial intelligence, and one
of the main research problems in this field is multimodal fusion. The fusion of
multimodal data is the process of integrating multiple unimodal representations
into one compact multimodal representation. Previous research in this field has
exploited the expressiveness of tensors for multimodal representation. However,
these methods often suffer from an exponential increase in dimensionality and
computational complexity introduced by transforming the input into a tensor. In
this paper, we propose the Low-rank Multimodal Fusion method, which performs
multimodal fusion using low-rank tensors to improve efficiency. We evaluate our
model on three different tasks: multimodal sentiment analysis, speaker trait
analysis, and emotion recognition. Our model achieves competitive results on
all these tasks while drastically reducing computational complexity. Additional
experiments also show that our model can perform robustly for a wide range of
low-rank settings, and is indeed much more efficient in both training and
inference compared to other methods that utilize tensor representations. Comment: * Equal contribution. 10 pages. Accepted by ACL 201
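In the spirit of the method (the shapes, rank, and the plain sum over rank factors are illustrative assumptions rather than the released implementation), a low-rank fusion step can be sketched as follows: each modality vector is appended with a constant 1, projected by rank-many modality-specific factors, and the factor outputs are multiplied elementwise across modalities and summed over the rank, avoiding the full outer-product tensor.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dims=(32, 16, 8), rank=4, out_dim=64):
        super().__init__()
        # one (rank, d_m + 1, out_dim) factor per modality
        self.factors = nn.ParameterList(
            nn.Parameter(0.1 * torch.randn(rank, d + 1, out_dim)) for d in dims
        )

    def forward(self, xs):                              # xs: list of (batch, d_m) tensors
        fused = None
        for x, factor in zip(xs, self.factors):
            ones = torch.ones(x.size(0), 1, device=x.device)
            z = torch.cat([x, ones], dim=-1)            # append 1 to retain unimodal terms
            proj = torch.einsum('bd,rdo->bro', z, factor)    # (batch, rank, out_dim)
            fused = proj if fused is None else fused * proj  # elementwise across modalities
        return fused.sum(dim=1)                         # sum over rank -> (batch, out_dim)

a, v, l = torch.randn(4, 32), torch.randn(4, 16), torch.randn(4, 8)
print(LowRankFusion()([a, v, l]).shape)                 # torch.Size([4, 64])
```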
Tensor Fusion Network for Multimodal Sentiment Analysis
Multimodal sentiment analysis is an increasingly popular research area, which
extends the conventional language-based definition of sentiment analysis to a
multimodal setup where other relevant modalities accompany language. In this
paper, we pose the problem of multimodal sentiment analysis as modeling
intra-modality and inter-modality dynamics. We introduce a novel model, termed
Tensor Fusion Network, which learns both such dynamics end-to-end. The proposed
approach is tailored for the volatile nature of spoken language in online
videos as well as accompanying gestures and voice. In the experiments, our
model outperforms state-of-the-art approaches for both multimodal and unimodal
sentiment analysis. Comment: Accepted as full paper in EMNLP 201
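A minimal sketch of outer-product fusion in the spirit of the Tensor Fusion Network (dimensions and the small regressor are assumptions): each modality embedding is appended with a 1 so that the outer product of the three vectors contains unimodal, bimodal, and trimodal interaction terms, and the flattened tensor feeds a sentiment regressor.

```python
import torch
import torch.nn as nn

def tensor_fuse(za, zv, zl):
    # append a constant 1 to each modality embedding
    ones = lambda x: torch.cat([x, torch.ones(x.size(0), 1)], dim=-1)
    za, zv, zl = ones(za), ones(zv), ones(zl)                  # (batch, d_m + 1) each
    fused = torch.einsum('bi,bj,bk->bijk', za, zv, zl)         # 3-way outer product
    return fused.flatten(start_dim=1)                          # (batch, (da+1)(dv+1)(dl+1))

za, zv, zl = torch.randn(4, 8), torch.randn(4, 4), torch.randn(4, 16)
fused = tensor_fuse(za, zv, zl)                                # (4, 9 * 5 * 17)
regressor = nn.Sequential(nn.Linear(fused.size(1), 32), nn.ReLU(), nn.Linear(32, 1))
sentiment = regressor(fused)                                   # continuous sentiment score
```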
How Deep Neural Networks Can Improve Emotion Recognition on Video Data
We consider the task of dimensional emotion recognition on video data using
deep learning. While several previous methods have shown the benefits of
training temporal neural network models such as recurrent neural networks
(RNNs) on hand-crafted features, few works have considered combining
convolutional neural networks (CNNs) with RNNs. In this work, we present a
system that performs emotion recognition on video data using both CNNs and
RNNs, and we also analyze how much each neural network component contributes to
the system's overall performance. We present our findings on videos from the
Audio/Visual+Emotion Challenge (AV+EC2015). In our experiments, we analyze the
effects of several hyperparameters on overall performance while also achieving
superior performance to the baseline and other competing methods. Comment: Accepted at ICIP 2016. Fixed typo in Experiments section
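A minimal sketch of the CNN+RNN pipeline (layer sizes are assumptions, not the paper's architecture): a small CNN embeds each frame, an LSTM models the temporal context, and a linear head regresses the emotion dimension per frame.

```python
import torch
import torch.nn as nn

class CnnRnnRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),           # per-frame feature: (B*T, 32)
        )
        self.rnn = nn.LSTM(32, 64, batch_first=True)         # temporal context over frames
        self.head = nn.Linear(64, 1)                         # e.g. valence per frame

    def forward(self, frames):                               # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out).squeeze(-1)                    # (B, T) predictions

clip = torch.randn(2, 16, 3, 64, 64)
print(CnnRnnRegressor()(clip).shape)                         # torch.Size([2, 16])
```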
End2You -- The Imperial Toolkit for Multimodal Profiling by End-to-End Learning
We introduce End2You -- the Imperial College London toolkit for multimodal
profiling by end-to-end deep learning. End2You is an open-source toolkit
implemented in Python and is based on TensorFlow. It provides capabilities to
train and evaluate models in an end-to-end manner, i.e., using raw input. It
supports input from raw audio, visual, physiological or other types of
information or combination of those, and the output can be of an arbitrary
representation, for either classification or regression tasks. To our
knowledge, this is the first toolkit that provides generic end-to-end learning
capabilities for profiling in either unimodal or multimodal cases. To test our
toolkit, we utilise the RECOLA database as used in the AVEC 2016 challenge.
Experimental results indicate that End2You can provide results comparable to
state-of-the-art methods despite requiring no expert-crafted feature
representations, instead learning these from the raw data end-to-end.
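End2You's actual API is not reproduced here; the following is only a rough sketch, under assumed layer sizes, of the end-to-end idea the toolkit implements: the model consumes raw audio samples directly, with no hand-crafted features, learns a representation with 1-D convolutions, and an LSTM plus a linear head regresses arousal and valence.

```python
import torch
import torch.nn as nn

class RawAudioRegressor(nn.Module):
    def __init__(self, n_outputs=2):                          # e.g. arousal and valence
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=80, stride=16), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(50),                         # 50 learned "frames" per segment
        )
        self.rnn = nn.LSTM(64, 64, batch_first=True)
        self.head = nn.Linear(64, n_outputs)

    def forward(self, wave):                                  # wave: (batch, samples), raw audio
        z = self.encoder(wave.unsqueeze(1)).transpose(1, 2)   # (batch, 50, 64)
        out, _ = self.rnn(z)
        return self.head(out[:, -1])                          # (batch, 2) per segment

print(RawAudioRegressor()(torch.randn(4, 16000)).shape)       # torch.Size([4, 2])
```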
A Deep Neural Model Of Emotion Appraisal
Emotional concepts play a central role in our daily life since they take part
in many cognitive processes: from the perception of the environment around us
to different learning processes and natural communication. Social robots need
to communicate with humans, which has also increased the popularity of
affective embodied models that adopt different emotional concepts in many
everyday tasks. However, there is still a gap between the development of these
solutions and the integration and development of a complex emotion appraisal
system, which is necessary for truly social robots. In this paper, we propose a deep neural
model which is designed in the light of different aspects of developmental
learning of emotional concepts to provide an integrated solution for internal
and external emotion appraisal. We evaluate the performance of the proposed
model with different challenging corpora and compare it with state-of-the-art
models for external emotion appraisal. To extend the evaluation of the proposed
model, we designed and collected a novel dataset based on a Human-Robot
Interaction (HRI) scenario. We deployed the model in an iCub robot and
evaluated the capability of the robot to learn and describe the affective
behavior of different persons based on observation. The performed experiments
demonstrate that the proposed model is competitive with the state of the art
in describing emotional behavior in general. In addition, it is able to
generate internal emotional concepts that evolve through time: it continuously
forms and updates these concepts, which is a step towards creating an emotion
appraisal model grounded in the robot's experiences.
Continuous Multimodal Emotion Recognition Approach for AVEC 2017
This paper reports an analysis of audio and visual features for predicting the
continuous emotion dimensions in the seventh Audio/Visual Emotion Challenge
(AVEC 2017), carried out as part of a B.Tech. 2nd-year internship project. For
visual features we used HOG (Histogram of Oriented Gradients) features, Fisher
encodings of SIFT (Scale-Invariant Feature Transform) features based on a
Gaussian mixture model (GMM), and some pretrained convolutional neural network
layers as features, all extracted for each video clip. For audio features we
used the bag-of-audio-words (BoAW) representation generated by openXBOW from
the low-level descriptors (LLDs) provided by the organisers of the event. We
then trained a fully connected neural network regression model on the dataset
for each of these modalities. We applied multimodal fusion to the outputs of
these models and report the concordance correlation coefficient on both the
development and test sets. Comment: 4 pages, 3 figures, arXiv:1605.06778, arXiv:1512.0338
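For illustration, here is a minimal sketch (not the authors' code; feature dimensions are assumptions) of the per-modality setup described above: a small fully connected regressor maps extracted features to one continuous emotion dimension, and the concordance correlation coefficient (CCC) used by AVEC scores the prediction.

```python
import torch
import torch.nn as nn

# fully connected regressor for one modality's features (e.g. BoAW or HOG vectors)
regressor = nn.Sequential(
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 1),                        # one continuous dimension, e.g. arousal
)

def ccc(pred, gold):
    # CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
    pm, gm = pred.mean(), gold.mean()
    cov = ((pred - pm) * (gold - gm)).mean()
    return 2 * cov / (pred.var(unbiased=False) + gold.var(unbiased=False) + (pm - gm) ** 2)

features = torch.randn(100, 512)              # stand-in for extracted per-clip features
pred = regressor(features).squeeze(-1)
gold = torch.randn(100)
print(ccc(pred, gold))                        # agreement between prediction and label
```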
Multi-Modal Emotion Recognition on IEMOCAP Dataset using Deep Learning
Emotion recognition has become an important field of research in
human-computer interaction as we improve the techniques for modelling the
various aspects of behaviour. As technology advances and our understanding of
emotions deepens, there is a growing need for automatic emotion recognition
systems. One direction this research is heading is the use of neural networks,
which are adept at estimating complex functions that depend on a large number
of diverse sources of input data. In this paper we exploit this effectiveness
of neural networks to perform multimodal emotion recognition on the IEMOCAP
dataset using speech, text, and motion-capture data covering facial
expressions, rotation, and hand movements. Prior research has concentrated on
emotion detection from speech on the IEMOCAP dataset, but our approach is the
first that uses the multiple modes of data offered by IEMOCAP for more robust
and accurate emotion detection.
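As a sketch only (the simple concatenation fusion, layer sizes, and feature dimensions are assumptions rather than the authors' exact architecture), the three IEMOCAP modalities can be combined by encoding the speech, text, and motion-capture streams separately, concatenating the summaries, and classifying the emotion:

```python
import torch
import torch.nn as nn

class TriModalClassifier(nn.Module):
    def __init__(self, vocab=5000, n_emotions=4):
        super().__init__()
        self.speech = nn.GRU(40, 64, batch_first=True)        # e.g. 40-dim acoustic frames
        self.text_emb = nn.Embedding(vocab, 64)
        self.text = nn.GRU(64, 64, batch_first=True)
        self.mocap = nn.GRU(165, 64, batch_first=True)        # flattened marker coordinates (assumed)
        self.classifier = nn.Linear(64 * 3, n_emotions)

    def forward(self, audio, tokens, motion):
        _, ha = self.speech(audio)                            # (1, batch, 64) summaries
        _, ht = self.text(self.text_emb(tokens))
        _, hm = self.mocap(motion)
        fused = torch.cat([ha, ht, hm], dim=-1).squeeze(0)    # (batch, 192) concatenation fusion
        return self.classifier(fused)                         # emotion logits

audio = torch.randn(2, 120, 40)
tokens = torch.randint(0, 5000, (2, 30))
motion = torch.randn(2, 200, 165)
print(TriModalClassifier()(audio, tokens, motion).shape)      # torch.Size([2, 4])
```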
- …