Group-level Emotion Recognition using Transfer Learning from Face Identification
In this paper, we describe our algorithmic approach, which was used for
submissions in the fifth Emotion Recognition in the Wild (EmotiW 2017)
group-level emotion recognition sub-challenge. We extracted feature vectors of
detected faces using a Convolutional Neural Network trained for the face
identification task, rather than the traditional pre-training on emotion
recognition problems. In the final pipeline, an ensemble of Random Forest
classifiers was trained to predict the emotion score using the available
training set. When no faces are detected, one member of our ensemble extracts
features from the whole image. During our experimental study, the
proposed approach showed the lowest error rate when compared to other explored
techniques. In particular, we achieved 75.4% accuracy on the validation data,
which is 20% higher than the handcrafted feature-based baseline. The source
code, built on the Keras framework, is publicly available.
Comment: 5 pages, 3 figures, accepted for publication at ICMI17 (EmotiW Grand Challenge)
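As a rough sketch of the described pipeline, the snippet below embeds detected faces with a pretrained face-identification CNN and learns a small ensemble of Random Forests on those features; the model path, the bootstrap resampling, and the probability averaging are illustrative assumptions, not the authors' released Keras code.

```python
import numpy as np
from tensorflow import keras
from sklearn.ensemble import RandomForestClassifier

# Hypothetical pretrained face-identification CNN (the path is illustrative).
embedder = keras.models.load_model("face_id_cnn.h5")

def face_features(face_crops):
    """Map aligned face crops to identity-CNN feature vectors."""
    return embedder.predict(np.stack(face_crops), verbose=0)

def train_ensemble(X, y, n_members=5, seed=0):
    """Train an ensemble of Random Forests on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    forests = []
    for _ in range(n_members):
        idx = rng.choice(len(X), size=len(X), replace=True)
        rf = RandomForestClassifier(n_estimators=200)
        forests.append(rf.fit(X[idx], y[idx]))
    return forests

def predict_emotion(forests, x):
    """Average class probabilities across ensemble members."""
    probs = np.mean([rf.predict_proba(x[None]) for rf in forests], axis=0)
    return int(np.argmax(probs))
```

Per the abstract's fallback, one ensemble member would be trained on features of the whole image rather than face crops, covering photos where detection fails.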
Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition
Automatic emotion recognition is a challenging task. In this paper, we
present our effort for the audio-video based sub-challenge of the Emotion
Recognition in the Wild (EmotiW) 2018 challenge, which requires participants to
assign a single emotion label to the video clip from the six universal emotions
(Anger, Disgust, Fear, Happiness, Sadness and Surprise) and Neutral. The proposed
multimodal emotion recognition system takes audio, video and text information
into account. In addition to handcrafted features, we also extract bottleneck
features from deep neural networks (DNNs) via transfer learning. Both temporal
and non-temporal classifiers are evaluated to obtain the best unimodal emotion
classification result. The class probabilities are then passed into the Beam
Search Fusion (BS-Fusion). We tested our method in the EmotiW 2018 challenge
and obtained promising results: compared with the baseline system, there is a
significant improvement. We achieve 60.34% accuracy on the testing dataset,
which is only 1.5% lower than the winner's. This shows that our method is very
competitive.
Comment: 9 pages, 11 figures and 4 tables. EmotiW 2018 challenge
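A hedged sketch of what such a beam-search fusion could look like: starting from single classifiers, the search repeatedly adds one probability stream to each candidate set, keeps the few best sets by validation accuracy, and stops when fusion stops improving. The averaging rule, beam width, and stopping criterion are assumptions, not the paper's exact BS-Fusion.

```python
import numpy as np

def accuracy(prob_list, y_val):
    """Accuracy of the average of several (n_val, n_classes) probability arrays."""
    fused = np.mean(prob_list, axis=0)
    return float(np.mean(fused.argmax(axis=1) == y_val))

def beam_search_fusion(unimodal_probs, y_val, beam_width=3):
    """unimodal_probs: list of per-classifier validation probability arrays."""
    n = len(unimodal_probs)
    beam = sorted(
        (((i,), accuracy([unimodal_probs[i]], y_val)) for i in range(n)),
        key=lambda t: -t[1],
    )[:beam_width]
    best = beam[0]
    while True:
        candidates = {}
        for members, _ in beam:
            for i in range(n):
                if i in members:
                    continue
                new = tuple(sorted(members + (i,)))
                if new not in candidates:
                    candidates[new] = accuracy(
                        [unimodal_probs[j] for j in new], y_val)
        if not candidates:
            break
        beam = sorted(candidates.items(), key=lambda t: -t[1])[:beam_width]
        if beam[0][1] <= best[1]:
            break  # adding another stream no longer helps: stop
        best = beam[0]
    return best  # (indices of the fused classifiers, validation accuracy)
```

The beam is scored on held-out validation data; searching on the training set would simply overfit the fusion to it.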
Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization
Emotion is a key element in user-generated videos. However, it is difficult
to understand emotions conveyed in such videos due to the complex and
unstructured nature of user-generated content and the sparsity of video frames
expressing emotion. In this paper, for the first time, we study the problem of
transferring knowledge from heterogeneous external sources, including image and
textual data, to facilitate three related tasks in understanding video emotion:
emotion recognition, emotion attribution and emotion-oriented summarization.
Specifically, our framework (1) learns a video encoding from an auxiliary
emotional image dataset in order to improve supervised video emotion
recognition, and (2) transfers knowledge from auxiliary textual corpora for
zero-shot recognition of emotion classes unseen during training. The proposed
technique for knowledge transfer facilitates novel applications of emotion
attribution and emotion-oriented summarization. A comprehensive set of
experiments on multiple datasets demonstrates the effectiveness of our
framework.
Comment: 13 pages, 11 figures. Published in the IEEE Transactions on Affective Computing
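To make the zero-shot component concrete, here is a minimal illustration under assumed shapes: video features are projected into a word-embedding space learned from the auxiliary text corpora, and an unseen emotion class is predicted by its nearest class-name embedding. The projection matrix W and the embedding lookup stand in for whatever the full framework actually learns.

```python
import numpy as np

def zero_shot_predict(video_feat, W, class_word_vecs):
    """video_feat: (d,), W: (k, d), class_word_vecs: {label: (k,) word vector}."""
    z = W @ video_feat                           # project into the text space
    z /= np.linalg.norm(z) + 1e-8
    scores = {
        label: float(z @ v / (np.linalg.norm(v) + 1e-8))  # cosine similarity
        for label, v in class_word_vecs.items()
    }
    return max(scores, key=scores.get)           # nearest class-name embedding
```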
PortraitGAN for Flexible Portrait Manipulation
Previous methods have dealt with discrete manipulation of facial attributes
such as smiling, sadness, anger and surprise, drawn from a set of canonical
expressions; they are not scalable and operate in a single modality. In this
paper, we propose a
novel framework that supports continuous edits and multi-modality portrait
manipulation using adversarial learning. Specifically, we adapt
cycle-consistency into the conditional setting by leveraging additional facial
landmark information. This has two effects: first, the cycle mapping induces
bidirectional manipulation and preserves identity; second, samples from
different modalities can thus be paired and utilized. To ensure high-quality
synthesis, we adopt a texture loss that enforces texture consistency, and
multi-level adversarial supervision that facilitates gradient flow. Quantitative and
qualitative experiments show the effectiveness of our framework in performing
flexible and multi-modality portrait manipulation with photo-realistic effects.
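A minimal sketch of how the conditional cycle-consistency and texture terms might combine: the generator maps (image, target landmarks) to an edited portrait, and mapping back with the source landmarks should recover the input. The generator signature, the Gram-matrix texture proxy, and the weighting are placeholders, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def cycle_loss(G, img, src_landmarks, tgt_landmarks, lam_tex=1.0):
    fake = G(img, tgt_landmarks)                 # forward (conditional) edit
    recon = G(fake, src_landmarks)               # inverse edit back to source
    l_cyc = F.l1_loss(recon, img)                # cycle-consistency term

    def gram(x):
        # Gram matrix of raw pixels, a stand-in for a VGG-feature Gram loss.
        b, c, h, w = x.shape
        f = x.reshape(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    l_tex = F.l1_loss(gram(recon), gram(img))    # texture-consistency proxy
    return l_cyc + lam_tex * l_tex
```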
Towards Learning a Universal Non-Semantic Representation of Speech
The ultimate goal of transfer learning is to reduce labeled data requirements
by exploiting a pre-existing embedding model trained on different datasets or
tasks. The visual and language communities have established benchmarks to
compare embeddings, but the speech community has yet to do so. This paper
proposes a benchmark for comparing speech representations on non-semantic
tasks, and proposes a representation based on an unsupervised triplet-loss
objective. The proposed representation outperforms other representations on the
benchmark, and even exceeds state-of-the-art performance on a number of
transfer learning tasks. The embedding is trained on a publicly available
dataset, and it is tested on a variety of low-resource downstream tasks,
including personalization tasks and the medical domain. The benchmark, models,
and evaluation code are publicly released.
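The triplet objective itself is compact; below is a minimal PyTorch sketch under the assumption that the anchor and positive are nearby segments of the same audio clip and the negative comes from a different clip. The encoder and margin are illustrative, not the released model.

```python
import torch
import torch.nn.functional as F

def triplet_loss(encoder, anchor, positive, negative, margin=0.1):
    """Pull same-clip segments together, push other-clip segments apart."""
    za, zp, zn = encoder(anchor), encoder(positive), encoder(negative)
    d_pos = (za - zp).pow(2).sum(dim=1)          # squared L2 to the positive
    d_neg = (za - zn).pow(2).sum(dim=1)          # squared L2 to the negative
    return F.relu(d_pos - d_neg + margin).mean()
```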
A Survey of the Trends in Facial and Expression Recognition Databases and Methods
Automated facial identification and facial expression recognition have been
topics of active research over the past few decades. Facial and expression
recognition find applications in human-computer interfaces, subject tracking,
real-time security surveillance systems and social networking. Several holistic
and geometric methods have been developed to identify faces and expressions
using public and local facial image databases. In this work we present the
evolution in facial image data sets and the methodologies for facial
identification and recognition of expressions such as anger, sadness,
happiness, disgust, fear and surprise. We observe that most of the earlier
methods for facial and expression recognition aimed at improving the
recognition rates for facial feature-based methods using static images.
However, the recent methodologies have shifted focus towards robust
implementation of facial/expression recognition from large image databases that
vary with space (gathered from the internet) and time (video recordings). The
evolution trends in databases and methodologies for facial and expression
recognition can be useful for assessing the next-generation topics that may
have applications in security systems or personal identification systems that
involve "Quantitative face" assessments.Comment: 16 pages, 4 figures, 3 tables, International Journal of Computer
Science and Engineering Survey, October, 201
Learnable PINs: Cross-Modal Embeddings for Person Identity
We propose and investigate an identity-sensitive joint embedding of face and
voice. Such an embedding enables cross-modal retrieval from voice to face and
from face to voice. We make the following four contributions: first, we show
that the embedding can be learnt from videos of talking faces, without
requiring any identity labels, using a form of cross-modal self-supervision;
second, we develop a curriculum learning schedule for hard negative mining
targeted to this task, which is essential for learning to proceed successfully;
third, we demonstrate and evaluate cross-modal retrieval for identities unseen
and unheard during training over a number of scenarios and establish a
benchmark for this novel task; finally, we show an application of using the
joint embedding for automatically retrieving and labelling characters in TV
dramas.
Comment: To appear in ECCV 201
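A hedged sketch of the cross-modal self-supervision with a curriculum over negative hardness: faces and voices from the same clip form positives, negatives come from other clips, and training gradually shifts weight from the average negative to the hardest one. The mixing schedule and margin are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_step(face_net, voice_net, faces, voices, hardness, margin=0.2):
    """faces, voices: aligned batches drawn from the same talking-face clips."""
    f = F.normalize(face_net(faces), dim=1)
    v = F.normalize(voice_net(voices), dim=1)
    sim = f @ v.t()                               # (B, B) cosine similarities
    b = sim.size(0)
    mask = torch.eye(b, dtype=torch.bool, device=sim.device)
    pos = sim.diagonal()                          # matched face-voice pairs
    hard_neg = sim.masked_fill(mask, float("-inf")).max(dim=1).values
    mean_neg = sim.masked_fill(mask, 0.0).sum(dim=1) / (b - 1)
    # Curriculum: interpolate from the average negative (easy) towards the
    # hardest negative as `hardness` ramps from 0 to 1 over training.
    mixed = (1 - hardness) * mean_neg + hardness * hard_neg
    return F.relu(mixed - pos + margin).mean()    # margin hinge loss
```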
Real-time Facial Expression Recognition "In The Wild" by Disentangling 3D Expression from Identity
Human emotion analysis has been the focus of many studies, especially in the
field of Affective Computing, and is important for many applications, e.g.
human-computer intelligent interaction, stress analysis, interactive games,
animations, etc. Solutions for automatic emotion analysis have also benefited
from the development of deep learning approaches and the availability of vast
amounts of visual facial data on the internet. This paper proposes a novel
method for human emotion recognition from a single RGB image. We construct a
large-scale dataset of facial videos (FaceVid), rich in facial
dynamics, identities, expressions, appearance and 3D pose variations. We use
this dataset to train a deep Convolutional Neural Network for estimating
expression parameters of a 3D Morphable Model and combine it with an effective
back-end emotion classifier. Our proposed framework runs at 50 frames per
second and is capable of robustly estimating parameters of 3D expression
variation and accurately recognizing facial expressions from in-the-wild
images. We present extensive experimental evaluation that shows that the
proposed method outperforms the compared techniques in estimating the 3D
expression parameters and achieves state-of-the-art performance in recognising
the basic emotions from facial images, as well as recognising stress from
facial videos.
Comment: to be published in the 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)
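As a schematic of the two-stage design, the sketch below regresses a 3DMM expression code with a CNN and classifies emotions from that identity-free code. The 28-dimensional code, the tiny backbone, and the seven-class head are placeholders, not the trained network.

```python
import torch
import torch.nn as nn

class ExpressionRegressor(nn.Module):
    """CNN that regresses 3D Morphable Model expression parameters."""
    def __init__(self, n_params=28):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a deep CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, n_params)       # 3DMM expression code

    def forward(self, img):
        return self.head(self.backbone(img))

# Back-end emotion classifier over the identity-free expression code.
emotion_clf = nn.Sequential(
    nn.Linear(28, 64), nn.ReLU(), nn.Linear(64, 7)  # 6 basic emotions + neutral
)
```

Because the classifier sees only expression parameters, identity cues are disentangled away before emotion is predicted, which is the source of the method's robustness claim.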
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
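The standard pipeline the survey describes (face detection, alignment and normalization, then deep feature learning and classification) can be summarized schematically; the detector and aligner below are placeholders for whichever preprocessing stack a given system uses.

```python
import numpy as np

def fer_pipeline(image, detect_face, align_face, cnn, labels):
    box = detect_face(image)                      # stage 1: face detection
    if box is None:
        return None                               # no face found
    face = align_face(image, box, size=(224, 224))  # stage 2: alignment
    face = (face.astype(np.float32) / 255.0 - 0.5) / 0.5  # normalization
    probs = cnn(face[None])                       # stage 3: deep features + softmax
    return labels[int(np.argmax(probs))]
```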
Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research
Sentiment analysis as a field has come a long way since it was first
introduced as a task nearly 20 years ago. It has widespread commercial
applications in various domains like marketing, risk management, market
research, and politics, to name a few. Given its saturation in specific
subtasks -- such as sentiment polarity classification -- and datasets, there is
an underlying perception that this field has reached its maturity. In this
article, we discuss this perception by pointing out the shortcomings and
under-explored, yet key aspects of this field that are necessary to attain true
sentiment understanding. We analyze the significant leaps responsible for its
current relevance. Further, we attempt to chart a possible course for this
field that covers many overlooked and unanswered questions.
Comment: Published in the IEEE Transactions on Affective Computing (TAFFC)