Time-delay neural network for continuous emotional dimension prediction from facial expression sequences
"(c) 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works."
Automatic continuous affective state prediction from naturalistic facial expression is a very challenging research topic but very important in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods are used to classify each individual video frame, while in the second stage, a Time-Delay Neural Network (TDNN) is proposed to model the temporal relationships between
consecutive predictions. The two-stage approach separates the emotional state dynamics modeling from an individual emotional state prediction step based on input features. In doing so, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames, which allows the network to more easily exploit the slowly changing dynamics between emotional states. The system was fully tested and evaluated on three different facial expression video datasets. Our experimental results demonstrate that the use of a two-stage approach combined with the TDNN to take into account previously classified frames significantly improves the overall performance of continuous emotional state estimation in naturalistic
facial expressions. The proposed approach won the affect recognition sub-challenge of the third international Audio/Visual Emotion Recognition Challenge (AVEC2013).
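The second stage described above consumes a window of earlier per-frame predictions rather than raw features. A minimal sketch of how such a tapped delay line could be built is shown below. The function names (`delay_windows`, `combine`), the start-of-sequence padding, and the fixed weights are all assumptions made for illustration; in the paper a trained TDNN takes the place of the fixed weighted sum, and its architecture is not given in the abstract.

```python
from typing import List, Sequence


def delay_windows(preds: Sequence[float], delay: int) -> List[List[float]]:
    """For each frame t, collect the stage-one predictions at t-delay .. t.

    Frames before the start of the sequence are padded with the first
    prediction (a hypothetical padding choice, not taken from the paper).
    """
    if not preds:
        return []
    padded = [preds[0]] * delay + list(preds)
    return [padded[t:t + delay + 1] for t in range(len(preds))]


def combine(preds: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted sum over each delay window.

    This stands in for the trained network: a real TDNN would learn a
    non-linear mapping from the window to the refined prediction.
    """
    delay = len(weights) - 1
    return [
        sum(w * x for w, x in zip(weights, window))
        for window in delay_windows(preds, delay)
    ]


if __name__ == "__main__":
    # Hypothetical per-frame outputs from a stage-one regressor.
    stage_one = [0.0, 1.0, 0.0, 1.0]
    print(delay_windows(stage_one, 1))
    print(combine(stage_one, [0.5, 0.5]))
```

Because the window holds predictions rather than frame features, the second stage only sees the low-dimensional affect trajectory, which is the separation the abstract argues for.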
A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews
Despite the recent advances in opinion mining for written reviews, few works
have tackled the problem on other sources of reviews. In light of this issue,
we propose a multi-modal approach for mining fine-grained opinions from video
reviews that is able to determine the aspects of the item under review that are
being discussed and the sentiment orientation towards them. Our approach works
at the sentence level without the need for time annotations and uses features
derived from the audio, video and language transcriptions of its contents. We
evaluate our approach on two datasets and show that leveraging the video and
audio modalities consistently provides increased performance over text-only
baselines, providing evidence these extra modalities are key in better
understanding video reviews.
Comment: Second Grand Challenge and Workshop on Multimodal Language ACL 202
Graph-based Facial Affect Analysis: A Review of Methods, Applications and Challenges
Facial affect analysis (FAA) using visual signals is important in
human-computer interaction. Early methods focus on extracting appearance and
geometry features associated with human affects, while ignoring the latent
semantic information among individual facial changes, leading to limited
performance and generalization. Recent work attempts to establish a graph-based
representation to model these semantic relationships and develop frameworks to
leverage them for various FAA tasks. In this paper, we provide a comprehensive
review of graph-based FAA, including the evolution of algorithms and their
applications. First, the FAA background knowledge is introduced, especially on
the role of the graph. We then discuss approaches that are widely used for
graph-based affective representation in the literature and show a trend towards
graph construction. For the relational reasoning in graph-based FAA, existing
studies are categorized according to their usage of traditional methods or deep
models, with a special emphasis on the latest graph neural networks.
Performance comparisons of the state-of-the-art graph-based FAA methods are
also summarized. Finally, we discuss the challenges and potential directions.
As far as we know, this is the first survey of graph-based FAA methods. Our
findings can serve as a reference for future research in this field.
Comment: 20 pages, 12 figures, 5 tables
Label Mask for Multi-Label Text Classification
One of the key problems in multi-label text classification is how to take
advantage of the correlation among labels. However, it is very challenging to
directly model the correlations among labels in a complex and unknown label
space. In this paper, we propose a Label Mask multi-label text classification
model (LM-MTC), which is inspired by the cloze-question idea behind language
models. LM-MTC is able to capture implicit relationships among labels through
the powerful ability of pre-trained language models. On this basis, we assign a
different token to each potential label, and randomly mask the token with a
certain probability to build a label based Masked Language Model (MLM). We
train the MTC and MLM together, further improving the generalization ability of
the model. A large number of experiments on multiple datasets demonstrate the
effectiveness of our method.
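The label-masking step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say what the per-label tokens look like, so the `yes`/`no` state tokens, the `[MASK]` string, and the function name `build_label_sequence` are hypothetical, and the joint training with the classifier is not shown.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple

MASK = "[MASK]"  # hypothetical mask token, named after the usual MLM convention


def build_label_sequence(
    all_labels: Sequence[str],
    positive: Set[str],
    mask_prob: float,
    rng: random.Random,
) -> Tuple[List[str], List[Optional[str]]]:
    """Build one token slot per label and mask each slot with probability mask_prob.

    Returns (tokens, targets). A target is the hidden state token for a
    masked slot and None for a slot that was left visible.
    """
    tokens: List[str] = []
    targets: List[Optional[str]] = []
    for label in all_labels:
        # Assumed encoding: "yes" if the label applies to the text, else "no".
        state = "yes" if label in positive else "no"
        if rng.random() < mask_prob:
            tokens.append(MASK)
            targets.append(state)  # the model would be trained to recover this
        else:
            tokens.append(state)
            targets.append(None)
    return tokens, targets


if __name__ == "__main__":
    labels = ["sports", "politics", "tech"]
    print(build_label_sequence(labels, {"tech"}, 0.3, random.Random(0)))
```

Visible slots give the model the states of some labels as context when it predicts the masked ones, which is one way the implicit label correlations mentioned in the abstract could be exploited.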
The Emerging Trends of Multi-Label Learning
Exabytes of data are generated daily by humans, leading to the growing need
for new efforts in dealing with the grand challenges for multi-label learning
brought by big data. For example, extreme multi-label classification is an
active and rapidly growing research area that deals with classification tasks
with an extremely large number of classes or labels; utilizing massive data
with limited supervision to build a multi-label classification model becomes
valuable for practical applications, etc. Besides these, there are tremendous
efforts on how to harvest the strong learning capability of deep learning to
better capture the label dependencies in multi-label learning, which is the key
for deep learning to address real-world classification tasks. However, it is
noted that there has been a lack of systematic studies that focus explicitly on
analyzing the emerging trends and new challenges of multi-label learning in the
era of big data. It is imperative to call for a comprehensive survey to fulfill
this mission and delineate future research directions and new applications.
Comment: Accepted to TPAMI 202
Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
Automatic emotion recognition from speech, which is an important and challenging task in the field of affective computing, heavily relies on the effectiveness of the speech features for classification. Previous approaches to emotion recognition have mostly focused on the extraction of carefully hand-crafted features. How to model spatio-temporal dynamics for speech emotion recognition effectively is still under active investigation. In this paper, we propose a method to tackle the problem of emotionally relevant feature extraction from speech by combining attention-based Bidirectional Long Short-Term Memory Recurrent Neural Networks with fully convolutional networks in order to automatically learn the best spatio-temporal representations of speech signals. The learned high-level features are then fed into a deep neural network (DNN) to predict the final emotion. The experimental results on the Chinese Natural Audio-Visual Emotion Database (CHEAVD) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpora show that our method provides more accurate predictions compared with other existing emotion recognition algorithms.
Deep Emotion Recognition in Textual Conversations: A Survey
While Emotion Recognition in Conversations (ERC) has seen tremendous
advancement in the last few years, new applications and implementation
scenarios present novel challenges and opportunities. These range from
leveraging the conversational context, speaker and emotion dynamics modelling,
to interpreting common sense expressions, informal language and sarcasm,
addressing challenges of real time ERC, recognizing emotion causes, different
taxonomies across datasets, multilingual ERC to interpretability. This survey
starts by introducing ERC, elaborating on the challenges and opportunities
pertaining to this task. It proceeds with a description of the emotion
taxonomies and a variety of ERC benchmark datasets employing such taxonomies.
This is followed by descriptions of the most prominent works in ERC with
explanations of the Deep Learning architectures employed. Then, it provides
advisable ERC practices towards better frameworks, elaborating on methods to
deal with subjectivity in annotations and modelling and methods to deal with
the typically unbalanced ERC datasets. Finally, it presents systematic review
tables comparing several works regarding the methods used and their
performance. The survey highlights the advantage of leveraging techniques to
address unbalanced data, the exploration of mixed emotions and the benefits of
incorporating annotation subjectivity in the learning phase.