InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
This paper describes InfoGAN, an information-theoretic extension to the
Generative Adversarial Network that is able to learn disentangled
representations in a completely unsupervised manner. InfoGAN is a generative
adversarial network that also maximizes the mutual information between a small
subset of the latent variables and the observation. We derive a lower bound to
the mutual information objective that can be optimized efficiently, and show
that our training procedure can be interpreted as a variation of the Wake-Sleep
algorithm. Specifically, InfoGAN successfully disentangles writing styles from
digit shapes on the MNIST dataset, pose from lighting of 3D rendered images,
and background digits from the central digit on the SVHN dataset. It also
discovers visual concepts that include hair styles, presence/absence of
eyeglasses, and emotions on the CelebA face dataset. Experiments show that
InfoGAN learns interpretable representations that are competitive with
representations learned by existing fully supervised methods.
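For reference, the lower bound mentioned in the abstract above is usually written as below. The abstract itself does not state the formula; the notation follows the InfoGAN paper, where c denotes the structured latent codes, z the remaining noise, G the generator, D the discriminator, Q an auxiliary distribution approximating the posterior P(c | x), and \lambda a weighting hyperparameter.

```latex
% Variational lower bound on the mutual information I(c; G(z, c))
L_I(G, Q) = \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\bigl[\log Q(c \mid x)\bigr] + H(c)
          \;\le\; I\bigl(c;\, G(z, c)\bigr)

% InfoGAN objective: the standard GAN value function V(D, G) regularized by the bound
\min_{G, Q} \max_{D} \; V_{\text{InfoGAN}}(D, G, Q) = V(D, G) - \lambda\, L_I(G, Q)
```

Because H(c) is constant for a fixed code distribution, maximizing L_I amounts to training Q to recover the code c from the generated sample x.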
It's LeVAsa not LevioSA! Latent Encodings for Valence-Arousal Structure Alignment
In recent years, great strides have been made in the field of affective
computing. Several models have been developed to represent and quantify
emotions. Two popular ones include (i) categorical models which represent
emotions as discrete labels, and (ii) dimensional models which represent
emotions in a Valence-Arousal (VA) circumplex domain. However, there is no
standard for annotation mapping between the two labelling methods. We build a
novel algorithm for mapping categorical and dimensional model labels using
annotation transfer across affective facial image datasets. Further, we utilize
the transferred annotations to learn rich and interpretable data
representations using a variational autoencoder (VAE). We present "LeVAsa", a
VAE model that learns implicit structure by aligning the latent space with the
VA space. We evaluate the efficacy of LeVAsa by comparing performance with the
Vanilla VAE using quantitative and qualitative analysis on two benchmark
affective image datasets. Our results reveal that LeVAsa achieves high
latent-circumplex alignment which leads to improved downstream categorical
emotion prediction. The work also demonstrates the trade-off between degree of
alignment and quality of reconstructions. Comment: 5 pages, 4 figures and 3 tables
Emotion Recognition by Video: A review
Video emotion recognition is an important branch of affective computing, and
its solutions can be applied in different fields such as human-computer
interaction (HCI) and intelligent medical treatment. Although the number of
papers published in the field of emotion recognition is increasing, there are
few comprehensive literature reviews covering related research on video emotion
recognition. Therefore, this paper selects articles published from 2015 to 2023
to systematize the existing trends in video emotion recognition in related
studies. We first introduce two typical emotion models, then describe the
databases frequently used for video emotion recognition, including unimodal
and multimodal databases. Next, we examine and classify the structure and
performance of modern unimodal and multimodal video emotion recognition
methods, discuss the benefits and drawbacks of each, and compare them in
detail in tables. Further, we summarize the primary difficulties currently
faced by video emotion recognition tasks and point out some of the most
promising future directions, such as establishing an open benchmark database
and better multimodal fusion strategies. The main objective of this paper is
to help academic and industrial researchers keep up to date with the most
recent advances and new developments in this fast-moving, high-impact field of
video emotion recognition.
Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives
Over the past few years, adversarial training has become an extremely active
research topic and has been successfully applied to various Artificial
Intelligence (AI) domains. Because it is a potentially crucial technique for
developing the next generation of emotional AI systems, we provide a
comprehensive overview of the application of adversarial training to affective
computing and sentiment analysis. Various representative adversarial training
algorithms are explained and discussed accordingly, aimed at tackling diverse
challenges associated with emotional AI systems. Further, we highlight a range
of potential future research directions. We expect that this overview will help
facilitate the development of adversarial training for affective computing and
sentiment analysis in both the academic and industrial communities.
Hybrid Fusion Based Interpretable Multimodal Emotion Recognition with Limited Labelled Data
This paper proposes a multimodal emotion recognition system, VIsual Spoken
Textual Additive Net (VISTA Net), to classify emotions reflected by multimodal
input containing image, speech, and text into discrete classes. A new
interpretability technique, K-Average Additive exPlanation (KAAP), has also
been developed that identifies important visual, spoken, and textual features
leading to predicting a particular emotion class. The VISTA Net fuses
information from image, speech, and text modalities using a hybrid of early and
late fusion. It automatically adjusts the weights of their intermediate outputs
while computing the weighted average. The KAAP technique computes the
contribution of each modality and corresponding features toward predicting a
particular emotion class. To mitigate the insufficiency of multimodal emotion
datasets labeled with discrete emotion classes, we have constructed a
large-scale IIT-R MMEmoRec dataset consisting of images, corresponding speech
and text, and emotion labels ('angry,' 'happy,' 'hate,' and 'sad'). The VISTA
Net achieves 95.99% emotion recognition accuracy on the IIT-R MMEmoRec
dataset when using the visual, audio, and textual modalities together,
outperforming configurations that use any one or two modalities.
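The weighted-average step described in the abstract above can be illustrated with a minimal sketch. This is not VISTA Net's implementation: the function names, the softmax normalization of the weights, and all numeric values are assumptions made for illustration. Only the four emotion labels come from the abstract, and the weights here are fixed, whereas the abstract says they are adjusted automatically.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_average_fusion(modality_probs, raw_weights):
    """Fuse per-modality class probabilities using normalized weights.

    Hypothetical helper: `raw_weights` stands in for weights that a
    trained model would learn; here they are supplied by hand.
    """
    weights = softmax(raw_weights)
    num_classes = len(modality_probs[0])
    fused = [0.0] * num_classes
    for w, probs in zip(weights, modality_probs):
        for k, p in enumerate(probs):
            fused[k] += w * p
    return fused

LABELS = ["angry", "happy", "hate", "sad"]  # class names from the abstract

# Made-up intermediate outputs, one probability vector per modality.
image_probs = [0.10, 0.70, 0.10, 0.10]
speech_probs = [0.20, 0.50, 0.10, 0.20]
text_probs = [0.25, 0.25, 0.25, 0.25]

# Equal raw scores give equal weights (assumption for this example).
raw_weights = [0.0, 0.0, 0.0]

fused = weighted_average_fusion(
    [image_probs, speech_probs, text_probs], raw_weights
)
predicted = LABELS[max(range(len(fused)), key=fused.__getitem__)]
print(predicted, [round(p, 4) for p in fused])
```

Because the weights are normalized to sum to 1, the fused vector remains a valid probability distribution whenever each modality's output is one.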
Static and Dynamic Facial Emotion Recognition Using Neural Network Models
Emotion recognition is the process of identifying human emotions. It is made
possible by processing various modalities including facial expressions, speech
signals, biometric signals, etc. With the advancements in computing
technologies, Facial Emotion Recognition (FER) became important for several
applications in which the user's emotional state is required, such as
emotional training for autistic children. Recent years have witnessed a major
leap in Artificial Intelligence (AI), especially neural networks for computer
vision applications. In this thesis, we investigate the application of AI
algorithms for FER from static and dynamic data. Our experiments address the
limitations and challenges of previous works, such as limited generalizability
due to the datasets. We compare the performance of machine learning
classifiers and convolutional neural networks (CNNs) for FER from static data
(images). Moreover, we study the performance of the proposed CNN for dynamic
FER (videos), in addition to Long Short-Term Memory (LSTM) in a CNN-LSTM
hybrid approach to utilize the temporal information in the videos. The
proposed CNN architecture outperformed the other classifiers with an accuracy
of 86.5%. It also outperformed the hybrid approach for dynamic FER, which
achieved an accuracy of 74.6
- …