Visual Emotion Recognition Using ResNet
Given an image, humans have emotional reactions to it, such as happiness, fear, or disgust. The purpose of this research is to classify images based on humans' reactions to them using the ResNet deep architecture. The problem is that emotional reactions from humans are subjective, so a confidently labelled dataset is difficult to obtain. This research tries to overcome this problem by implementing and analyzing transfer learning from a large dataset, such as ImageNet, to a relatively small visual emotion dataset. In addition, because emotion is determined by both low-level and high-level features, we modify a pretrained residual network to better utilize low-level and high-level features for visual emotion recognition. Results show that both the general (low-level) and specific (high-level) features obtained from ImageNet object recognition can be well utilized for visual emotion recognition.
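The modification described above, drawing on both low-level and high-level features of a pretrained network, can be sketched as simple multi-level feature fusion. This is a hypothetical illustration, not the paper's exact design: the function names, the global-average-pooling choice, and the toy stage layout are assumptions.

```python
def global_average_pool(grid):
    """Collapse one channel's 2D activation grid to a single value."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)

def fuse_features(low_level_maps, high_level_maps):
    """Pool each stage's channels and concatenate the descriptors,
    so the emotion classifier sees both general and specific features."""
    return ([global_average_pool(ch) for ch in low_level_maps] +
            [global_average_pool(ch) for ch in high_level_maps])

# Toy activations: one low-level channel and one high-level channel.
low = [[[1.0, 3.0], [5.0, 7.0]]]   # pools to 4.0
high = [[[2.0, 2.0], [2.0, 2.0]]]  # pools to 2.0
print(fuse_features(low, high))    # [4.0, 2.0]
```

In a real implementation the pooled descriptors would feed a small classification head, with early (general) stages frozen and later (specific) stages fine-tuned on the emotion dataset.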
PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression
Existing methods on visual emotion analysis mainly focus on coarse-grained
emotion classification, i.e. assigning an image with a dominant discrete
emotion category. However, these methods cannot well reflect the complexity and
subtlety of emotions. In this paper, we study the fine-grained regression
problem of visual emotions based on convolutional neural networks (CNNs).
Specifically, we develop a Polarity-consistent Deep Attention Network (PDANet),
a novel network architecture that integrates attention into a CNN with an
emotion polarity constraint. First, we propose to incorporate both spatial and
channel-wise attentions into a CNN for visual emotion regression, which jointly
considers the local spatial connectivity patterns along each channel and the
interdependency between different channels. Second, we design a novel
regression loss, i.e. polarity-consistent regression (PCR) loss, based on the
weakly supervised emotion polarity to guide the attention generation. By
optimizing the PCR loss, PDANet can generate a polarity preserved attention map
and thus improve the emotion regression performance. Extensive experiments are
conducted on the IAPS, NAPS, and EMOTIC datasets, and the results demonstrate
that the proposed PDANet outperforms the state-of-the-art approaches by a large
margin for fine-grained visual emotion regression. Our source code is released
at: https://github.com/ZizhouJia/PDANet.
Comment: Accepted by ACM Multimedia 201
Computational Emotion Analysis From Images: Recent Advances and Future Directions
Emotions are often evoked in humans by images. Recently, extensive research
efforts have been dedicated to understanding the emotions evoked by images. In this
chapter, we aim to introduce image emotion analysis (IEA) from a computational
perspective with the focus on summarizing recent advances and suggesting future
directions. We begin with commonly used emotion representation models from
psychology. We then define the key computational problems that the researchers
have been trying to solve and provide supervised frameworks that are generally
used for different IEA tasks. After the introduction of major challenges in
IEA, we present some representative methods on emotion feature extraction,
supervised classifier learning, and domain adaptation. Furthermore, we
introduce available datasets for evaluation and summarize some main results.
Finally, we discuss some open questions and future directions that researchers
can pursue.
Comment: Accepted chapter in the book "Human Perception of Visual Information: Psychological and Computational Perspective"
APSE: Attention-aware polarity-sensitive embedding for emotion-based image retrieval
With the popularity of social media, an increasing number of people are accustomed to expressing their feelings and emotions online using images and videos. An emotion-based image retrieval (EBIR) system is useful for obtaining visual contents with desired emotions from a massive repository. Existing EBIR methods mainly focus on modeling the global characteristics of visual content without considering the crucial role of informative regions of interest in conveying emotions. Further, they ignore the hierarchical relationships between coarse polarities and fine categories of emotions. In this paper, we design an attention-aware polarity-sensitive embedding (APSE) network to address these issues. First, we develop a hierarchical attention mechanism to automatically discover and model the informative regions of interest. Specifically, both polarity- and emotion-specific attended representations are aggregated for discriminative feature embedding. Second, we propose a generated emotion-pair (GEP) loss to simultaneously consider the inter- and intra-polarity relationships of the emotion labels. Moreover, we adaptively generate negative examples of different hardness levels in the feature space, guided by the attention module, to further improve the performance of feature embedding. Extensive experiments on four popular benchmark datasets demonstrate that the proposed APSE method outperforms the state-of-the-art EBIR approaches by a large margin.
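The coarse-to-fine hierarchy this abstract refers to can be illustrated with the eight Mikels emotion categories widely used in visual emotion datasets, each mapping to a binary polarity. This is a sketch of the label structure only; APSE's actual label space depends on the dataset, and the function name is illustrative.

```python
EMOTION_POLARITY = {
    # positive categories
    "amusement": +1, "awe": +1, "contentment": +1, "excitement": +1,
    # negative categories
    "anger": -1, "disgust": -1, "fear": -1, "sadness": -1,
}

def intra_polarity(a, b):
    """True for an intra-polarity pair (same coarse polarity),
    False for an inter-polarity pair."""
    return EMOTION_POLARITY[a] == EMOTION_POLARITY[b]

print(intra_polarity("awe", "excitement"))  # True
print(intra_polarity("awe", "fear"))        # False
```

A pair-based embedding loss can then treat inter-polarity pairs more strictly than intra-polarity ones, so images of opposite polarity are pushed farther apart than images of different but same-polarity emotions.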
WSCNet: Weakly Supervised Coupled Networks for Visual Sentiment Classification and Detection
Automatic assessment of sentiment from visual content has gained considerable attention with the increasing tendency of expressing opinions online. In this paper, we solve the problem of visual sentiment analysis, which is challenging due to the high-level abstraction in the recognition process. Existing methods based on convolutional neural networks learn sentiment representations from the holistic image, despite the fact that different image regions can have different influences on the evoked sentiment. In this paper, we introduce a weakly supervised coupled convolutional network (WSCNet). Our method is dedicated to automatically selecting relevant soft proposals from weak annotations (e.g., global image labels), thereby significantly reducing the annotation burden, and encompasses the following contributions. First, WSCNet detects a sentiment-specific soft map by training a fully convolutional network with the cross spatial pooling strategy in the detection branch. Second, both the holistic and localized information are utilized by coupling the sentiment map with deep features for robust representation in the classification branch. We integrate the sentiment detection and classification branches into a unified deep framework, and optimize the network in an end-to-end way. Through this joint learning strategy, weakly supervised sentiment classification and detection benefit each other. Extensive experiments demonstrate that the proposed WSCNet outperforms the state-of-the-art results on seven benchmark datasets.
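The coupling of the sentiment map with deep features in the classification branch can be sketched as attention-weighted pooling of local feature vectors. This is one illustrative reading of the abstract, not WSCNet's exact operation; the function name and the normalization choice are assumptions.

```python
def couple(features, sentiment_map):
    """features: H x W grid of feature vectors; sentiment_map: H x W
    non-negative weights. Returns one holistic vector in which each
    location's contribution is scaled by its normalized sentiment
    response, so informative regions dominate the representation."""
    total = sum(w for row in sentiment_map for w in row)
    dim = len(features[0][0])
    pooled = [0.0] * dim
    for i, row in enumerate(features):
        for j, vec in enumerate(row):
            weight = sentiment_map[i][j] / total
            for d in range(dim):
                pooled[d] += weight * vec[d]
    return pooled

# 1x2 grid of 2-D features; the second location carries all the weight.
feats = [[[1.0, 0.0], [0.0, 1.0]]]
print(couple(feats, [[0.0, 1.0]]))  # [0.0, 1.0]
```

With a uniform sentiment map this reduces to plain global average pooling, which is why the coupled representation can only add localized information on top of the holistic one.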
Affective Image Content Analysis: Two Decades Review and New Perspectives
Images can convey rich semantics and induce various emotions in viewers.
Recently, with the rapid advancement of emotional intelligence and the
explosive growth of visual data, extensive research efforts have been dedicated
to affective image content analysis (AICA). In this survey, we will
comprehensively review the development of AICA in the recent two decades,
especially focusing on the state-of-the-art methods with respect to three main
challenges -- the affective gap, perception subjectivity, and label noise and
absence. We begin with an introduction to the key emotion representation models
that have been widely employed in AICA and description of available datasets
for performing evaluation with quantitative comparison of label noise and
dataset bias. We then summarize and compare the representative approaches on
(1) emotion feature extraction, including both handcrafted and deep features,
(2) learning methods on dominant emotion recognition, personalized emotion
prediction, emotion distribution learning, and learning from noisy data or few
labels, and (3) AICA based applications. Finally, we discuss some challenges
and promising research directions in the future, such as image content and
context understanding, group emotion clustering, and viewer-image interaction.
Comment: Accepted by IEEE TPAM