Diving Deep into Sentiment: Understanding Fine-tuned CNNs for Visual Sentiment Prediction
Visual media are powerful means of expressing emotions and sentiments. The
constant generation of new content in social networks highlights the need for
automated visual sentiment analysis tools. While Convolutional Neural Networks
(CNNs) have established a new state-of-the-art in several vision problems,
their application to the task of sentiment analysis is mostly unexplored and
there are few studies regarding how to design CNNs for this purpose. In this
work, we study the suitability of fine-tuning a CNN for visual sentiment
prediction as well as explore performance boosting techniques within this deep
learning setting. Finally, we provide a deep-dive analysis into a benchmark,
state-of-the-art network architecture to gain insight about how to design
patterns for CNNs on the task of visual sentiment prediction.
Comment: Preprint of the paper accepted at the 1st Workshop on Affect and Sentiment in Multimedia (ASM), ACM Multimedia 2015, Brisbane, Australia.
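The fine-tuning strategy the abstract describes (reusing a pretrained CNN and retraining only a new sentiment classification head) can be sketched in miniature. The feature extractor, data, and dimensions below are stand-ins invented for illustration, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained CNN backbone: here the "images" are
# already 8-d feature vectors (in practice, penultimate-layer activations).
def extract_features(images):
    return images

# New two-way sentiment head (positive vs. negative), trained from scratch
# with plain logistic regression while the backbone stays fixed.
def train_head(feats, labels, lr=0.1, epochs=200):
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))      # sigmoid
        w -= lr * feats.T @ (p - labels) / len(labels)  # gradient step
        b -= lr * np.mean(p - labels)
    return w, b

# Toy "features": two separable clusters standing in for CNN activations
# of positive-sentiment and negative-sentiment images.
pos = rng.normal(1.0, 0.5, size=(50, 8))
neg = rng.normal(-1.0, 0.5, size=(50, 8))
feats = extract_features(np.vstack([pos, neg]))
labels = np.array([1.0] * 50 + [0.0] * 50)

w, b = train_head(feats, labels)
accuracy = np.mean(((feats @ w + b) > 0) == labels)
```

In practice one would also unfreeze some backbone layers and continue training them at a small learning rate; the sketch only shows the head-replacement step.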
Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark
Psychological research results have confirmed that people can have different
emotional reactions to different visual stimuli. Several papers have been
published on the problem of visual emotion analysis. In particular, attempts
have been made to analyze and predict people's emotional reaction towards
images. To this end, different kinds of hand-tuned features are proposed. The
results reported on several carefully selected and labeled small image data
sets have confirmed the promise of such features. While the recent successes of
many computer vision related tasks are due to the adoption of Convolutional
Neural Networks (CNNs), visual emotion analysis has not achieved the same level
of success. This may be primarily due to the unavailability of confidently
labeled and relatively large image data sets for visual emotion analysis. In
this work, we introduce a new data set, which started from over 3 million weakly
labeled images of different emotions and ended up being 30 times as large as the
current largest publicly available visual emotion data set. We hope that this
data set encourages further research on visual emotion analysis. We also
perform extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs.
Comment: 7 pages, 7 figures, AAAI 2016.
Exploiting multimedia in creating and analysing multimedia Web archives
The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by humankind. However, due to the dynamic and distributed nature of the web, its content changes, appears and disappears on a daily basis. Web archiving provides a way of capturing snapshots of (parts of) the web for preservation and future analysis. This paper provides an overview of techniques we have developed within the context of the EU-funded ARCOMEM (ARchiving COmmunity MEMories) project to allow multimedia web content to be leveraged during the archival process and for post-archival analysis. Through a set of use cases, we explore several practical applications of multimedia analytics within the realm of web archiving, web archive analysis and multimedia data on the web in general.
Research on multi-modal sentiment feature learning of social media content
Social media has become the main platform for public-opinion exchange and information dissemination in modern society. Sentiment analysis of social media therefore has great application value in fields such as public-opinion monitoring, product marketing, and stock-market forecasting. However, the multi-modal nature of social media content (text, images, etc.) exposes many limitations of traditional single-modal sentiment analysis methods, while multi-modal sentiment analysis techniques are of great theoretical value for understanding and analyzing cross-media content. The key problem that distinguishes multi-modal sentiment analysis from single-modal methods is how to jointly exploit heterogeneous multi-modal sentiment information to obtain an overall sentiment polarity, while still accounting for the properties of each individual modality in sentiment expression. To address this problem, this thesis exploits the correlation and hierarchical abstraction that multi-modal social media content exhibits in sentiment expression, and proposes a set of multi-modal sentiment feature learning and fusion methods for social media to realize multi-modal sentiment analysis. The main contents and innovations...
Degree: Master of Engineering. Department: School of Information Science and Technology, Pattern Recognition and Intelligent Systems. Student ID: 3152013115327
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
Every culture and language is unique. Our work expressly focuses on the
uniqueness of culture and language in relation to human affect, specifically
sentiment and emotion semantics, and how they manifest in social multimedia. We
develop sets of sentiment- and emotion-polarized visual concepts by adapting
semantic structures called adjective-noun pairs, originally introduced by Borth
et al. (2013), but in a multilingual context. We propose a new
language-dependent method for automatic discovery of these adjective-noun
constructs. We show how this pipeline can be applied on a social multimedia
platform for the creation of a large-scale multilingual visual sentiment
concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our
unified ontology is organized hierarchically by multilingual clusters of
visually detectable nouns and subclusters of emotionally biased versions of
these nouns. In addition, we present an image-based prediction task to show how
generalizable language-specific models are in a multilingual context. A new,
publicly available dataset of >15.6K sentiment-biased visual concepts across 12
languages with language-specific detector banks, >7.36M images and their
metadata is also released.
Comment: 11 pages, to appear at ACM MM'15.
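The adjective-noun construct discovery described above can be illustrated with a toy co-occurrence count over image tags. The adjective and noun vocabularies and the tag lists below are invented for illustration; the actual MVSO pipeline is far more involved (language-specific part-of-speech tagging, sentiment filtering, and crowdsourced validation):

```python
from collections import Counter

# Hypothetical tag lists attached to images on a photo-sharing platform.
image_tags = [
    ["beautiful", "sunset", "beach"],
    ["beautiful", "sunset"],
    ["creepy", "house", "night"],
    ["creepy", "house"],
    ["beautiful", "beach"],
]

# Assumed-given vocabularies: sentiment-bearing adjectives and
# visually detectable nouns.
ADJECTIVES = {"beautiful", "creepy"}
NOUNS = {"sunset", "beach", "house", "night"}

# Count adjective-noun co-occurrences within each image's tag set.
pair_counts = Counter()
for tags in image_tags:
    adjs = [t for t in tags if t in ADJECTIVES]
    nouns = [t for t in tags if t in NOUNS]
    for a in adjs:
        for n in nouns:
            pair_counts[(a, n)] += 1

# Keep pairs seen at least twice as candidate ANP concepts.
anps = sorted(p for p, c in pair_counts.items() if c >= 2)
```

Here "beautiful sunset" and "creepy house" survive the frequency threshold while rare pairs like "creepy night" are dropped; the same counting would be run per language to build language-specific concept banks.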
PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression
Existing methods on visual emotion analysis mainly focus on coarse-grained
emotion classification, i.e., assigning each image a single dominant discrete
emotion category. However, these methods cannot fully capture the complexity and
subtlety of emotions. In this paper, we study the fine-grained regression
problem of visual emotions based on convolutional neural networks (CNNs).
Specifically, we develop a Polarity-consistent Deep Attention Network (PDANet),
a novel network architecture that integrates attention into a CNN with an
emotion polarity constraint. First, we propose to incorporate both spatial and
channel-wise attentions into a CNN for visual emotion regression, which jointly
considers the local spatial connectivity patterns along each channel and the
interdependency between different channels. Second, we design a novel
regression loss, i.e. polarity-consistent regression (PCR) loss, based on the
weakly supervised emotion polarity to guide the attention generation. By
optimizing the PCR loss, PDANet can generate a polarity preserved attention map
and thus improve the emotion regression performance. Extensive experiments are
conducted on the IAPS, NAPS, and EMOTIC datasets, and the results demonstrate
that the proposed PDANet outperforms the state-of-the-art approaches by a large
margin for fine-grained visual emotion regression. Our source code is released
at: https://github.com/ZizhouJia/PDANet.
Comment: Accepted by ACM Multimedia 2019.
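The polarity-consistency idea behind the PCR loss can be sketched as a regression loss with an extra penalty whenever a prediction falls on the wrong side of a neutral valence point relative to the ground truth. This is a minimal illustration of the concept, not the paper's exact formulation (the neutral point, weighting, and penalty form below are assumptions):

```python
import numpy as np

def pcr_loss(pred, target, neutral=5.0, lam=1.0):
    # Squared error, as in ordinary valence regression.
    mse = (pred - target) ** 2
    # Polarity penalty: active only when the prediction's sign relative
    # to the neutral point disagrees with the ground truth's sign.
    wrong_side = np.sign(pred - neutral) != np.sign(target - neutral)
    penalty = lam * wrong_side * (pred - neutral) ** 2
    return np.mean(mse + penalty)

# Two predictions with identical squared error, but only the second one
# flips the emotion polarity (target 6.0 is positive; 4.0 is negative):
same_polarity  = pcr_loss(np.array([8.0]), np.array([6.0]))  # -> 4.0
wrong_polarity = pcr_loss(np.array([4.0]), np.array([6.0]))  # -> 5.0
```

The extra term makes polarity-flipping errors strictly more expensive than equally large errors on the correct side, which is the property the paper uses to guide attention generation.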
Horror image recognition based on context-aware multi-instance learning
Horror content sharing on the Web is a growing phenomenon that can interfere with our daily life and affect the mental health of those involved. As an important form of expression, horror images have their own characteristics that can evoke extreme emotions. In this paper, we present a novel context-aware multi-instance learning (CMIL) algorithm for horror image recognition. The CMIL algorithm identifies horror images and picks out the regions that cause the sensation of horror in these images. It obtains contextual cues among adjacent regions in an image using a random walk on a contextual graph. Borrowing the strength of the Fuzzy Support Vector Machine (FSVM), we define a heuristic optimization procedure based on the FSVM to search for the optimal classifier for the CMIL. To improve the initialization of the CMIL, we propose a novel visual saliency model based on tensor analysis. The average saliency value of each segmented region is set as its initial fuzzy membership in the CMIL. The advantage of the tensor-based visual saliency model is that it not only adaptively selects features, but also dynamically determines fusion weights for combining saliency values from different feature subspaces. The effectiveness of the proposed CMIL model is demonstrated by its use in horror image recognition on two large-scale image sets collected from the Internet.
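The contextual-cue step (a random walk over a graph of adjacent regions, seeded with saliency scores) can be sketched as a random walk with restart. The region affinities, seed scores, and restart weight below are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Pairwise similarity between three adjacent segmented regions (made up).
affinity = np.array([
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
])
# Seed scores per region, e.g. average tensor-based saliency values.
initial = np.array([0.9, 0.2, 0.1])

# Row-stochastic transition matrix over the contextual graph.
P = affinity / affinity.sum(axis=1, keepdims=True)

# Random walk with restart: each step mixes propagated scores with the seed.
alpha = 0.5            # weight of the walk vs. the saliency seed
scores = initial.copy()
for _ in range(100):   # iterate to (near) convergence
    scores = alpha * (P.T @ scores) + (1 - alpha) * initial
```

Because `P` is row-stochastic, each step preserves the total score mass; strongly connected regions share score with their neighbors, so a weakly connected, low-saliency region (region 2 here) stays lowest after propagation.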
Computational Emotion Analysis From Images: Recent Advances and Future Directions
Emotions are usually evoked in humans by images. Recently, extensive research
efforts have been dedicated to understanding the emotions of images. In this
chapter, we aim to introduce image emotion analysis (IEA) from a computational
perspective with the focus on summarizing recent advances and suggesting future
directions. We begin with commonly used emotion representation models from
psychology. We then define the key computational problems that the researchers
have been trying to solve and provide supervised frameworks that are generally
used for different IEA tasks. After the introduction of major challenges in
IEA, we present some representative methods on emotion feature extraction,
supervised classifier learning, and domain adaptation. Furthermore, we
introduce available datasets for evaluation and summarize some main results.
Finally, we discuss some open questions and future directions that researchers
can pursue.
Comment: Accepted chapter in the book "Human Perception of Visual Information: Psychological and Computational Perspectives".