Automated curation of brand-related social media images with deep learning
This paper presents work that uses deep convolutional neural networks (CNNs) to facilitate the curation of brand-related social media images. The final goal is to facilitate searching and discovering user-generated content (UGC) with potential value for digital marketing tasks. The images are captured in real time and automatically annotated with multiple CNNs. Some of the CNNs perform generic object recognition tasks, while others perform what we call visual brand identity recognition. When appropriate, we also apply object detection, usually to discover images containing logos. We report experiments with 5 real brands in which more than 1 million real images were analyzed. To speed up the training of custom CNNs, we applied a transfer learning strategy. We examine the impact of different configurations and derive conclusions aiming to pave the way towards systematic and optimized methodologies for automatic UGC curation. Peer Reviewed. Postprint (author's final draft)
PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression
Existing methods on visual emotion analysis mainly focus on coarse-grained
emotion classification, i.e. assigning a dominant discrete emotion category
to an image. However, these methods cannot adequately reflect the complexity and
subtlety of emotions. In this paper, we study the fine-grained regression
problem of visual emotions based on convolutional neural networks (CNNs).
Specifically, we develop a Polarity-consistent Deep Attention Network (PDANet),
a novel network architecture that integrates attention into a CNN with an
emotion polarity constraint. First, we propose to incorporate both spatial and
channel-wise attentions into a CNN for visual emotion regression, which jointly
considers the local spatial connectivity patterns along each channel and the
interdependency between different channels. Second, we design a novel
regression loss, i.e. polarity-consistent regression (PCR) loss, based on the
weakly supervised emotion polarity to guide the attention generation. By
optimizing the PCR loss, PDANet can generate a polarity-preserved attention map
and thus improve the emotion regression performance. Extensive experiments are
conducted on the IAPS, NAPS, and EMOTIC datasets, and the results demonstrate
that the proposed PDANet outperforms the state-of-the-art approaches by a large
margin for fine-grained visual emotion regression. Our source code is released
at: https://github.com/ZizhouJia/PDANet.
Comment: Accepted by ACM Multimedia 201
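The abstract does not give the formula for the PCR loss, so the following is only a minimal sketch of the general idea of a polarity-aware regression loss. It assumes emotion scores centred on a neutral value of 0 and a hypothetical `penalty` weight that scales up the squared error whenever the predicted polarity disagrees with the target polarity. The function names and the exact form of the penalty are illustrative assumptions, not the authors' definition.

```python
from typing import Sequence


def polarity(value: float, neutral: float = 0.0) -> int:
    """Return +1, -1, or 0 depending on which side of `neutral` the value lies."""
    if value > neutral:
        return 1
    if value < neutral:
        return -1
    return 0


def polarity_aware_mse(
    preds: Sequence[float],
    targets: Sequence[float],
    penalty: float = 0.5,   # hypothetical weight, not taken from the paper
    neutral: float = 0.0,   # assumed neutral point of the emotion scale
) -> float:
    """Mean squared error, scaled by (1 + penalty) on polarity mismatches."""
    if len(preds) != len(targets) or len(preds) == 0:
        raise ValueError("preds and targets must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(preds, targets):
        squared_error = (p - t) ** 2
        if polarity(p, neutral) != polarity(t, neutral):
            # Prediction falls on the wrong side of neutral: penalise more.
            squared_error *= 1.0 + penalty
        total += squared_error
    return total / len(preds)
```

Under these assumptions, two predictions with the same absolute error receive different losses if only one of them crosses the neutral point, which is the kind of signal a polarity constraint is meant to provide.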
Hybrid context enriched deep learning model for fine-grained sentiment analysis in textual and visual semiotic modality social data
Detecting sentiments in natural language is tricky even for humans, making its automated detection more complicated. This research proffers a hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data. It combines the strengths of deep learning networks with machine learning to deal with two specific semiotic systems, namely the textual (written text) and the visual (still images), and their combination within online content, using decision-level multimodal fusion. The proposed contextual ConvNet-SVMBoVW model has four modules, namely the discretization, text analytics, image analytics, and decision modules. The input to the model is multimodal text, m ∈ {text, image, info-graphic}. The discretization module uses Google Lens to separate the text from the image; the two are then processed as discrete entities and sent to the respective text analytics and image analytics modules. The text analytics module determines the sentiment using a hybrid of a convolutional neural network (ConvNet) enriched with the contextual semantics of SentiCircle. An aggregation scheme is introduced to compute the hybrid polarity. A support vector machine (SVM) classifier is trained using bag-of-visual-words (BoVW) to predict the sentiment of the visual content. A Boolean decision module with a logical OR operation is added to the architecture; it validates and categorizes the output on the basis of five fine-grained sentiment categories (truth values), namely ‘highly positive,’ ‘positive,’ ‘neutral,’ ‘negative’ and ‘highly negative.’ The accuracy achieved by the proposed model is nearly 91%, which is an improvement over the accuracy obtained by the text and image modules individually.
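The abstract does not specify how the decision module combines the two modality outputs, so the sketch below only illustrates decision-level fusion in general. It assumes each module emits a polarity score in [-1, 1] (or `None` when that modality is absent), reads the OR as "at least one modality must be present", averages whatever scores are available, and bins the result into the five categories. The thresholds, the averaging rule, and all names are hypothetical.

```python
from typing import Optional


def to_label(score: float) -> str:
    """Bin a polarity score in [-1, 1] into five categories (assumed thresholds)."""
    if score <= -0.6:
        return "highly negative"
    if score <= -0.2:
        return "negative"
    if score < 0.2:
        return "neutral"
    if score < 0.6:
        return "positive"
    return "highly positive"


def fuse(text_score: Optional[float], image_score: Optional[float]) -> str:
    """Decision-level fusion: combine per-modality scores after each module has run."""
    # OR-style check (an assumption): proceed if either modality produced a score.
    scores = [s for s in (text_score, image_score) if s is not None]
    if not scores:
        raise ValueError("at least one modality score is required")
    # Simple average of the available scores, used here in place of the
    # paper's aggregation scheme, which the abstract does not describe.
    return to_label(sum(scores) / len(scores))


print(fuse(0.8, None))   # text only
print(fuse(None, -0.3))  # image only
print(fuse(0.8, 0.0))    # both modalities present
```

Fusing at the decision level keeps the text and image pipelines independent, so either module can be retrained or replaced without touching the other.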
WhatsUp: An event resolution approach for co-occurring events in social media
The rapid growth of social media networks has resulted in the generation of a vast amount of data, making it impractical to conduct manual analyses to extract newsworthy events. Thus, automated event detection mechanisms are invaluable to the community. However, a clear majority of the available approaches rely only on data statistics without considering linguistics. A few approaches involve linguistics, but only to extract textual event details without the corresponding temporal details. Since linguistics define words’ structure and meaning, severe information loss can occur when they are not considered. Targeting this limitation, we propose a novel method named WhatsUp to detect temporal and fine-grained textual event details, using linguistics captured by self-learned word embeddings and their hierarchical relationships, and statistics captured by frequency-based measures. We evaluate our approach on recent social media data from two diverse domains and compare the performance with several state-of-the-art methods. Evaluations cover temporal and textual event aspects, and results show that WhatsUp notably outperforms state-of-the-art methods. We also analyse the efficiency, revealing that WhatsUp is sufficiently fast for (near) real-time detection. Further, the use of unsupervised learning techniques, including self-learned embeddings, makes our approach expandable to any language, platform and domain and provides capabilities to understand data-specific linguistics.
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and in the past two to three years attempts have been made to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book contains a collection of topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, such as computer vision, speech/sound/text processing, and content analysis/information mining.