20,318 research outputs found

Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

Authors: Acar Esra, Albayrak Sahin, Hopfgartner Frank
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2015

When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modalities of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently of the chosen representation.
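The four-quadrant labelling mentioned in the abstract can be illustrated with a minimal sketch. The function name `va_quadrant`, the label strings, and the midpoint threshold of 5.0 (the centre of a 1 to 9 rating scale) are assumptions made for illustration; the abstract does not state how the authors split the Valence-Arousal space.

```python
def va_quadrant(valence, arousal, midpoint=5.0):
    """Map a (valence, arousal) rating pair to one of four quadrant labels.

    Assumption: ratings lie on a 1-9 scale and are split at `midpoint`.
    The label strings (HVHA, LVHA, LVLA, HVLA) are hypothetical names.
    """
    high_valence = valence >= midpoint
    high_arousal = arousal >= midpoint
    if high_valence and high_arousal:
        return "HVHA"  # high valence, high arousal
    if not high_valence and high_arousal:
        return "LVHA"  # low valence, high arousal
    if not high_valence and not high_arousal:
        return "LVLA"  # low valence, low arousal
    return "HVLA"      # high valence, low arousal


# Example ratings for four hypothetical clips
clips = [(7.2, 8.1), (2.5, 7.0), (3.0, 2.0), (6.5, 3.5)]
labels = [va_quadrant(v, a) for v, a in clips]
```

Each clip then carries a single class label that a multi-class classifier such as an SVM could be trained to predict.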

Crossref

Enlighten

Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements

Authors: Kankanhalli Mohan, Katti Harish, Shukla Abhinav, Subramanian Ramanathan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/08/2018

Emotion evoked by an advertisement plays a key role in influencing brand recall and eventual consumer choices. Automatic ad affect recognition has several useful applications. However, the use of content-based feature representations does not give insights into how affect is modulated by aspects such as the ad scene setting, salient object attributes and their interactions. Neither do such approaches inform us on how humans prioritize visual information for ad understanding. Our work addresses these lacunae by decomposing video content into detected objects, coarse scene structure, object statistics and actively attended objects identified via eye-gaze. We measure the importance of each of these information channels by systematically incorporating related information into ad affect prediction models. Contrary to the popular notion that ad affect hinges on the narrative and the clever use of linguistic and social cues, we find that actively attended objects and the coarse scene structure better encode affective information as compared to individual scene objects or conspicuous background elements.

Comment: Accepted for publication in the Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, US

arXiv.org e-Print Archive

University of Canberra Research Repository

Open Access Repository of IISc Research Publications

A grounded theory of young tennis players’ use of music to manipulate emotional state

Authors: Bishop DT, Karageorghis CI, Loizou G
Publication venue: 'Human Kinetics'
Publication date: 01/10/2007

The main objectives of this study were (a) to elucidate young tennis players’ use of music to manipulate emotional states, and (b) to present a model grounded in present data to illustrate this phenomenon and to stimulate further research. Anecdotal evidence suggests that music listening is used regularly by elite athletes as a preperformance strategy, but only limited empirical evidence corroborates such use. Young tennis players (N = 14) were selected purposively for interview and diary data collection. Results indicated that participants consciously selected music to elicit various emotional states; frequently reported consequences of music listening included improved mood, increased arousal, and visual and auditory imagery. The choice of music tracks and the impact of music listening were mediated by a number of factors, including extramusical associations, inspirational lyrics, music properties, and desired emotional state. Implications for the future investigation of preperformance music are discussed.

Brunel University Research Archive

Affective belongings across geographies: locating YouTube viewing practices of Moroccan-Dutch youth

Authors: de Haan Mariëtte, Leander Kevin, Leurs Koen
Publication venue: 'Informa UK Limited'
Publication date: 07/10/2015

LSE Research Online

Utrecht University Repository

The Emotional Impact of Audio - Visual Stimuli

Author: Thomas Titus Pallithottathu
Publication venue: RIT Scholar Works
Publication date: 01/07/2017

Induced affect is the emotional effect of an object on an individual. It can be quantified through two metrics: valence and arousal. Valence quantifies how positive or negative something is, while arousal quantifies the intensity from calm to exciting. These metrics enable researchers to study how people opine on various topics. Affective content analysis of visual media is a challenging problem due to differences in perceived reactions. Industry-standard machine learning classifiers such as Support Vector Machines can be used to help determine user affect. The best affect-annotated video datasets are often analyzed by feeding large amounts of visual and audio features through machine-learning algorithms. The goal is to maximize accuracy, with the hope that each feature will bring useful information to the table. We depart from this approach to quantify how different modalities such as visual, audio, and text description information can aid in understanding affect. To that end, we train independent models for visual, audio and text description. Each is a convolutional neural network paired with a support vector machine to classify valence and arousal. We also train various ensemble models that combine multi-modal information with the hope that the information from independent modalities benefits each other. We find that our visual network alone achieves state-of-the-art valence classification accuracy and that our audio network, when paired with our visual, achieves competitive results on arousal classification. Each network is much stronger on one metric than the other. This may lead to more sophisticated multimodal approaches to accurately identifying affect in video data. This work also contributes to induced emotion classification by augmenting existing sizable media datasets and providing a robust framework for classifying the same.
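One common way to combine independent per-modality models is late fusion, where each model's class probabilities are averaged before picking a class. The sketch below is only an illustration of that general idea: the abstract does not specify how the ensemble models were built, and the function name `late_fusion`, the weights, and the probability values are all hypothetical.

```python
def late_fusion(prob_by_modality, weights=None):
    """Weighted average of per-modality class probabilities.

    prob_by_modality: dict mapping a modality name to a list of class
    probabilities (same class order and length for every modality).
    weights: optional dict of per-modality weights; defaults to equal weights.
    Returns (fused_probabilities, index_of_most_likely_class).
    """
    names = list(prob_by_modality)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_classes = len(prob_by_modality[names[0]])
    fused = [
        sum(weights[name] * prob_by_modality[name][k] for name in names) / total_weight
        for k in range(n_classes)
    ]
    best = max(range(n_classes), key=fused.__getitem__)
    return fused, best


# Hypothetical outputs for one clip; classes are [low valence, high valence]
probs = {
    "visual": [0.7, 0.3],
    "audio": [0.4, 0.6],
    "text": [0.6, 0.4],
}
fused_equal, best_equal = late_fusion(probs)
fused_audio, best_audio = late_fusion(probs, {"visual": 1.0, "audio": 4.0, "text": 1.0})
```

With equal weights the visual and text models outvote the audio model; giving the audio model a larger weight flips the decision, which shows how strongly the fused result can depend on the weighting.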

RIT Scholar Works