35 research outputs found

    Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

    Get PDF
    When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modality of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently from the chosen representation

    Affect Recognition in Ads with Application to Computational Advertising

    Get PDF
    Advertisements (ads) often include strongly emotional content to leave a lasting impression on the viewer. This work (i) compiles an affective ad dataset capable of evoking coherent emotions across users, as determined from the affective opinions of five experts and 14 annotators; (ii) explores the efficacy of convolutional neural network (CNN) features for encoding emotions, and observes that CNN features outperform low-level audio-visual emotion descriptors upon extensive experimentation; and (iii) demonstrates how enhanced affect prediction facilitates computational advertising, and leads to better viewing experience while watching an online video stream embedded with ads based on a study involving 17 users. We model ad emotions based on subjective human opinions as well as objective multimodal features, and show how effectively modeling ad emotions can positively impact a real-life application.Comment: Accepted at the ACM International Conference on Multimedia (ACM MM) 201

    Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements

    Full text link
    Emotion evoked by an advertisement plays a key role in influencing brand recall and eventual consumer choices. Automatic ad affect recognition has several useful applications. However, the use of content-based feature representations does not give insights into how affect is modulated by aspects such as the ad scene setting, salient object attributes and their interactions. Neither do such approaches inform us on how humans prioritize visual information for ad understanding. Our work addresses these lacunae by decomposing video content into detected objects, coarse scene structure, object statistics and actively attended objects identified via eye-gaze. We measure the importance of each of these information channels by systematically incorporating related information into ad affect prediction models. Contrary to the popular notion that ad affect hinges on the narrative and the clever use of linguistic and social cues, we find that actively attended objects and the coarse scene structure better encode affective information as compared to individual scene objects or conspicuous background elements.Comment: Accepted for publication in the Proceedings of 20th ACM International Conference on Multimodal Interaction, Boulder, CO, US

    Optimizing Player and Viewer Amusement in Suspense Video Games

    Get PDF
    Broadcast video games need to provide amusement to both players and audience. To achieve this, one of the most consumed genres is suspense, due to the psychological effects it has on both roles. Suspense is typically achieved in video games by controlling the amount of delivered information about the location of the threat. However, previous research suggests that players need more frequent information to reach similar amusement than viewers, even at the cost of jeopardizing viewers' engagement. In order to obtain models that maximize amusement for both interactive and passive audiences, we conducted an experiment in which a group of subjects played a suspenseful video game while another group watched it remotely. The subjects were asked to report their perceived suspense and amusement, and the data were used to obtain regression models for two common strategies to evoke suspense in video games: by alerting when the threat is approaching and by random circumstantial indications about the location of the threat. The results suggest that the optimal level is reached through randomly providing the minimal amount of information that still allows players to counteract the threat.We reckon that these results can be applied to a broad narrative media, beyond interactive games

    Who is the director of this movie? Automatic style recognition based on shot features

    Get PDF
    We show how low-level formal features, such as shot duration, meant as length of camera takes, and shot scale, i.e. the distance between the camera and the subject, are distinctive of a director's style in art movies. So far such features were thought of not having enough varieties to become distinctive of an author. However our investigation on the full filmographies of six different authors (Scorsese, Godard, Tarr, Fellini, Antonioni, and Bergman) for a total number of 120 movies analysed second by second, confirms that these shot-related features do not appear as random patterns in movies from the same director. For feature extraction we adopt methods based on both conventional and deep learning techniques. Our findings suggest that feature sequential patterns, i.e. how features evolve in time, are at least as important as the related feature distributions. To the best of our knowledge this is the first study dealing with automatic attribution of movie authorship, which opens up interesting lines of cross-disciplinary research on the impact of style on the aesthetic and emotional effects on the viewers

    Ranking highlight level of movie clips : a template based adaptive kernel SVM method

    Get PDF
    This paper looks into a new direction in movie clips analysis – model based ranking of highlight level. A movie clip, containing a short story, is composed of several continuous shots, which is much simpler than the whole movie. As a result, clip based analysis provides a feasible way for movie analysis and interpretation. In this paper, clip-based ranking of highlight level is proposed, where the challenging problem in detecting and recognizing events within clips is not required. Due to the lack of publicly available datasets, we firstly construct a database of movie clips, where each clip is associated with manually derived highlight level as ground truth. From each clip a number of effective visual cues are then extracted. To bridge the gap between low-level features and highlight level semantics, a holistic method of highlight ranking model is introduced. According to the distance between testing clips and selected templates, appropriate kernel function of support vector machine (SVM) is adaptively selected. Promising results are reported in automatic ranking of movie highlight levels
    corecore