3,728 research outputs found

    Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

    Get PDF
    When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modality of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently from the chosen representation

    Affective Recommendation of Movies Based on Selected Connotative Features

    Get PDF
    The apparent difficulty in assessing emotions elicited by movies and the undeniable high variability in subjects emotional responses to filmic content have been recently tackled by exploring film connotative properties: the set of shooting and editing conventions that help in transmitting meaning to the audience. Connotation provides an intermediate representation which exploits the objectivity of audiovisual descriptors to predict the subjective emotional reaction of single users. This is done without the need of registering users physiological signals neither by employing other people highly variable emotional rates, but just relying on the inter-subjectivity of connotative concepts and on the knowledge of users reactions to similar stimuli. This work extends previous by extracting audiovisual and film grammar descriptors and, driven by users rates on connotative properties, creates a shared framework where movie scenes are placed, compared and recommended according to connotation. We evaluate the potential of the proposed system by asking users to assess the ability of connotation in suggesting filmic content able to target their affective requests

    Automated generation of movie tributes

    Get PDF
    O objetivo desta tese é gerar um tributo a um filme sob a forma de videoclip, considerando como entrada um filme e um segmento musical coerente. Um tributo é considerado um vídeo que contém os clips mais significativos de um filme, reproduzidos sequencialmente, enquanto uma música toca. Nesta proposta, os clips a constar do tributo final são o resultado da sumarização das legendas do filme com um algoritmo de sumarização genérico. É importante que o artefacto seja coerente e fluido, pelo que há a necessidade de haver um equilíbrio entre a seleção de conteúdo importante e a seleção de conteúdo que esteja em harmonia com a música. Para tal, os clips são filtrados de forma a garantir que apenas aqueles que contêm a mesma emoção da música aparecem no vídeo final. Tal é feito através da extração de vetores de características áudio relacionadas com emoções das cenas às quais os clips pertencem e da música, e, de seguida, da sua comparação por meio do cálculo de uma medida de distância. Por fim, os clips filtrados preenchem a música cronologicamente. Os resultados foram positivos: em média, os tributos produzidos obtiveram 7 pontos, numa escala de 0 a 10, em critérios como seleção de conteúdo e coerência emocional, fruto de avaliação humana.This thesis’ purpose is to generate a movie tribute in the form of a videoclip for a given movie and music. A tribute is considered to be a video containing meaningful clips from the movie playing along with a cohesive music piece. In this work, we collect the clips by summarizing the movie subtitles with a generic summarization algorithm. It is important that the artifact is coherent and fluid, hence there is the need to balance between the selection of important content and the selection of content that is in harmony with the music. To achieve so, clips are filtered so as to ensure that only those that contain the same emotion as the music are chosen to appear in the final video. This is made by extracting vectors of emotion-related audio features from the scenes they belong to and from the music, and then comparing them with a distance measure. Finally, filtered clips fill the music length in a chronological order. Results were positive: on average, the produced tributes obtained scores of 7, on a scale from 0 to 10, on content selection, and emotional coherence criteria, from human evaluation

    What Is Important When We Evaluate Movies? Insights from Computational Analysis of Online Reviews

    Get PDF
    The question of what is important when we evaluate movies is crucial for understanding how lay audiences experience and evaluate entertainment products such as films. In line with this, subjective movie evaluation criteria (SMEC) have been conceptualized as mental representations of important attitudes toward specific film features. Based on exploratory and confirmatory factor analyses of self-report data from online surveys, previous research has found and validated eight dimensions. Given the large-scale evaluative information that is available in online users’ comments in movie databases, it seems likely that what online users write about movies may enrich our knowledge about SMEC. As a first fully exploratory attempt, drawing on an open-source dataset including movie reviews from IMDb, we estimated a correlated topic model to explore the underlying topics of those reviews. In 35,136 online movie reviews, the most prevalent topics tapped into three major categories—Hedonism, Actors’ Performance, and Narrative—and indicated what reviewers mostly wrote about. Although a qualitative analysis of the reviews revealed that users mention certain SMEC, results of the topic model covered only two SMEC: Story Innovation and Light-heartedness. Implications for SMEC and entertainment research are discussed

    Context-aware Horror Video Scene Recognition via Cost-sensitive Sparse Coding

    Get PDF
    Abstract Along with the ever-growing Web, horror video sharin

    Using Compressed Audio-visual Words for Multi-modal Scene Classification

    Get PDF
    We present a novel approach to scene classification using combined audio signal and video image features and compare this methodology to scene classification results using each modality in isolation. Each modality is represented using summary features, namely Mel-frequency Cepstral Coefficients (audio) and Scale Invariant Feature Transform (SIFT) (video) within a multi-resolution bag-of-features model. Uniquely, we extend the classical bag-of-words approach over both audio and video feature spaces, whereby we introduce the concept of compressive sensing as a novel methodology for multi-modal fusion via audio-visual feature dimensionality reduction. We perform evaluation over a range of environments showing performance that is both comparable to the state of the art (86%, over ten scene classes) and invariant to a ten-fold dimensionality reduction within the audio-visual feature space using our compressive representation approach

    A Connotative Space for Supporting Movie Affective Recommendation

    Get PDF
    The problem of relating media content to users’affective responses is here addressed. Previous work suggests that a direct mapping of audio-visual properties into emotion categories elicited by films is rather difficult, due to the high variability of individual reactions. To reduce the gap between the objective level of video features and the subjective sphere of emotions, we propose to shift the representation towards the connotative properties of movies, in a space inter-subjectively shared among users. Consequently, the connotative space allows to define, relate and compare affective descriptions of film videos on equal footing. An extensive test involving a significant number of users watching famous movie scenes, suggests that the connotative space can be related to affective categories of a single user. We apply this finding to reach high performance in meeting user’s emotional preferences

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Get PDF
    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals without demanding the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. Also, the experiments show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the 3 target environments. While the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25% and 95.09% across those same environments
    • …
    corecore