
    Audio-Visual Sentiment Analysis for Learning Emotional Arcs in Movies

    Stories can have tremendous power -- not only are they useful for entertainment, they can also activate our interests and mobilize our actions. The degree to which a story resonates with its audience may be partly reflected in the emotional journey on which it takes the audience. In this paper, we use machine learning methods to construct emotional arcs in movies, calculate families of arcs, and demonstrate the ability of certain arcs to predict audience engagement. The system is applied to Hollywood films and high-quality shorts found on the web. We begin by using deep convolutional neural networks for audio and visual sentiment analysis. These models are trained on both new and existing large-scale datasets, after which they can be used to compute separate audio and visual emotional arcs. We then crowdsource annotations for 30-second video clips extracted from highs and lows in the arcs in order to assess the micro-level precision of the system, with precision measured in terms of agreement in polarity between the system's predictions and annotators' ratings. These annotations are also used to combine the audio and visual predictions. Next, we look at macro-level characterizations of movies by investigating whether there exist 'universal shapes' of emotional arcs. In particular, we develop a clustering approach to discover distinct classes of emotional arcs. Finally, we show on a sample corpus of short web videos that certain emotional arcs are statistically significant predictors of the number of comments a video receives. These results suggest that the emotional arcs learned by our approach successfully represent macroscopic aspects of a video story that drive audience engagement. Such machine understanding could be used to predict audience reactions to video stories, ultimately improving our ability as storytellers to communicate with each other. Comment: Data Mining (ICDM), 2017 IEEE 17th International Conference on
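
    The arc-clustering step can be illustrated with a short, hedged sketch: assuming per-clip sentiment scores have already been produced by the audio and visual models (per_movie_scores below is an illustrative name, not from the paper), each movie's scores are smoothed and resampled to a fixed length, and the resulting arcs are grouped. The paper's own clustering method may differ; k-means is used here only as a minimal Python illustration of the idea.

        # Illustrative sketch only: smooth per-clip sentiment scores into a
        # fixed-length arc per movie, then cluster the arcs with k-means.
        import numpy as np
        from scipy.ndimage import uniform_filter1d
        from sklearn.cluster import KMeans

        def build_arc(clip_scores, length=100, window=5):
            """Smooth a movie's per-clip scores and resample to a fixed length."""
            smoothed = uniform_filter1d(np.asarray(clip_scores, dtype=float), size=window)
            positions = np.linspace(0, len(smoothed) - 1, length)
            return np.interp(positions, np.arange(len(smoothed)), smoothed)

        # per_movie_scores: assumed list of per-clip sentiment-score sequences, one per movie.
        arcs = np.vstack([build_arc(scores) for scores in per_movie_scores])
        arc_families = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(arcs)

    Resampling every arc to the same length is what makes movies of different durations comparable before clustering; the number of clusters here is arbitrary.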

    Affect Recognition in Ads with Application to Computational Advertising

    Advertisements (ads) often include strongly emotional content to leave a lasting impression on the viewer. This work (i) compiles an affective ad dataset capable of evoking coherent emotions across users, as determined from the affective opinions of five experts and 14 annotators; (ii) explores the efficacy of convolutional neural network (CNN) features for encoding emotions, and observes through extensive experimentation that CNN features outperform low-level audio-visual emotion descriptors; and (iii) demonstrates, via a study involving 17 users, how enhanced affect prediction facilitates computational advertising and leads to a better viewing experience when watching an online video stream embedded with ads. We model ad emotions based on subjective human opinions as well as objective multimodal features, and show how effectively modeling ad emotions can positively impact a real-life application. Comment: Accepted at the ACM International Conference on Multimedia (ACM MM) 201
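
    As a hedged illustration of the CNN-feature pipeline described above (not the authors' exact setup), frame-level features from a pretrained ImageNet backbone can be pooled per ad and fed to a linear classifier; ads_frames and valence_labels are assumed, illustrative inputs.

        # Illustrative sketch: pretrained-CNN features for ad emotion recognition.
        import numpy as np
        import tensorflow as tf
        from sklearn.svm import LinearSVC

        # Frozen ImageNet backbone used purely as a feature extractor.
        backbone = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                                  weights="imagenet")

        def frame_features(frames):
            """frames: float array of shape (n, 224, 224, 3) with values in [0, 255]."""
            x = tf.keras.applications.resnet50.preprocess_input(frames)
            return backbone.predict(x, verbose=0)  # one 2048-d vector per frame

        # One pooled feature vector per ad, plus an assumed per-ad valence label.
        X = np.vstack([frame_features(frames).mean(axis=0) for frames in ads_frames])
        clf = LinearSVC(C=1.0).fit(X, valence_labels)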

    Movies and meaning: from low-level features to mind reading

    When dealing with movies, closing the tremendous discontinuity between low-level features and the richness of semantics in the viewers' cognitive processes requires a variety of approaches and different perspectives. For instance, when attempting to relate movie content to users' affective responses, previous work suggests that a direct mapping of audio-visual properties onto elicited emotions is difficult, due to the high variability of individual reactions. To reduce the gap between the objective level of features and the subjective sphere of emotions, we exploit an intermediate representation: the connotative properties of movies, i.e. the set of shooting and editing conventions that help in transmitting meaning to the audience. One of these stylistic features, the shot scale, i.e. the distance of the camera from the subject, effectively regulates theory of mind: increasing spatial proximity to the character triggers a higher occurrence of mental state references in viewers' story descriptions. Movies are also becoming an important stimulus employed in neural decoding, an ambitious line of research within contemporary neuroscience aiming at "mindreading". In this field we address the challenge of producing decoding models for the reconstruction of perceptual contents by combining fMRI data and deep features in a hybrid model able to predict specific video object classes.
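
    One common form of such a hybrid decoding model, sketched here as a general recipe rather than the authors' implementation, first regresses fMRI voxel activity onto deep features and then classifies the predicted features into object classes; all variable names below are illustrative.

        # Illustrative sketch: map fMRI responses into a CNN feature space,
        # then classify the predicted features into video object classes.
        from sklearn.linear_model import Ridge
        from sklearn.neighbors import KNeighborsClassifier

        # Step 1: learn a voxel-activity -> deep-feature mapping (multi-output ridge).
        voxel_to_feature = Ridge(alpha=10.0).fit(train_fmri, train_cnn_features)

        # Step 2: classify predicted features by nearest neighbours in feature space.
        classifier = KNeighborsClassifier(n_neighbors=5).fit(train_cnn_features,
                                                             train_object_labels)
        predicted_features = voxel_to_feature.predict(test_fmri)
        predicted_classes = classifier.predict(predicted_features)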

    Assessing the emotional impact of video using machine learning techniques

    Typically, when a human being watches a video, different sensations and states of mind can be stimulated. Among these, the sensation of fear can be triggered by watching segments of movies containing themes such as violence, horror and suspense. Both the audio and visual stimuli may contribute to inducing fear in the viewer. This dissertation studies the use of machine learning for forecasting the emotional effects triggered by video, more precisely, the automatic identification of fear-inducing video segments. Using the LIRIS-ACCEDE dataset, several experiments have been performed in order to identify the feature sets that are most relevant to the problem and to assess the performance of different machine learning classifiers. Both classical and deep learning techniques have been implemented and evaluated, using the Scikit-learn and TensorFlow machine learning libraries. Two different approaches to training and testing have been followed: film-level dataset splitting, where different films were used for training and testing; and sample-level dataset splitting, which allowed samples coming from the same films to be used for both training and testing. The prediction of movie segments that trigger fear sensations achieved an F1-score of 18.5% in the first approach, a value suggesting that the dataset does not adequately represent the universe of movies. The second approach achieved an F1-score of about 84.0%, a substantially higher value that shows promising outcomes when performing the proposed task.
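
    The two splitting protocols mentioned above can be sketched with scikit-learn, one of the libraries the dissertation reports using; X, y and film_ids are assumed, illustrative arrays, and the classifier choice here is arbitrary.

        # Illustrative sketch of the two dataset-splitting protocols.
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import f1_score
        from sklearn.model_selection import GroupShuffleSplit, train_test_split

        def film_level_split(X, y, film_ids):
            """No film contributes samples to both the training and test sets."""
            splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
            train_idx, test_idx = next(splitter.split(X, y, groups=film_ids))
            return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

        def sample_level_split(X, y):
            """Samples from the same film may fall on both sides of the split."""
            return train_test_split(X, y, test_size=0.2, random_state=0)

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        X_tr, X_te, y_tr, y_te = film_level_split(X, y, film_ids)
        print(f1_score(y_te, clf.fit(X_tr, y_tr).predict(X_te)))

    Grouping by film keeps correlated samples out of the test set, which is why the film-level protocol yields the more pessimistic, but more realistic, score.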