Audio-Visual Sentiment Analysis for Learning Emotional Arcs in Movies
Stories can have tremendous power: beyond entertainment, they can activate
our interests and mobilize our actions. The degree to which a story resonates
with its audience may be partly reflected in the emotional journey it takes
the audience on. In this paper, we use machine learning methods to construct
emotional arcs in movies, calculate families of arcs, and demonstrate the
ability of certain arcs to predict audience engagement. The system is applied
to Hollywood films and high-quality shorts found on the web.
We begin by using deep convolutional neural networks for audio and visual
sentiment analysis. These models are trained on both new and existing
large-scale datasets, after which they can be used to compute separate audio
and visual emotional arcs. We then crowdsource annotations for 30-second video
clips extracted from highs and lows in the arcs in order to assess the
micro-level precision of the system, with precision measured in terms of
agreement in polarity between the system's predictions and annotators' ratings.
These annotations are also used to combine the audio and visual predictions.
Next, we look at macro-level characterizations of movies by investigating
whether there exist `universal shapes' of emotional arcs. In particular, we
develop a clustering approach to discover distinct classes of emotional arcs.
Finally, we show on a sample corpus of short web videos that certain emotional
arcs are statistically significant predictors of the number of comments a video
receives. These results suggest that the emotional arcs learned by our approach
successfully represent macroscopic aspects of a video story that drive audience
engagement. Such machine understanding could be used to predict audience
reactions to video stories, ultimately improving our ability as storytellers to
communicate with each other.
Comment: Data Mining (ICDM), 2017 IEEE 17th International Conference on
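The pipeline described above — per-clip sentiment scores smoothed into an arc, then arcs grouped into families — can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the moving-average window, the number of clusters, and the use of plain k-means are all assumptions.

```python
import numpy as np

def emotional_arc(scores, window=9):
    """Smooth a per-clip sentiment series into an arc with a moving average.

    `scores` is a 1-D array of sentiment values (e.g. per-clip CNN outputs);
    the window size here is an illustrative choice, not the paper's.
    """
    kernel = np.ones(window) / window
    return np.convolve(scores, kernel, mode="same")

def cluster_arcs(arcs, k=4, iters=50, seed=0):
    """Group fixed-length arcs into k 'shapes' with plain k-means."""
    rng = np.random.default_rng(seed)
    arcs = np.asarray(arcs, dtype=float)
    centers = arcs[rng.choice(len(arcs), size=k, replace=False)]
    for _ in range(iters):
        # assign each arc to the nearest center (squared Euclidean distance)
        labels = np.argmin(
            ((arcs[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1
        )
        # move each center to the mean of the arcs assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = arcs[labels == j].mean(axis=0)
    return labels, centers
```

With arcs resampled to a common length, the returned centers play the role of the "universal shapes" the paper searches for.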
Affect Recognition in Ads with Application to Computational Advertising
Advertisements (ads) often include strongly emotional content to leave a
lasting impression on the viewer. This work (i) compiles an affective ad
dataset capable of evoking coherent emotions across users, as determined from
the affective opinions of five experts and 14 annotators; (ii) explores the
efficacy of convolutional neural network (CNN) features for encoding emotions,
and observes that CNN features outperform low-level audio-visual emotion
descriptors upon extensive experimentation; and (iii) demonstrates how enhanced
affect prediction facilitates computational advertising, and leads to better
viewing experience while watching an online video stream embedded with ads
based on a study involving 17 users. We model ad emotions based on subjective
human opinions as well as objective multimodal features, and show how
effectively modeling ad emotions can positively impact a real-life application.
Comment: Accepted at the ACM International Conference on Multimedia (ACM MM)
2017
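The affect-aware ad placement in point (iii) can be illustrated with a toy sketch. Everything below is hypothetical — the paper does not specify this selection rule; the snippet only conveys the general idea of inserting an ad at the scene boundary whose predicted valence/arousal best matches the ad's.

```python
def best_insertion_point(scene_affect, ad_affect):
    """Pick the scene boundary whose predicted (valence, arousal) is
    closest to the ad's — a toy stand-in for affect-matched placement.

    scene_affect: list of (valence, arousal) pairs, one per boundary.
    ad_affect:    (valence, arousal) predicted for the ad.
    """
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(range(len(scene_affect)),
               key=lambda i: sq_dist(scene_affect[i], ad_affect))

# A calm ad (near-neutral valence, low arousal) lands at the calm boundary.
idx = best_insertion_point([(0.9, 0.2), (-0.5, 0.8), (0.1, 0.1)], (0.0, 0.0))
```

In the paper's user study the matching criterion matters precisely because a mismatched ad disrupts the viewing experience; any real system would plug CNN-predicted affect into both arguments.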
Movies and meaning: from low-level features to mind reading
When dealing with movies, closing the tremendous discontinuity between low-level features and the richness of semantics in the viewers' cognitive processes requires a variety of approaches and different perspectives. For instance, when attempting to relate movie content to users' affective
responses, previous work suggests that a direct mapping of audio-visual properties into elicited emotions is difficult, due to the high variability of individual reactions. To reduce the gap between the objective level of features and the subjective sphere of emotions, we exploit the intermediate
representation of the connotative properties of movies: the set of shooting and editing conventions that help in transmitting meaning to the audience. One of these stylistic features, the shot scale, i.e. the distance of the camera from the subject, effectively regulates theory of mind, indicating
that increasing spatial proximity to the character triggers a higher occurrence of mental-state references in viewers' story descriptions. Movies are also becoming an important stimulus employed in neural decoding, an ambitious line of research within contemporary neuroscience aiming at "mind reading".
In this field we address the challenge of producing decoding models for the reconstruction of perceptual contents by combining fMRI data and deep features in a hybrid model able to predict specific video object classes.
Assessing the emotional impact of video using machine learning techniques
Typically, when a human being watches a video, different sensations and mind states can be
stimulated. Among these, the sensation of fear can be triggered by watching segments of
movies containing themes such as violence, horror and suspense. Both the audio and visual
stimuli may contribute to inducing fear in the viewer. This dissertation studies the use of
machine learning for forecasting the emotional effects triggered by video, more precisely,
the automatic identification of fear inducing video segments.
Using the LIRIS-ACCEDE dataset, several experiments have been performed in order
to identify feature sets that are most relevant to the problem and to assess the performance
of different machine learning classifiers. Both classical and deep learning techniques have
been implemented and evaluated, using the Scikit-learn and TensorFlow machine learning
libraries. Two different approaches for training and testing have been followed: film-level
dataset splitting, where different films were used for training and testing; and sample-level
dataset splitting, which allowed samples from the same films to appear in both the training
and test sets. The prediction of movie segments that trigger fear sensations achieved an
F1-score of 18.5% with the first approach, a value suggesting that the dataset does not
adequately represent the universe of movies. The second approach achieved an F1-score of
about 84.0%, a substantially higher value that shows promising results for the proposed task.
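The film-level splitting described above maps naturally onto scikit-learn's group-aware splitters. The snippet below is a minimal sketch, assuming `GroupShuffleSplit` (the dissertation names Scikit-learn but not the exact splitter used); the point is that no film contributes clips to both sides of the split, which is what makes the 18.5% vs 84.0% comparison meaningful.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 12 clips drawn from 4 hypothetical "films", 3 clips per film.
X = np.arange(24).reshape(12, 2)   # stand-in clip features
y = np.tile([0, 1, 0], 4)          # stand-in fear labels
films = np.repeat([0, 1, 2, 3], 3) # film id (the "group") for each clip

# Film-level split: entire films are held out, so the score measures
# generalisation to unseen movies rather than to unseen clips.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=films))
print(sorted(set(films[test_idx])))  # the held-out film(s)
```

Sample-level splitting corresponds to an ordinary `train_test_split` over clips, which lets clips from the same film leak across the boundary — exactly the optimistic setting that produced the much higher F1-score.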