Recognizing Induced Emotions of Movie Audiences: Are Induced and Perceived Emotions the Same?
Predicting the emotional response of movie audiences to affective movie content is a challenging task in affective computing. Previous work has focused on using audiovisual movie content to predict movie induced emotions. However, the relationship between the audience's perceptions of the affective movie content (perceived emotions) and the emotions evoked in the audience (induced emotions) remains unexplored. In this work, we address the relationship between perceived and induced emotions in movies, and identify features and modelling approaches effective for predicting movie induced emotions. First, we extend the LIRIS-ACCEDE database by annotating perceived emotions in a crowd-sourced manner, and find that perceived and induced emotions are not always consistent. Second, we show that dialogue events and aesthetic highlights are effective predictors of movie induced emotions. In addition to movie-based features, we also study physiological and behavioural measurements of audiences. Our experiments show that induced emotion recognition can benefit from temporal context and from multimodal information. Our study bridges the gap between affective content analysis and induced emotion prediction.
Recognizing Induced Emotions of Movie Audiences From Multimodal Information
Recognizing emotional reactions of movie audiences to affective movie content is a challenging task in affective computing. Previous research on induced emotion recognition has mainly focused on using audio-visual movie content. However, the relationship between the perceptions of the affective movie content (perceived emotions) and the emotions evoked in the audience (induced emotions) remains unexplored. In this work, we study the relationship between perceived and induced emotions of movie audiences, and investigate multimodal modelling approaches that predict movie induced emotions from movie content based features as well as from physiological and behavioral reactions of movie audiences. To analyse induced and perceived emotions, we first extend an existing database for movie affect analysis by annotating perceived emotions in a crowd-sourced manner. We find that perceived and induced emotions are not always consistent with each other. In addition, we show that perceived emotions, movie dialogues, and aesthetic highlights are discriminative for movie induced emotion recognition, alongside spectators' physiological and behavioral reactions. Our experiments also reveal that induced emotion recognition can benefit from including temporal information and from performing multimodal fusion. Finally, our work examines the gap between affective content analysis and induced emotion recognition by gaining insight into the relationships between aesthetic highlights, induced emotions, and perceived emotions.
Recognizing emotions in spoken dialogue with acoustic and lexical cues
Automatic emotion recognition has long been a focus of Affective Computing. It has
become increasingly apparent that awareness of human emotions in Human-Computer
Interaction (HCI) is crucial for advancing related technologies, such as dialogue
systems. However, the performance of current automatic emotion recognition
falls short of human performance. Current research on emotion
recognition in spoken dialogue focuses on identifying better feature representations
and recognition models from a data-driven point of view. The goal of this thesis
is to explore how incorporating prior knowledge of human emotion recognition
in the automatic model can improve state-of-the-art performance of automatic
emotion recognition in spoken dialogue. Specifically, we study this by proposing
knowledge-inspired features representing occurrences of disfluency and non-verbal
vocalisation in speech, and by building a multimodal recognition model that combines
acoustic and lexical features in a knowledge-inspired hierarchical structure. In our
study, emotions are represented with the Arousal, Expectancy, Power, and Valence
emotion dimensions. We build unimodal and multimodal emotion recognition
models to study the proposed features and modelling approach, and perform emotion
recognition on both spontaneous and acted dialogue.
Psycholinguistic studies have suggested that DISfluency and Non-verbal
Vocalisation (DIS-NV) in dialogue are related to emotions. However, these affective
cues in spoken dialogue are overlooked by current automatic emotion recognition
research. Thus, we propose features for recognizing emotions in spoken dialogue
which describe five types of DIS-NV in utterances, namely filled pause, filler, stutter,
laughter, and audible breath. Our experiments show that this small set of features
is predictive of emotions. Our DIS-NV features achieve better performance than
benchmark acoustic and lexical features for recognizing all emotion dimensions in
spontaneous dialogue. Consistent with Psycholinguistic studies, the DIS-NV features
are especially predictive of the Expectancy dimension of emotion, which relates to
speaker uncertainty. Our study illustrates the relationship between DIS-NVs and
emotions in dialogue, which contributes to Psycholinguistic understanding of them
as well. Note that our DIS-NV features are based on manual annotations, yet our
long-term goal is to apply our emotion recognition model to HCI systems. Thus, we
conduct preliminary experiments on automatic detection of DIS-NVs, and on using
automatically detected DIS-NV features for emotion recognition. Our results show
that DIS-NVs can be automatically detected from speech with stable accuracy, and
auto-detected DIS-NV features remain predictive of emotions in spontaneous dialogue.
This suggests that our emotion recognition model can be applied to a fully automatic
system in the future, and holds the potential to improve the quality of emotional
interaction in current HCI systems.
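The DIS-NV feature extraction described above can be illustrated with a small sketch. It assumes utterances arrive as token lists carrying manual DIS-NV annotations; the tag names, the normalisation by utterance length, and the helper function are illustrative assumptions, not the thesis's exact feature pipeline.

```python
# A minimal sketch of DIS-NV-style features, assuming utterances come as
# token lists with manual disfluency annotations (tag names are illustrative).
from collections import Counter

DIS_NV_TYPES = ["filled_pause", "filler", "stutter", "laughter", "audible_breath"]

def dis_nv_features(tokens):
    """Return one count-based feature per DIS-NV type for an utterance.

    `tokens` is a list of (word, tag) pairs, where `tag` is either a
    DIS-NV label or None for fluent speech.
    """
    counts = Counter(tag for _, tag in tokens if tag in DIS_NV_TYPES)
    total = max(len(tokens), 1)
    # Normalise by utterance length so long and short utterances are comparable.
    return [counts[t] / total for t in DIS_NV_TYPES]

utterance = [("um", "filled_pause"), ("I", None), ("I", "stutter"),
             ("think", None), ("<laugh>", "laughter"), ("so", None)]
features = dis_nv_features(utterance)  # one value per DIS-NV type
```

The appeal of such features is their size: five values per utterance, versus the hundreds of dimensions typical of benchmark acoustic feature sets.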
To study the robustness of the DIS-NV features, we conduct cross-corpora
experiments on both spontaneous and acted dialogue. We identify how dialogue
type influences the performance of DIS-NV features and emotion recognition models.
DIS-NVs contain additional information beyond acoustic characteristics or lexical
contents. Thus, we study the gain of modality fusion for emotion recognition with the
DIS-NV features. Previous work combines different feature sets by fusing modalities
at the same level using two types of fusion strategies: Feature-Level (FL) fusion,
which concatenates feature sets before recognition; and Decision-Level (DL) fusion,
which makes the final decision based on outputs of all unimodal models. However,
features from different modalities may describe data at different time scales or levels
of abstraction. Moreover, Cognitive Science research indicates that when perceiving
emotions, humans make use of information from different modalities at different
cognitive levels and time steps. Therefore, we propose a HierarchicaL (HL) fusion
strategy for multimodal emotion recognition, which places features that describe
data at longer time intervals, or at higher levels of abstraction, higher in its
knowledge-inspired hierarchy. Compared to FL and DL fusion, HL fusion incorporates
both inter- and intra-modality differences. Our experiments show that HL fusion
consistently outperforms FL and DL fusion on multimodal emotion recognition in both
spontaneous and acted dialogue. The HL model combining our DIS-NV features with
benchmark acoustic and lexical features improves current performance of multimodal
emotion recognition in spoken dialogue.
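The three fusion strategies can be contrasted in a toy sketch. The synthetic two-modality data, the tiny ridge regressor standing in for the real recognisers, and the choice of which modality sits lower in the hierarchy are all placeholders for illustration, not the thesis's benchmark setup.

```python
# Toy comparison of Feature-Level (FL), Decision-Level (DL), and
# HierarchicaL (HL) fusion on synthetic two-modality data.
import numpy as np

rng = np.random.default_rng(0)
n = 40
acoustic = rng.normal(size=(n, 8))   # low-level, fine-grained modality
lexical = rng.normal(size=(n, 5))    # more abstract modality
y = (acoustic[:, 0] + lexical[:, 0] > 0).astype(float)

def ridge_fit_predict(X, y):
    """Tiny ridge regressor standing in for a real recognition model."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # add bias column
    w = np.linalg.solve(Xb.T @ Xb + 0.1 * np.eye(Xb.shape[1]), Xb.T @ y)
    return Xb @ w

# FL fusion: concatenate the feature sets, train a single model.
fl_pred = ridge_fit_predict(np.hstack([acoustic, lexical]), y)

# DL fusion: train one model per modality, combine their outputs.
dl_pred = (ridge_fit_predict(acoustic, y) + ridge_fit_predict(lexical, y)) / 2

# HL fusion: the low-level model's output becomes an input feature for a
# higher-level model, alongside the more abstract lexical features.
hl_pred = ridge_fit_predict(
    np.hstack([ridge_fit_predict(acoustic, y)[:, None], lexical]), y)
```

The structural point is visible in the last step: unlike FL and DL fusion, HL fusion treats the modalities asymmetrically, so prior knowledge about their time scales and abstraction levels can shape the model.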
To study how other emotion-related tasks of spoken dialogue can benefit from the
proposed approaches, we apply the DIS-NV features and the HL fusion strategy to
recognize movie-induced emotions. Our experiments show that although designed
for recognizing emotions in spoken dialogue, DIS-NV features and HL fusion
remain effective for recognizing movie-induced emotions. This suggests that other
emotion-related tasks can also benefit from the proposed features and model structure.
Deep learning based facial expression recognition and its applications
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. Facial expression recognition (FER) is a research area that consists of classifying human emotions through the expressions on the face. It can be used in applications such as biometric security, intelligent human-computer interaction, robotics, and clinical medicine for autism, depression, pain, and mental health problems. This dissertation investigates advanced technologies for facial expression analysis and develops artificial intelligence systems for practical applications. The first part of this work applies geometric and texture domain feature extractors along with various machine learning techniques to improve FER. Advanced 2D and 3D facial processing techniques such as Edge Oriented Histograms (EOH) and Facial Mesh Distances (FMD) are then fused together using a framework designed to investigate their individual and combined domain performances. Following these tests, the face is broken down into facial parts using advanced facial alignment and localising techniques. Deep learning in the form of Convolutional Neural Networks (CNNs) is also explored for FER. A novel approach is used for the deep network architecture design, learning the facial parts jointly and showing an improvement over using the whole face. Joint Bayesian is also adapted in the form of metric learning to work with deep feature representations of the facial parts, providing a further improvement over using the deep network alone. Dynamic emotion content is explored as a solution that provides richer information than still images. The motion occurring across the content is initially captured using the Motion History Histogram (MHH) descriptor and is critically evaluated. Based on this observation, several improvements are proposed through extensions such as the Average Spatial Pooling Multi-scale Motion History Histogram (ASMMHH).
This extension adds two modifications: the first is to view the content in different spatial dimensions through spatial pooling, influenced by the structure of CNNs; the other is to capture motion at different speeds. Combined, they provide better performance than MHH and other popular techniques such as Local Binary Patterns on Three Orthogonal Planes (LBP-TOP).
Finally, the dynamic emotion content is observed in the feature space, with sequences of images represented as sequences of extracted features. A novel technique called Facial Dynamic History Histogram (FDHH) is developed to capture patterns of variation within the sequence of features; an approach not seen before. FDHH is applied in an end-to-end framework for applications in depression analysis and for evaluating induced emotions through a large set of video clips from various movies. With the combination of deep learning techniques and FDHH, state-of-the-art results are achieved for depression analysis.
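The idea of histogramming patterns of variation in a feature sequence can be sketched in simplified form: binarise per-component changes between consecutive feature frames with a threshold, then count runs of consecutive changes per component. The threshold, run-length cap, and exact run-counting rule here are assumptions for illustration, not the published FDHH definition.

```python
# Simplified FDHH-style sketch: binarise per-component frame-to-frame
# changes, then histogram run lengths of consecutive changes per component.
# threshold and M are illustrative assumptions.
import numpy as np

def fdhh_like(features, threshold=0.1, M=3):
    """features: (frames, components) array of per-frame feature vectors.

    Returns a (components, M) histogram: entry [c, m-1] counts runs of
    m consecutive changes in component c (runs longer than M count as M).
    """
    diffs = np.abs(np.diff(features, axis=0)) >= threshold  # binary change map
    C = features.shape[1]
    hist = np.zeros((C, M), dtype=int)
    for c in range(C):
        run = 0
        for changed in diffs[:, c]:
            if changed:
                run += 1
            elif run:
                hist[c, min(run, M) - 1] += 1
                run = 0
        if run:  # close a run that reaches the end of the sequence
            hist[c, min(run, M) - 1] += 1
    return hist

# Four frames of a 2-component feature sequence: component 0 changes twice
# in a row, component 1 changes once at the end.
seq = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.05], [1.0, 0.5]])
hist = fdhh_like(seq)
```

The resulting fixed-size histogram summarises how the features vary over time, independently of sequence length, which is what makes such descriptors usable as inputs to a standard classifier or regressor.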