1,381 research outputs found

    Continuous Analysis of Affect from Voice and Face

    Get PDF
    Human affective behavior is multimodal, continuous and complex. Despite major advances within the affective computing research field, modeling, analyzing, interpreting and responding to human affective behavior still remains a challenge for automated systems as affect and emotions are complex constructs, with fuzzy boundaries and with substantial individual differences in expression and experience [7]. Therefore, affective and behavioral computing researchers have recently invested increased effort in exploring how to best model, analyze and interpret the subtlety, complexity and continuity (represented along a continuum e.g., from −1 to +1) of affective behavior in terms of latent dimensions (e.g., arousal, power and valence) and appraisals, rather than in terms of a small number of discrete emotion categories (e.g., happiness and sadness). This chapter aims to (i) give a brief overview of the existing efforts and the major accomplishments in modeling and analysis of emotional expressions in dimensional and continuous space while focusing on open issues and new challenges in the field, and (ii) introduce a representative approach for multimodal continuous analysis of affect from voice and face, and provide experimental results using the audiovisual Sensitive Artificial Listener (SAL) Database of natural interactions. The chapter concludes by posing a number of questions that highlight the significant issues in the field, and by extracting potential answers to these questions from the relevant literature. The chapter is organized as follows. Section 10.2 describes theories of emotion, Sect. 10.3 provides details on the affect dimensions employed in the literature as well as how emotions are perceived from visual, audio and physiological modalities. Section 10.4 summarizes how current technology has been developed, in terms of data acquisition and annotation, and automatic analysis of affect in continuous space by bringing forth a number of issues that need to be taken into account when applying a dimensional approach to emotion recognition, namely, determining the duration of emotions for automatic analysis, modeling the intensity of emotions, determining the baseline, dealing with high inter-subject expression variation, defining optimal strategies for fusion of multiple cues and modalities, and identifying appropriate machine learning techniques and evaluation measures. Section 10.5 presents our representative system that fuses vocal and facial expression cues for dimensional and continuous prediction of emotions in valence and arousal space by employing the bidirectional Long Short-Term Memory neural networks (BLSTM-NN), and introduces an output-associative fusion framework that incorporates correlations between the emotion dimensions to further improve continuous affect prediction. Section 10.6 concludes the chapter

    Speakers Raise their Hands and Head during Self-Repairs in Dyadic Conversations

    Get PDF
    People often encounter difficulties in building shared understanding during everyday conversation. The most common symptom of these difficulties are self-repairs, when a speaker restarts, edits or amends their utterances mid-turn. Previous work has focused on the verbal signals of self-repair, i.e. speech disfluences (filled pauses, truncated words and phrases, word substitutions or reformulations), and computational tools now exist that can automatically detect these verbal phenomena. However, face-to-face conversation also exploits rich non-verbal resources and previous research suggests that self-repairs are associated with distinct hand movement patterns. This paper extends those results by exploring head and hand movements of both speakers and listeners using two motion parameters: height (vertical position) and 3D velocity. The results show that speech sequences containing self-repairs are distinguishable from fluent ones: speakers raise their hands and head more (and move more rapidly) during self-repairs. We obtain these results by analysing data from a corpus of 13 unscripted dialogues, and we discuss how these findings could support the creation of improved cognitive artificial systems for natural human-machine and human-robot interaction

    As You Are, So Shall You Move Your Head: A System-Level Analysis between Head Movements and Corresponding Traits and Emotions

    Full text link
    Identifying physical traits and emotions based on system-sensed physical activities is a challenging problem in the realm of human-computer interaction. Our work contributes in this context by investigating an underlying connection between head movements and corresponding traits and emotions. To do so, we utilize a head movement measuring device called eSense, which gives acceleration and rotation of a head. Here, first, we conduct a thorough study over head movement data collected from 46 persons using eSense while inducing five different emotional states over them in isolation. Our analysis reveals several new head movement based findings, which in turn, leads us to a novel unified solution for identifying different human traits and emotions through exploiting machine learning techniques over head movement data. Our analysis confirms that the proposed solution can result in high accuracy over the collected data. Accordingly, we develop an integrated unified solution for real-time emotion and trait identification using head movement data leveraging outcomes of our analysis.Comment: 9 pages, 7 figures, NSysS 201

    Robust Modeling of Epistemic Mental States and Their Applications in Assistive Technology

    Get PDF
    This dissertation presents the design and implementation of EmoAssist: Emotion-Enabled Assistive Tool to Enhance Dyadic Conversation for the Blind . The key functionalities of the system are to recognize behavioral expressions and to predict 3-D affective dimensions from visual cues and to provide audio feedback to the visually impaired in a natural environment. Prior to describing the EmoAssist, this dissertation identifies and advances research challenges in the analysis of the facial features and their temporal dynamics with Epistemic Mental States in dyadic conversation. A number of statistical analyses and simulations were performed to get the answer of important research questions about the complex interplay between facial features and mental states. It was found that the non-linear relations are mostly prevalent rather than the linear ones. Further, the portable prototype of assistive technology that can aid blind individual to understand his/her interlocutor\u27s mental states has been designed based on the analysis. A number of challenges related to the system, communication protocols, error-free tracking of face and robust modeling of behavioral expressions /affective dimensions were addressed to make the EmoAssist effective in a real world scenario. In addition, orientation-sensor information from the phone was used to correct image alignment to improve the robustness in real life deployment. It was observed that the EmoAssist can predict affective dimensions with acceptable accuracy (Maximum Correlation-Coefficient for valence: 0.76, arousal: 0.78, and dominance: 0.76) in natural conversation. The overall minimum and maximum response-times are (64.61 milliseconds) and (128.22 milliseconds), respectively. The integration of sensor information for correcting the orientation has helped in significant improvement (16% in average) of accuracy in recognizing behavioral expressions. A user study with ten blind people shows that the EmoAssist is highly acceptable to them (Average acceptability rating using Likert: 6.0 where 1 and 7 are the lowest and highest possible ratings, respectively) in social interaction

    Emotion sensing from head motion capture

    Get PDF
    Computational analysis of emotion from verbal and non-verbal behavioral cues is critical for human-centric intelligent systems. Among the non-verbal cues, head motion has received relatively less attention, although its importance has been noted in several research. We propose a new approach for emotion recognition using head motion captured using Motion Capture (MoCap). Our approach is motivated by the well known kinesics-phonetic analogy, which advocates that, analogous to human speech being composed of phonemes, head motion is composed of kinemes i.e., elementary motion units. We discover a set of kinemes from head motion in an unsupervised manner by projecting them onto a learned basis domain and subsequently clustering them. This transforms any head motion to a sequence of kinemes. Next, we learn the temporal latent structures within the kineme sequence pertaining to each emotion. For this purpose, we explore two separate approaches – one using Hidden Markov Model and another using artificial neural network. This class-specific, kineme-based representation of head motion is used to perform emotion recognition on the popular IEMOCAP database. We achieve high recognition accuracy (61.8% for three class) for various emotion recognition tasks using head motion alone. This work adds to our understanding of head motion dynamics, and has applications in emotion analysis and head motion animation and synthesis

    Continuous Interaction with a Virtual Human

    Get PDF
    Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking – and modify its communicative behavior on-the-fly based on what it perceives from its partner. This report presents the results of a four week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access

    A system for recognizing human emotions based on speech analysis and facial feature extraction: applications to Human-Robot Interaction

    Get PDF
    With the advance in Artificial Intelligence, humanoid robots start to interact with ordinary people based on the growing understanding of psychological processes. Accumulating evidences in Human Robot Interaction (HRI) suggest that researches are focusing on making an emotional communication between human and robot for creating a social perception, cognition, desired interaction and sensation. Furthermore, robots need to receive human emotion and optimize their behavior to help and interact with a human being in various environments. The most natural way to recognize basic emotions is extracting sets of features from human speech, facial expression and body gesture. A system for recognition of emotions based on speech analysis and facial features extraction can have interesting applications in Human-Robot Interaction. Thus, the Human-Robot Interaction ontology explains how the knowledge of these fundamental sciences is applied in physics (sound analyses), mathematics (face detection and perception), philosophy theory (behavior) and robotic science context. In this project, we carry out a study to recognize basic emotions (sadness, surprise, happiness, anger, fear and disgust). Also, we propose a methodology and a software program for classification of emotions based on speech analysis and facial features extraction. The speech analysis phase attempted to investigate the appropriateness of using acoustic (pitch value, pitch peak, pitch range, intensity and formant), phonetic (speech rate) properties of emotive speech with the freeware program PRAAT, and consists of generating and analyzing a graph of speech signals. The proposed architecture investigated the appropriateness of analyzing emotive speech with the minimal use of signal processing algorithms. 30 participants to the experiment had to repeat five sentences in English (with durations typically between 0.40 s and 2.5 s) in order to extract data relative to pitch (value, range and peak) and rising-falling intonation. Pitch alignments (peak, value and range) have been evaluated and the results have been compared with intensity and speech rate. The facial feature extraction phase uses the mathematical formulation (B\ue9zier curves) and the geometric analysis of the facial image, based on measurements of a set of Action Units (AUs) for classifying the emotion. The proposed technique consists of three steps: (i) detecting the facial region within the image, (ii) extracting and classifying the facial features, (iii) recognizing the emotion. Then, the new data have been merged with reference data in order to recognize the basic emotion. Finally, we combined the two proposed algorithms (speech analysis and facial expression), in order to design a hybrid technique for emotion recognition. Such technique have been implemented in a software program, which can be employed in Human-Robot Interaction. The efficiency of the methodology was evaluated by experimental tests on 30 individuals (15 female and 15 male, 20 to 48 years old) form different ethnic groups, namely: (i) Ten adult European, (ii) Ten Asian (Middle East) adult and (iii) Ten adult American. Eventually, the proposed technique made possible to recognize the basic emotion in most of the cases

    Chapter From the Lab to the Real World: Affect Recognition Using Multiple Cues and Modalities

    Get PDF
    Interdisciplinary concept of dissipative soliton is unfolded in connection with ultrafast fibre lasers. The different mode-locking techniques as well as experimental realizations of dissipative soliton fibre lasers are surveyed briefly with an emphasis on their energy scalability. Basic topics of the dissipative soliton theory are elucidated in connection with concepts of energy scalability and stability. It is shown that the parametric space of dissipative soliton has reduced dimension and comparatively simple structure that simplifies the analysis and optimization of ultrafast fibre lasers. The main destabilization scenarios are described and the limits of energy scalability are connected with impact of optical turbulence and stimulated Raman scattering. The fast and slow dynamics of vector dissipative solitons are exposed

    Recognising Complex Mental States from Naturalistic Human-Computer Interactions

    Get PDF
    New advances in computer vision techniques will revolutionize the way we interact with computers, as they, together with other improvements, will help us build machines that understand us better. The face is the main non-verbal channel for human-human communication and contains valuable information about emotion, mood, and mental state. Affective computing researchers have investigated widely how facial expressions can be used for automatically recognizing affect and mental states. Nowadays, physiological signals can be measured by video-based techniques, which can also be utilised for emotion detection. Physiological signals, are an important indicator of internal feelings, and are more robust against social masking. This thesis focuses on computer vision techniques to detect facial expression and physiological changes for recognizing non-basic and natural emotions during human-computer interaction. It covers all stages of the research process from data acquisition, integration and application. Most previous studies focused on acquiring data from prototypic basic emotions acted out under laboratory conditions. To evaluate the proposed method under more practical conditions, two different scenarios were used for data collection. In the first scenario, a set of controlled stimulus was used to trigger the user’s emotion. The second scenario aimed at capturing more naturalistic emotions that might occur during a writing activity. In the second scenario, the engagement level of the participants with other affective states was the target of the system. For the first time this thesis explores how video-based physiological measures can be used in affect detection. Video-based measuring of physiological signals is a new technique that needs more improvement to be used in practical applications. A machine learning approach is proposed and evaluated to improve the accuracy of heart rate (HR) measurement using an ordinary camera during a naturalistic interaction with computer

    Recognising Complex Mental States from Naturalistic Human-Computer Interactions

    Get PDF
    New advances in computer vision techniques will revolutionize the way we interact with computers, as they, together with other improvements, will help us build machines that understand us better. The face is the main non-verbal channel for human-human communication and contains valuable information about emotion, mood, and mental state. Affective computing researchers have investigated widely how facial expressions can be used for automatically recognizing affect and mental states. Nowadays, physiological signals can be measured by video-based techniques, which can also be utilised for emotion detection. Physiological signals, are an important indicator of internal feelings, and are more robust against social masking. This thesis focuses on computer vision techniques to detect facial expression and physiological changes for recognizing non-basic and natural emotions during human-computer interaction. It covers all stages of the research process from data acquisition, integration and application. Most previous studies focused on acquiring data from prototypic basic emotions acted out under laboratory conditions. To evaluate the proposed method under more practical conditions, two different scenarios were used for data collection. In the first scenario, a set of controlled stimulus was used to trigger the user’s emotion. The second scenario aimed at capturing more naturalistic emotions that might occur during a writing activity. In the second scenario, the engagement level of the participants with other affective states was the target of the system. For the first time this thesis explores how video-based physiological measures can be used in affect detection. Video-based measuring of physiological signals is a new technique that needs more improvement to be used in practical applications. A machine learning approach is proposed and evaluated to improve the accuracy of heart rate (HR) measurement using an ordinary camera during a naturalistic interaction with computer
    corecore