8,920 research outputs found

    Developing an Affect-Aware Rear-Projected Robotic Agent

    Get PDF
    Social (or Sociable) robots are designed to interact with people in a natural and interpersonal manner. They are becoming an integrated part of our daily lives and have achieved positive outcomes in several applications such as education, health care, quality of life, entertainment, etc. Despite significant progress towards the development of realistic social robotic agents, a number of problems remain to be solved. First, current social robots either lack enough ability to have deep social interaction with human, or they are very expensive to build and maintain. Second, current social robots have yet to reach the full emotional and social capabilities necessary for rich and robust interaction with human beings. To address these problems, this dissertation presents the development of a low-cost, flexible, affect-aware rear-projected robotic agent (called ExpressionBot), that is designed to support verbal and non-verbal communication between the robot and humans, with the goal of closely modeling the dynamics of natural face-to-face communication. The developed robotic platform uses state-of-the-art character animation technologies to create an animated human face (aka avatar) that is capable of showing facial expressions, realistic eye movement, and accurate visual speech, and then project this avatar onto a face-shaped translucent mask. The mask and the projector are then rigged onto a neck mechanism that can move like a human head. Since an animation is projected onto a mask, the robotic face is highly flexible research tool, mechanically simple, and low-cost to design, build and maintain compared with mechatronic and android faces. The results of our comprehensive Human-Robot Interaction (HRI) studies illustrate the benefits and values of the proposed rear-projected robotic platform over a virtual-agent with the same animation displayed on a 2D computer screen. The results indicate that ExpressionBot is well accepted by users, with some advantages in expressing facial expressions more accurately and perceiving mutual eye gaze contact. To improve social capabilities of the robot and create an expressive and empathic social agent (affect-aware) which is capable of interpreting users\u27 emotional facial expressions, we developed a new Deep Neural Networks (DNN) architecture for Facial Expression Recognition (FER). The proposed DNN was initially trained on seven well-known publicly available databases, and obtained significantly better than, or comparable to, traditional convolutional neural networks or other state-of-the-art methods in both accuracy and learning time. Since the performance of the automated FER system highly depends on its training data, and the eventual goal of the proposed robotic platform is to interact with users in an uncontrolled environment, a database of facial expressions in the wild (called AffectNet) was created by querying emotion-related keywords from different search engines. AffectNet contains more than 1M images with faces and 440,000 manually annotated images with facial expressions, valence, and arousal. Two DNNs were trained on AffectNet to classify the facial expression images and predict the value of valence and arousal. Various evaluation metrics show that our deep neural network approaches trained on AffectNet can perform better than conventional machine learning methods and available off-the-shelf FER systems. We then integrated this automated FER system into spoken dialog of our robotic platform to extend and enrich the capabilities of ExpressionBot beyond spoken dialog and create an affect-aware robotic agent that can measure and infer users\u27 affect and cognition. Three social/interaction aspects (task engagement, being empathic, and likability of the robot) are measured in an experiment with the affect-aware robotic agent. The results indicate that users rated our affect-aware agent as empathic and likable as a robot in which user\u27s affect is recognized by a human (WoZ). In summary, this dissertation presents the development and HRI studies of a perceptive, and expressive, conversational, rear-projected, life-like robotic agent (aka ExpressionBot or Ryan) that models natural face-to-face communication between human and emapthic agent. The results of our in-depth human-robot-interaction studies show that this robotic agent can serve as a model for creating the next generation of empathic social robots

    Sensing, interpreting, and anticipating human social behaviour in the real world

    Get PDF
    Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. Furthermore, we are the first to study low rapport detection in group interactions, as well as investigating a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.Blick, Gesichtsausdrücke, Körpersprache, oder Prosodie spielen als nonverbale Signale eine zentrale Rolle in menschlicher Kommunikation. Sie wurden durch vielzählige Studien mit wichtigen Konzepten wie Emotionen, Sprecherwechsel, Führung, oder der Qualität des Verhältnisses zwischen zwei Personen in Verbindung gebracht. Damit Menschen effektiv während ihres täglichen sozialen Lebens von Maschinen unterstützt werden können, sind automatische Methoden zur Erkennung, Interpretation, und Antizipation von nonverbalem Verhalten notwendig. Obwohl die bisherige Forschung in kontrollierten Studien zu ermutigenden Ergebnissen gekommen ist, bleibt die automatische Analyse nonverbalen Verhaltens in weniger kontrollierten Situationen eine Herausforderung. Darüber hinaus existieren kaum Untersuchungen zur Antizipation von nonverbalem Verhalten in sozialen Situationen. Das Ziel dieser Arbeit ist, die Vision vom automatischen Verstehen sozialer Situationen ein Stück weit mehr Realität werden zu lassen. Diese Arbeit liefert wichtige Beiträge zur autmatischen Erkennung menschlichen Blickverhaltens in alltäglichen Situationen. Obwohl viele soziale Interaktionen in Gruppen stattfinden, existieren unüberwachte Methoden zur Augenkontakterkennung bisher lediglich für dyadische Interaktionen. Wir stellen einen neuen Ansatz zur Augenkontakterkennung in Gruppen vor, welcher ohne manuelle Annotationen auskommt, indem er sich den statistischen Zusammenhang zwischen Blick- und Sprechverhalten zu Nutze macht. Tägliche Aktivitäten sind eine Herausforderung für Geräte zur mobile Augenbewegungsmessung, da Verschiebungen dieser Geräte zur Verschlechterung ihrer Kalibrierung führen können. In dieser Arbeit verwenden wir Nutzerverhalten an mobilen Endgeräten, um den Effekt solcher Verschiebungen zu korrigieren. Neben der Erkennung verbessert diese Arbeit auch die Interpretation sozialer Signale. Wir veröffentlichen den ersten Datensatz sowie die erste Methode zur Emotionserkennung in dyadischen Interaktionen ohne den Einsatz spezialisierter Ausrüstung. Außerdem stellen wir die erste Studie zur automatischen Erkennung mangelnder Verbundenheit in Gruppeninteraktionen vor, und führen die erste datensatzübergreifende Evaluierung zur Detektion von sich entwickelndem Führungsverhalten durch. Zum Abschluss der Arbeit präsentieren wir die ersten Ansätze zur Antizipation von Blickverhalten in sozialen Interaktionen. Blickverhalten hat die besondere Eigenschaft, dass es sowohl als soziales Signal als auch der Ausrichtung der visuellen Wahrnehmung dient. Somit eröffnet die Fähigkeit zur Antizipation von Blickverhalten Maschinen die Möglichkeit, sich sowohl nahtloser in soziale Interaktionen einzufügen, als auch Menschen zu warnen, wenn diese Gefahr laufen wichtige Aspekte der Umgebung zu übersehen. Wir präsentieren Methoden zur Antizipation von Blickverhalten im Kontext der Interaktion mit mobilen Endgeräten während täglicher Aktivitäten, als auch während dyadischer Interaktionen mittels Videotelefonie

    Brain Computer Interfaces and Emotional Involvement: Theory, Research, and Applications

    Get PDF
    This reprint is dedicated to the study of brain activity related to emotional and attentional involvement as measured by Brain–computer interface (BCI) systems designed for different purposes. A BCI system can translate brain signals (e.g., electric or hemodynamic brain activity indicators) into a command to execute an action in the BCI application (e.g., a wheelchair, the cursor on the screen, a spelling device or a game). These tools have the advantage of having real-time access to the ongoing brain activity of the individual, which can provide insight into the user’s emotional and attentional states by training a classification algorithm to recognize mental states. The success of BCI systems in contemporary neuroscientific research relies on the fact that they allow one to “think outside the lab”. The integration of technological solutions, artificial intelligence and cognitive science allowed and will allow researchers to envision more and more applications for the future. The clinical and everyday uses are described with the aim to invite readers to open their minds to imagine potential further developments

    An original framework for understanding human actions and body language by using deep neural networks

    Get PDF
    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition, and video surveillance. In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods

    Ubiquitous Technologies for Emotion Recognition

    Get PDF
    Emotions play a very important role in how we think and behave. As such, the emotions we feel every day can compel us to act and influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how or why our emotions may change is thus of much relevance to understand human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions, continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and the recognition of human emotions

    Multimodal Human Group Behavior Analysis

    Get PDF
    Human behaviors in a group setting involve a complex mixture of multiple modalities: audio, visual, linguistic, and human interactions. With the rapid progress of AI, automatic prediction and understanding of these behaviors is no longer a dream. In a negotiation, discovering human relationships and identifying the dominant person can be useful for decision making. In security settings, detecting nervous behaviors can help law enforcement agents spot suspicious people. In adversarial settings such as national elections and court defense, identifying persuasive speakers is a critical task. It is beneficial to build accurate machine learning (ML) models to predict such human group behaviors. There are two elements for successful prediction of group behaviors. The first is to design domain-specific features for each modality. Social and Psychological studies have uncovered various factors including both individual cues and group interactions, which inspire us to extract relevant features computationally. In particular, the group interaction modality plays an important role, since human behaviors influence each other through interactions in a group. Second, effective multimodal ML models are needed to align and integrate the different modalities for accurate predictions. However, most previous work ignored the group interaction modality. Moreover, they only adopt early fusion or late fusion to combine different modalities, which is not optimal. This thesis presents methods to train models taking multimodal inputs in group interaction videos, and to predict human group behaviors. First, we develop an ML algorithm to automatically predict human interactions from videos, which is the basis to extract interaction features and model group behaviors. Second, we propose a multimodal method to identify dominant people in videos from multiple modalities. Third, we study the nervousness in human behavior by a developing hybrid method: group interaction feature engineering combined with individual facial embedding learning. Last, we introduce a multimodal fusion framework that enables us to predict how persuasive speakers are. Overall, we develop one algorithm to extract group interactions and build three multimodal models to identify three kinds of human behavior in videos: dominance, nervousness and persuasion. The experiments demonstrate the efficacy of the methods and analyze the modality-wise contributions

    Affective Computing

    Get PDF
    This book provides an overview of state of the art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of this book (Chapters 1 to 7) presents a research on synthesis and recognition of facial expressions. Given that we not only use the face but also body movements to express ourselves, in the second section (Chapters 8 to 11) we present a research on perception and generation of emotional expressions by using full-body motions. The third section of the book (Chapters 12 to 16) presents computational models on emotion, as well as findings from neuroscience research. In the last section of the book (Chapters 17 to 22) we present applications related to affective computing

    A Framework for Students Profile Detection

    Get PDF
    Some of the biggest problems tackling Higher Education Institutions are students’ drop-out and academic disengagement. Physical or psychological disabilities, social-economic or academic marginalization, and emotional and affective problems, are some of the factors that can lead to it. This problematic is worsened by the shortage of educational resources, that can bridge the communication gap between the faculty staff and the affective needs of these students. This dissertation focus in the development of a framework, capable of collecting analytic data, from an array of emotions, affects and behaviours, acquired either by human observations, like a teacher in a classroom or a psychologist, or by electronic sensors and automatic analysis software, such as eye tracking devices, emotion detection through facial expression recognition software, automatic gait and posture detection, and others. The framework establishes the guidance to compile the gathered data in an ontology, to enable the extraction of patterns outliers via machine learning, which assist the profiling of students in critical situations, like disengagement, attention deficit, drop-out, and other sociological issues. Consequently, it is possible to set real-time alerts when these profiles conditions are detected, so that appropriate experts could verify the situation and employ effective procedures. The goal is that, by providing insightful real-time cognitive data and facilitating the profiling of the students’ problems, a faster personalized response to help the student is enabled, allowing academic performance improvements
    • …
    corecore