15 research outputs found

    Building Embodied Conversational Agents: Observations on human nonverbal behaviour as a resource for the development of artificial characters

    Get PDF
    "Wow this is so cool!" This is what I most probably yelled, back in the 90s, when my first computer program on our MSX computer turned out to do exactly what I wanted it to do. The program contained the following instruction: COLOR 10(1.1) After hitting enter, it would change the screen color from light blue to dark yellow. A few years after that experience, Microsoft Windows was introduced. Windows came with an intuitive graphical user interface that was designed to allow all people, so also those who would not consider themselves to be experienced computer addicts, to interact with the computer. This was a major step forward in human-computer interaction, as from that point forward no complex programming skills were required anymore to perform such actions as adapting the screen color. Changing the background was just a matter of pointing the mouse to the desired color on a color palette. "Wow this is so cool!". This is what I shouted, again, 20 years later. This time my new smartphone successfully skipped to the next song on Spotify because I literally told my smartphone, with my voice, to do so. Being able to operate your smartphone with natural language through voice-control can be extremely handy, for instance when listening to music while showering. Again, the option to handle a computer with voice instructions turned out to be a significant optimization in human-computer interaction. From now on, computers could be instructed without the use of a screen, mouse or keyboard, and instead could operate successfully simply by telling the machine what to do. In other words, I have personally witnessed how, within only a few decades, the way people interact with computers has changed drastically, starting as a rather technical and abstract enterprise to becoming something that was both natural and intuitive, and did not require any advanced computer background. Accordingly, while computers used to be machines that could only be operated by technically-oriented individuals, they had gradually changed into devices that are part of many people’s household, just as much as a television, a vacuum cleaner or a microwave oven. The introduction of voice control is a significant feature of the newer generation of interfaces in the sense that these have become more "antropomorphic" and try to mimic the way people interact in daily life, where indeed the voice is a universally used device that humans exploit in their exchanges with others. The question then arises whether it would be possible to go even one step further, where people, like in science-fiction movies, interact with avatars or humanoid robots, whereby users can have a proper conversation with a computer-simulated human that is indistinguishable from a real human. An interaction with a human-like representation of a computer that behaves, talks and reacts like a real person would imply that the computer is able to not only produce and understand messages transmitted auditorily through the voice, but also could rely on the perception and generation of different forms of body language, such as facial expressions, gestures or body posture. At the time of writing, developments of this next step in human-computer interaction are in full swing, but the type of such interactions is still rather constrained when compared to the way humans have their exchanges with other humans. It is interesting to reflect on how such future humanmachine interactions may look like. 
When we consider other products that have been created in history, it is sometimes striking to see that some of them have been inspired by things that can be observed in our environment, yet at the same time are not exact copies of those phenomena. For instance, an airplane has wings just like a bird, yet the wings of an airplane do not make the typical flapping movements a bird produces to fly. Moreover, an airplane has wheels, whereas a bird has legs. At the same time, the airplane has made it possible for humans to cover long distances in a fast and smooth manner in a way that was unthinkable before it was invented. The example of the airplane shows how new technologies can have "unnatural" properties, yet nonetheless be very beneficial and impactful for human beings. This dissertation centers on the practical question of how virtual humans can be programmed to act more human-like. The four studies presented in this dissertation all share the same underlying question of how parts of human behavior can be captured, such that computers can use them to become more human-like. Each study differs in method, perspective and specific questions, but they all aim to gain insights and directions that help to further push the computational development of human-like behavior and to investigate (the simulation of) human conversational behavior. The rest of this introductory chapter gives a general overview of virtual humans (also known as embodied conversational agents), their potential uses and the engineering challenges, followed by an overview of the four studies.

    Ada and Grace: Direct Interaction with Museum Visitors

    Full text link

    Developing an Affect-Aware Rear-Projected Robotic Agent

    Get PDF
    Social (or Sociable) robots are designed to interact with people in a natural and interpersonal manner. They are becoming an integrated part of our daily lives and have achieved positive outcomes in several applications such as education, health care, quality of life, and entertainment. Despite significant progress towards the development of realistic social robotic agents, a number of problems remain to be solved. First, current social robots either lack the ability to have deep social interaction with humans, or they are very expensive to build and maintain. Second, current social robots have yet to reach the full emotional and social capabilities necessary for rich and robust interaction with human beings. To address these problems, this dissertation presents the development of a low-cost, flexible, affect-aware, rear-projected robotic agent (called ExpressionBot) that is designed to support verbal and non-verbal communication between the robot and humans, with the goal of closely modeling the dynamics of natural face-to-face communication. The developed robotic platform uses state-of-the-art character animation technologies to create an animated human face (aka avatar) that is capable of showing facial expressions, realistic eye movement, and accurate visual speech, and then projects this avatar onto a face-shaped translucent mask. The mask and the projector are then rigged onto a neck mechanism that can move like a human head. Since an animation is projected onto a mask, the robotic face is a highly flexible research tool that is mechanically simple and low-cost to design, build and maintain compared with mechatronic and android faces. The results of our comprehensive Human-Robot Interaction (HRI) studies illustrate the benefits and value of the proposed rear-projected robotic platform over a virtual agent with the same animation displayed on a 2D computer screen. The results indicate that ExpressionBot is well accepted by users, with some advantages in expressing facial expressions more accurately and perceiving mutual eye-gaze contact. To improve the social capabilities of the robot and create an expressive and empathic (affect-aware) social agent capable of interpreting users' emotional facial expressions, we developed a new Deep Neural Network (DNN) architecture for Facial Expression Recognition (FER). The proposed DNN was initially trained on seven well-known, publicly available databases, and obtained results significantly better than, or comparable to, traditional convolutional neural networks and other state-of-the-art methods in both accuracy and learning time. Since the performance of an automated FER system highly depends on its training data, and the eventual goal of the proposed robotic platform is to interact with users in an uncontrolled environment, a database of facial expressions in the wild (called AffectNet) was created by querying emotion-related keywords from different search engines. AffectNet contains more than 1M images with faces and 440,000 manually annotated images with facial expressions, valence, and arousal. Two DNNs were trained on AffectNet to classify the facial expression images and to predict the values of valence and arousal. Various evaluation metrics show that our deep neural network approaches trained on AffectNet perform better than conventional machine learning methods and available off-the-shelf FER systems.
We then integrated this automated FER system into the spoken dialog of our robotic platform to extend and enrich the capabilities of ExpressionBot beyond spoken dialog and to create an affect-aware robotic agent that can measure and infer users' affect and cognition. Three social/interaction aspects (task engagement, being empathic, and likability of the robot) were measured in an experiment with the affect-aware robotic agent. The results indicate that users rated our affect-aware agent as empathic and likable as a robot in which the user's affect is recognized by a human (WoZ). In summary, this dissertation presents the development and HRI studies of a perceptive, expressive, conversational, rear-projected, life-like robotic agent (aka ExpressionBot or Ryan) that models natural face-to-face communication between a human and an empathic agent. The results of our in-depth human-robot interaction studies show that this robotic agent can serve as a model for creating the next generation of empathic social robots.
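The two DNNs described above (a categorical expression classifier and a valence/arousal predictor trained on AffectNet) can be pictured with a small multi-task sketch. The PyTorch snippet below is purely illustrative and not the dissertation's architecture: it combines both tasks in one toy network, and the class name FaceAffectNet, the 96x96 input size, and the eight-category label set are assumptions made for the example.

    # Illustrative multi-task sketch (hypothetical, not the dissertation's DNN):
    # one small convolutional backbone with a categorical-expression head and a
    # valence/arousal regression head, as one might train on AffectNet-style labels.
    import torch
    import torch.nn as nn

    class FaceAffectNet(nn.Module):  # hypothetical name
        def __init__(self, n_expressions: int = 8):  # assumed 8 expression categories
            super().__init__()
            self.backbone = nn.Sequential(  # tiny conv stack over 96x96 RGB face crops
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.expr_head = nn.Linear(128, n_expressions)  # discrete expression logits
            self.va_head = nn.Linear(128, 2)                # valence and arousal in [-1, 1]

        def forward(self, x):
            feats = self.backbone(x)
            return self.expr_head(feats), torch.tanh(self.va_head(feats))

    # Joint training step: cross-entropy for the expression label, MSE for valence/arousal.
    model = FaceAffectNet()
    images = torch.randn(4, 3, 96, 96)          # dummy batch of face crops
    labels = torch.randint(0, 8, (4,))          # dummy expression labels
    va_targets = torch.rand(4, 2) * 2 - 1       # dummy valence/arousal targets
    logits, va = model(images)
    loss = nn.functional.cross_entropy(logits, labels) + nn.functional.mse_loss(va, va_targets)
    loss.backward()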

    Shared Perception in Human-Robot Interaction

    Get PDF
    Interaction can be seen as a composition of perspectives: the integration of the perceptions, intentions, and actions on the environment that two or more agents share. For an interaction to be effective, each agent must be prone to "sharedness": being situated in a common environment, able to read what others express about their perspective, and ready to adjust one's own perspective accordingly. In this sense, effective interaction is supported by perceiving the environment jointly with others, a capability that in this research is called Shared Perception. Nonetheless, perception is a complex process in which the observer receives sensory inputs from the external world and interprets them based on their own previous experiences, predictions, and intentions. In addition, social interaction itself contributes to shaping what is perceived: others' attention, perspective, actions, and internal states may also be incorporated into perception. Thus, Shared Perception reflects the observer's ability to integrate these three sources of information: the environment, the self, and other agents. If Shared Perception is essential among humans, it is equally crucial for interaction with robots, which need social and cognitive abilities to interact with humans naturally and successfully. This research deals with Shared Perception within the context of Social Human-Robot Interaction (HRI) and involves an interdisciplinary approach. The two general axes of the thesis are the investigation of human perception while interacting with robots and the modeling of the robot's perception while interacting with humans. These two directions are outlined through three specific Research Objectives, whose achievement represents the contribution of this work. i) The formulation of a theoretical framework of Shared Perception in HRI valid for interpreting and developing different socio-perceptual mechanisms and abilities. ii) The investigation of Shared Perception in humans, focusing on the perceptual mechanism of Context Dependency and therefore exploring how social interaction affects the use of previous experience in human spatial perception. iii) The implementation of a deep-learning model for Addressee Estimation to foster robots' socio-perceptual skills through the awareness of others' behavior, as suggested by the Shared Perception framework. To achieve the first Research Objective, several human socio-perceptual mechanisms are presented and interpreted in a unified account. This exposition parallels mechanisms elicited by interaction with humans and with humanoid robots, and aims to build a framework valid for investigating human perception in the context of HRI. Based on the thought of D. Davidson and conceived as the integration of information coming from the environment, the self, and other agents, the idea of "triangulation" expresses the critical dynamics of Shared Perception. It is also proposed as the functional structure to support the implementation of socio-perceptual skills in robots. This general framework serves as a reference for fulfilling the other two Research Objectives, which explore specific aspects of Shared Perception. Regarding the second Research Objective, the human perceptual mechanism of Context Dependency is investigated, for the first time, within social interaction. Human perception is based on unconscious inference, in which sensory inputs are integrated with prior information. This phenomenon helps in facing the uncertainty of the external world with predictions built upon previous experience.
To investigate the effect of social interaction on this mechanism, the iCub robot was used as an experimental tool to create an interactive scenario in a controlled setting. A user study based on psychophysical methods, Bayesian modeling, and a neural-network analysis of the human results demonstrated that social interaction influences Context Dependency: when interacting with a social agent, humans rely less on their internal models and more on external stimuli. These results are framed within Shared Perception and contribute to revealing the integration dynamics of its three sources. The presence and social behavior of others (other agents) shift the balance between sensory inputs (environment) and personal history (self) in favor of the information shared with others, that is, the environment. The third Research Objective consists of tackling the Addressee Estimation problem, i.e., understanding to whom a speaker is talking, to improve the iCub's social behavior in multi-party interactions. Addressee Estimation can be considered a Shared Perception ability because it is achieved by using sensory information from the environment, internal representations of the agents' positions, and, more importantly, the understanding of others' behavior. An architecture for Addressee Estimation is thus designed around the integration process of Shared Perception (environment, self, other agents) and partially implemented with respect to the third element: the awareness of others' behavior. To achieve this, a hybrid deep-learning (CNN+LSTM) model is developed to estimate the placement of the addressee relative to the speaker and the robot, based on the non-verbal behavior of the speaker. Addressee Estimation abilities based on Shared Perception dynamics are aimed at improving multi-party HRI. Making robots aware of other agents' behavior towards the environment is the first crucial step towards incorporating this information into the robot's perception and modeling Shared Perception.
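The hybrid CNN+LSTM model mentioned above infers the addressee's placement from the speaker's non-verbal behavior over time. The PyTorch snippet below is a minimal, hypothetical sketch of that general pattern, not the thesis implementation: a per-frame CNN feeds an LSTM whose final state is classified into a small set of relative placements; the class name AddresseeEstimator, the input shapes, and the three placement classes are assumptions made for the example.

    # Hypothetical CNN+LSTM sketch: encode each frame of the speaker's behavior with a
    # CNN, integrate the sequence with an LSTM, and classify the addressee's placement.
    import torch
    import torch.nn as nn

    class AddresseeEstimator(nn.Module):  # hypothetical name
        def __init__(self, n_placements: int = 3, hidden: int = 128):
            super().__init__()
            self.frame_encoder = nn.Sequential(  # CNN applied to every frame
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.temporal = nn.LSTM(32, hidden, batch_first=True)  # temporal integration
            self.classifier = nn.Linear(hidden, n_placements)      # e.g. robot / left / right

        def forward(self, clips):                 # clips: (batch, time, 3, H, W)
            b, t = clips.shape[:2]
            feats = self.frame_encoder(clips.flatten(0, 1)).view(b, t, -1)
            _, (h, _) = self.temporal(feats)
            return self.classifier(h[-1])         # logits over addressee placements

    clips = torch.randn(2, 10, 3, 64, 64)         # two dummy 10-frame behavior clips
    print(AddresseeEstimator()(clips).shape)      # torch.Size([2, 3])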

    Stereotypical nationality representations in HRI: perspectives from international young adults

    Get PDF
    People often form immediate expectations about other people, or groups of people, based on visual appearance and on characteristics of their voice and speech. These stereotypes, often inaccurate or overgeneralized, may carry over to robots with human-like qualities. This study explores whether nationality-based preconceptions regarding appearance and accents can be found in people's perception of a virtual and a physical social robot. In an online survey with 80 subjects evaluating different first-language-influenced accents of English and nationality-influenced human-like faces for a virtual robot, we find that accents, in particular, lead to preconceptions about perceived competence and likeability that correspond to previous findings in social science research. In a physical interaction study with 74 participants, we then studied whether the perception of competence and likeability is similar after interacting with a robot portraying one of four different nationality representations from the online survey. We find that the preconceptions about national stereotypes that appeared in the online survey vanish or are overshadowed by factors related to general interaction quality. We do, however, find some effects of the robot's stereotypical alignment with the subject group, with Swedish subjects (the majority group in this study) rating the Swedish-accented robot as less competent than the international group does, but, on the other hand, recalling more facts from the Swedish robot's presentation than the international group does. In an extension in which the physical robot was replaced by a virtual robot interacting in the same scenario online, we further found the same result, namely that preconceptions are of less importance after actual interactions, demonstrating that the differences in the ratings of the robot between the online survey and the interaction are not due to the interaction medium. We hence conclude that attitudes towards stereotypical national representations in HRI have a weak effect, at least for the user group included in this study (primarily educated young students in an international setting).

    Artificial Emotional Intelligence in Socially Assistive Robots

    Get PDF
    Artificial Emotional Intelligence (AEI) bridges the gap between humans and machines by allowing them to demonstrate empathy and affection towards each other. This is achieved by evaluating the emotional state of human users, adapting the machine's behavior to it, and hence giving an appropriate response to those emotions. AEI is part of a larger field of study called Affective Computing. Affective computing is the integration of artificial intelligence, psychology, robotics, biometrics, and many other fields of study. The main component in AEI and affective computing is emotion, and how we can utilize emotion to create a more natural and productive relationship between humans and machines. An area in which AEI can be particularly beneficial is in building machines and robots for healthcare applications. Socially Assistive Robotics (SAR) is a subfield of robotics that aims at developing robots that assist people through social interaction and companionship. For example, residents living in housing designed for older adults often feel lonely, isolated, and depressed; therefore, social interaction and mental stimulation are critical to improving their well-being. Socially Assistive Robots are designed to address these needs by monitoring and improving the quality of life of patients with depression and dementia. Nevertheless, developing robots with AEI that understand users' emotions and can reply to them naturally and effectively is in its infancy, and much more research needs to be carried out in this field. This dissertation presents the results of my work in developing a social robot, called Ryan, equipped with AEI for effective and engaging dialogue with older adults with depression and dementia. Over the course of this research there have been three versions of Ryan, each new version created using the lessons learned from the studies presented in this dissertation. First, two human-robot interaction studies were conducted showing the validity of using a rear-projected robot to convey emotion and intent. Then, the feasibility of using Ryan to interact with older adults was studied, investigating possible improvements in their quality of life. Ryan the Companionbot used in this project is a rear-projected, lifelike conversational robot. Ryan is equipped with many features such as games, music, video, reminders, and general conversation, and engages users in cognitive games and reminiscence activities. A pilot study was conducted with six older adults with early-stage dementia and/or depression living in a senior living facility. Each individual had 24/7 access to Ryan in his or her room for a period of 4-6 weeks. The observations of these individuals, interviews with them and their caregivers, and analysis of their interactions during this period revealed that they established rapport with the robot and greatly valued and enjoyed having a companionbot in their room. A multi-modal emotion recognition algorithm was developed, as well as a multi-modal emotion expression system, and these were then integrated into Ryan. To engage the subjects in a more empathic interaction with Ryan, a corpus of dialogues on different topics was created by English-major students. An emotion recognition algorithm was designed, implemented, and then integrated into the dialogue management system to empathize with users based on their perceived emotion.
This study investigates the effects of this emotionally intelligent robot on older adults in the early stages of depression and dementia. The results of this study suggest that Ryan equipped with AEI is more engaging, likable, and attractive to users than Ryan without AEI. The long-term effect of the last version of Ryan (Ryan V3.0) was examined in a study involving 17 subjects from 5 different senior care facilities. The participants in this study experienced a general improvement in their cognitive and depression scores.
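As a purely hypothetical illustration of how a dialogue manager like the one described above could use a perceived emotion label to empathize with the user, the short Python sketch below prefixes a scripted response with an emotion-matched opener; the labels and phrasings are invented for the example and are not taken from Ryan's actual dialogue corpus.

    # Hypothetical mapping from a perceived emotion label to an empathic opener.
    EMPATHIC_OPENERS = {
        "happy":   "I'm glad to hear that! ",
        "sad":     "I'm sorry, that sounds hard. ",
        "angry":   "That sounds frustrating. ",
        "neutral": "",
    }

    def empathic_reply(perceived_emotion: str, base_response: str) -> str:
        """Prefix the scripted dialogue response with an emotion-matched opener."""
        return EMPATHIC_OPENERS.get(perceived_emotion, "") + base_response

    print(empathic_reply("sad", "Would you like to listen to some music together?"))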

    Proceedings of the LREC 2018 Special Speech Sessions

    Get PDF
    LREC 2018 Special Speech Sessions "Speech Resources Collection in Real-World Situations"; Phoenix Seagaia Conference Center, Miyazaki; 2018-05-0