37 research outputs found

    Impact of Iris Size and Eyelids Coupling on the Estimation of the Gaze Direction of a Robotic Talking Head by Human Viewers

    Primates, and in particular humans, are very sensitive to the eye direction of conspecifics. Estimating the gaze of others is one of the basic skills for inferring the goals, intentions and desires of social agents, whether they are humans or avatars. When building robots, one should not only equip them with gaze trackers but also check that their own gaze is readable by human partners. We conducted experiments that demonstrate the strong impact of the iris size and the eyelid position of an iCub humanoid robot on gaze-reading performance by human observers. We comment on the importance of assessing a robot's ability to display its intentions through clearly legible and readable gestures.

    Conversational AI and Knowledge Graphs for Social Robot Interaction

    The paper describes an approach that combines work from three fields with previously separate research communities: social robotics, conversational AI, and graph databases. The aim is to develop a generic framework in which a variety of social robots can provide high-quality information to users by accessing semantically rich knowledge graphs about multiple different domains. An example implementation uses a Furhat robot with Rasa open-source conversational AI and knowledge graphs in Neo4j graph databases.
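
    The paper itself includes no code, but the kind of wiring it describes can be sketched roughly as follows: a Rasa custom action that answers a user request by querying a Neo4j knowledge graph. The action name, slot name, Cypher query and connection details below are illustrative assumptions, not taken from the paper.

    # Illustrative sketch only: a hypothetical Rasa custom action that looks up
    # an entity in a Neo4j knowledge graph. Names, query and credentials are
    # assumptions, not taken from the paper.
    from neo4j import GraphDatabase
    from rasa_sdk import Action, Tracker
    from rasa_sdk.executor import CollectingDispatcher

    # Connection details would normally come from configuration.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    class ActionLookupEntity(Action):
        def name(self) -> str:
            # Action name that would be referenced from the Rasa domain file (assumed).
            return "action_lookup_entity"

        def run(self, dispatcher: CollectingDispatcher, tracker: Tracker, domain: dict):
            # Slot assumed to be filled by Rasa NLU from the user's utterance.
            topic = tracker.get_slot("topic")
            with driver.session() as session:
                record = session.run(
                    "MATCH (e {name: $name}) RETURN e.description AS description LIMIT 1",
                    name=topic,
                ).single()
            if record:
                dispatcher.utter_message(text=record["description"])
            else:
                dispatcher.utter_message(text=f"I could not find anything about {topic}.")
            return []

    A robot front end such as Furhat would then speak the text produced by the dispatcher; how that bridge is realized is not specified here.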

    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations and thereby remains consistent with cognitive development. Instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method on a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting, whereas in a speaker-independent setting the method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions. (10 pages; IEEE Transactions on Cognitive and Developmental Systems.)
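
    The central idea, using the audio channel to produce training targets for a purely visual model, can be sketched as follows. This is a minimal PyTorch illustration under assumed names, shapes and thresholds; it is not the authors' architecture or training procedure.

    # Minimal sketch of audio-supervised training of a visual active-speaker
    # classifier. Model, threshold and tensor shapes are assumptions, not the
    # method described in the paper.
    import torch
    import torch.nn as nn

    class FaceSpeakingClassifier(nn.Module):
        """Tiny CNN mapping a face crop to a speaking/not-speaking score."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(32, 1)

        def forward(self, x):
            return self.head(self.features(x).flatten(1))

    def audio_pseudo_labels(frame_energy, threshold=0.1):
        # Self-supervision: frames whose audio energy exceeds a threshold are
        # treated as "someone is speaking"; no manual annotation is used.
        return (frame_energy > threshold).float()

    model = FaceSpeakingClassifier()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.BCEWithLogitsLoss()

    def train_step(face_crops, frame_energy):
        # face_crops: (batch, 3, H, W) crops of a candidate speaker's face;
        # frame_energy: (batch,) audio energy aligned with each video frame.
        labels = audio_pseudo_labels(frame_energy)
        logits = model(face_crops).squeeze(1)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    At inference time only the visual branch is needed, which is what would allow such a detector to keep working when the acoustic channel is too noisy to be trusted.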

    ICMI'12: Proceedings of the ACM SIGCHI 14th International Conference on Multimodal Interaction

    A multimodal multiparty human-robot dialogue corpus for real world interaction

    Kyoto University / Honda Research Institute Japan Co., Ltd. LREC 2018 Special Speech Sessions "Speech Resources Collection in Real-World Situations"; Phoenix Seagaia Conference Center, Miyazaki; 2018-05-09.
    We have developed the MPR multimodal dialogue corpus and describe research activities using the corpus, aimed at enabling multiparty human-robot verbal communication in real-world settings. While that is the final goal, the immediate focus of our project and of the corpus is non-verbal communication, especially social signal processing by machines as the foundation of human-machine verbal communication. The MPR corpus stores annotated audio-visual recordings of dialogues between one robot and one or multiple (up to three) participants. The annotations include speech segments, addressee of speech, transcripts, interaction states, and dialogue act types. Our research on multiparty dialogue management, boredom recognition, response obligation recognition, surprise detection and repair detection using the corpus is briefly introduced, and an analysis of repair in multi-user situations is presented. The analysis shows richer repair behaviors in multi-user situations, which demand more sophisticated repair handling by machines.
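
    As a purely hypothetical illustration of what one annotated utterance in a corpus of this kind might look like in code, the record below mirrors the annotation layers listed above; the field names, types and example values are assumptions, not the actual MPR schema.

    # Hypothetical record type mirroring the annotation layers listed in the
    # abstract; field names, types and values are assumptions, not the actual
    # MPR annotation schema.
    from dataclasses import dataclass

    @dataclass
    class UtteranceAnnotation:
        speaker_id: str         # which participant (or the robot) is speaking
        start_time: float       # speech segment start, in seconds
        end_time: float         # speech segment end, in seconds
        addressee: str          # who the speech is directed at
        transcript: str         # manual transcription of the utterance
        interaction_state: str  # e.g. engaged or bored (labels assumed)
        dialogue_act: str       # dialogue act type, e.g. question or repair

    # Invented example record for illustration only.
    example = UtteranceAnnotation(
        speaker_id="participant_1", start_time=12.4, end_time=14.1,
        addressee="robot", transcript="Can you say that again?",
        interaction_state="engaged", dialogue_act="repair",
    )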

    Evaluation of artificial mouths in social robots

    The external aspects of a robot affect how people behave toward it and how they perceive it while interacting. In this paper, we study the importance of the mouth displayed by a social robot and explore how different designs of an artificial LED-based mouth alter participants' judgments of the robot's attributes and their attention to the robot's message. We evaluated participants' judgments of a speaking robot under four conditions: 1) without a mouth; 2) with a static smile; 3) with a vibrating, wave-shaped mouth; and 4) with a moving, human-like mouth. A total of 79 participants evaluated their perceptions of an on-video robot showing one of the four conditions. The results show that the presence of a mouth, as well as its design, alters the perception of the robot. In particular, the presence of a mouth makes the robot appear more lifelike and less sad. The human-like mouth was the one participants liked most and, together with the static smile, was rated the friendliest. In contrast, participants rated the mouthless robot and the one with the wave-shaped mouth as the most dangerous.
    Funding: Ministerio de Economia y Competitividad (DPI2014-57684-R); in part by the MOnarCH project, funded by the European Commission (Grant Agreement 601033); and in part by RoboCity2030-III-CM, funded by the Comunidad de Madrid and co-funded by the Structural Funds of the EU (S2013/MIT-2748).

    Building Embodied Conversational Agents: Observations on human nonverbal behaviour as a resource for the development of artificial characters

    "Wow this is so cool!" This is what I most probably yelled, back in the 90s, when my first computer program on our MSX computer turned out to do exactly what I wanted it to do. The program contained the following instruction: COLOR 10(1.1) After hitting enter, it would change the screen color from light blue to dark yellow. A few years after that experience, Microsoft Windows was introduced. Windows came with an intuitive graphical user interface that was designed to allow all people, so also those who would not consider themselves to be experienced computer addicts, to interact with the computer. This was a major step forward in human-computer interaction, as from that point forward no complex programming skills were required anymore to perform such actions as adapting the screen color. Changing the background was just a matter of pointing the mouse to the desired color on a color palette. "Wow this is so cool!". This is what I shouted, again, 20 years later. This time my new smartphone successfully skipped to the next song on Spotify because I literally told my smartphone, with my voice, to do so. Being able to operate your smartphone with natural language through voice-control can be extremely handy, for instance when listening to music while showering. Again, the option to handle a computer with voice instructions turned out to be a significant optimization in human-computer interaction. From now on, computers could be instructed without the use of a screen, mouse or keyboard, and instead could operate successfully simply by telling the machine what to do. In other words, I have personally witnessed how, within only a few decades, the way people interact with computers has changed drastically, starting as a rather technical and abstract enterprise to becoming something that was both natural and intuitive, and did not require any advanced computer background. Accordingly, while computers used to be machines that could only be operated by technically-oriented individuals, they had gradually changed into devices that are part of many people’s household, just as much as a television, a vacuum cleaner or a microwave oven. The introduction of voice control is a significant feature of the newer generation of interfaces in the sense that these have become more "antropomorphic" and try to mimic the way people interact in daily life, where indeed the voice is a universally used device that humans exploit in their exchanges with others. The question then arises whether it would be possible to go even one step further, where people, like in science-fiction movies, interact with avatars or humanoid robots, whereby users can have a proper conversation with a computer-simulated human that is indistinguishable from a real human. An interaction with a human-like representation of a computer that behaves, talks and reacts like a real person would imply that the computer is able to not only produce and understand messages transmitted auditorily through the voice, but also could rely on the perception and generation of different forms of body language, such as facial expressions, gestures or body posture. At the time of writing, developments of this next step in human-computer interaction are in full swing, but the type of such interactions is still rather constrained when compared to the way humans have their exchanges with other humans. It is interesting to reflect on how such future humanmachine interactions may look like. 
    When we consider other products that have been created in history, it is sometimes striking to see that some of them have been inspired by things that can be observed in our environment, yet at the same time do not have to be exact copies of those phenomena. For instance, an airplane has wings, just as birds do, yet the wings of an airplane do not make the typical flapping movements a bird produces to fly. Moreover, an airplane has wheels, whereas a bird has legs. At the same time, the airplane has made it possible for humans to cover long distances in a fast and smooth manner in a way that was unthinkable before it was invented. The example of the airplane shows how new technologies can have "unnatural" properties, but can nonetheless be very beneficial and impactful for human beings.

    This dissertation centers on the practical question of how virtual humans can be programmed to act more human-like. The four studies presented in this dissertation all share the underlying question of how aspects of human behavior can be captured such that computers can use them to become more human-like. Each study differs in method, perspective and specific questions, but they all aim to provide insights and directions that help push forward the development of human-like behavior in computers and to investigate (the simulation of) human conversational behavior. The rest of this introductory chapter gives a general overview of virtual humans (also known as embodied conversational agents), their potential uses and the engineering challenges involved, followed by an overview of the four studies.