    Shared Perception in Human-Robot Interaction

    Interaction can be seen as a composition of perspectives: the integration of perceptions, intentions, and actions on the environment that two or more agents share. For an interaction to be effective, each agent must be prone to “sharedness”: being situated in a common environment, able to read what others express about their perspective, and ready to adjust its own perspective accordingly. In this sense, effective interaction is supported by perceiving the environment jointly with others, a capability that this research calls Shared Perception. Nonetheless, perception is a complex process in which the observer receives sensory inputs from the external world and interprets them based on its own previous experiences, predictions, and intentions. In addition, social interaction itself contributes to shaping what is perceived: others’ attention, perspective, actions, and internal states may also be incorporated into perception. Thus, Shared Perception reflects the observer's ability to integrate these three sources of information: the environment, the self, and other agents. If Shared Perception is essential among humans, it is equally crucial for interaction with robots, which need social and cognitive abilities to interact with humans naturally and successfully. This research deals with Shared Perception within the context of Social Human-Robot Interaction (HRI) and takes an interdisciplinary approach. The two general axes of the thesis are the investigation of human perception while interacting with robots and the modeling of the robot’s perception while interacting with humans. These two directions are pursued through three specific Research Objectives, whose achievement represents the contribution of this work. i) The formulation of a theoretical framework of Shared Perception in HRI suitable for interpreting and developing different socio-perceptual mechanisms and abilities.
ii) The investigation of Shared Perception in humans, focusing on the perceptual mechanism of Context Dependency and thus exploring how social interaction affects the use of previous experience in human spatial perception. iii) The implementation of a deep-learning model for Addressee Estimation to foster robots’ socio-perceptual skills through awareness of others’ behavior, as suggested by the Shared Perception framework. To achieve the first Research Objective, several human socio-perceptual mechanisms are presented and interpreted in a unified account. This exposition draws parallels between mechanisms elicited by interaction with humans and with humanoid robots, and aims to build a framework suitable for investigating human perception in the context of HRI. Based on the thought of D. Davidson and conceived as the integration of information coming from the environment, the self, and other agents, the idea of "triangulation" expresses the critical dynamics of Shared Perception. It is also proposed as the functional structure to support the implementation of socio-perceptual skills in robots. This general framework serves as a reference for the other two Research Objectives, which explore specific aspects of Shared Perception. Regarding the second Research Objective, the human perceptual mechanism of Context Dependency is investigated, for the first time, within social interaction. Human perception is based on unconscious inference, in which sensory inputs are integrated with prior information. This mechanism helps cope with the uncertainty of the external world through predictions built upon previous experience. To investigate the effect of social interaction on this mechanism, the iCub robot was used as an experimental tool to create an interactive scenario in a controlled setting.
A user study based on psychophysical methods, Bayesian modeling, and a neural network analysis of human results showed that social interaction influenced Context Dependency: when interacting with a social agent, humans rely less on their internal models and more on external stimuli. These results are framed within Shared Perception and help reveal the integration dynamics of its three sources. The presence and social behavior of others (other agents) shift the balance between sensory inputs (environment) and personal history (self) in favor of the information shared with others, that is, the environment. The third Research Objective consists of tackling the Addressee Estimation problem, i.e., understanding whom a speaker is talking to, in order to improve iCub’s social behavior in multi-party interactions. Addressee Estimation can be considered a Shared Perception ability because it relies on sensory information from the environment, internal representations of the agents’ positions, and, more importantly, the understanding of others’ behavior. An architecture for Addressee Estimation is thus designed around the integration process of Shared Perception (environment, self, other agents) and partially implemented with respect to the third element: the awareness of others’ behavior. To this end, a hybrid deep-learning (CNN+LSTM) model is developed to estimate the placement of the addressee relative to the speaker and the robot, based on the speaker's non-verbal behavior. Addressee Estimation abilities based on Shared Perception dynamics aim to improve multi-party HRI. Making robots aware of other agents’ behavior towards the environment is the first crucial step towards incorporating such information into the robot’s perception and modeling Shared Perception.
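The Context Dependency result can be illustrated with the standard Gaussian cue-combination model of Bayesian perception. The sketch below is an assumption made for illustration, not the model fitted in the study. The prior (previous experience, the self) and the sensory measurement (the environment) are each weighted by their inverse variance, so "relying less on internal models" corresponds to a smaller prior weight. All variances and stimulus values are hypothetical.

```python
from typing import Tuple


def posterior_estimate(mu_prior: float, var_prior: float,
                       x_sensed: float, var_sensed: float) -> Tuple[float, float]:
    """Combine a Gaussian prior with a Gaussian measurement.

    Returns (posterior mean, weight given to the prior). Each source is
    weighted by its reliability, i.e. its inverse variance.
    """
    reliability_prior = 1.0 / var_prior
    reliability_sensed = 1.0 / var_sensed
    w_prior = reliability_prior / (reliability_prior + reliability_sensed)
    mean = w_prior * mu_prior + (1.0 - w_prior) * x_sensed
    return mean, w_prior


# Hypothetical values: past stimuli averaged 10 units, the current one is sensed at 14.
# "alone": prior as reliable as the senses; "social": a broader (less trusted) prior.
alone = posterior_estimate(mu_prior=10.0, var_prior=4.0, x_sensed=14.0, var_sensed=4.0)
social = posterior_estimate(mu_prior=10.0, var_prior=12.0, x_sensed=14.0, var_sensed=4.0)
print(f"alone:  estimate={alone[0]:.2f}, prior weight={alone[1]:.2f}")
print(f"social: estimate={social[0]:.2f}, prior weight={social[1]:.2f}")
```

With these numbers, the prior weight drops from 0.5 to 0.25, and the estimate moves from 12.0 to 13.0, closer to the sensed value. Modelling the social condition as a broader prior is only one way to obtain a smaller prior weight. A more precise sensory likelihood would have the same effect.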

    Conducting neuropsychological tests with a humanoid robot: design and evaluation

    Socially assistive robots with interactive behavioral capabilities have been improving the quality of life of a wide range of users, for example by caring for the elderly, training individuals with cognitive disabilities, or supporting physical rehabilitation. While the interactive behavioral policies of most systems are scripted, we discuss here key features of a new methodology that enables professional caregivers to teach a socially assistive robot (SAR) how to perform assistive tasks while giving proper instructions, demonstrations, and feedback. We describe how socio-communicative gesture controllers, which control the speech, facial displays, and hand gestures of our iCub robot, are driven by multimodal events captured on a professional human demonstrator performing a neuropsychological interview. Furthermore, we propose an original online evaluation method for rating the multimodal interactive behaviors of the SAR and show how such a method can help designers identify faulty events.

    Immersive Teleoperation of the Eye Gaze of Social Robots: Assessing Gaze-Contingent Control of Vergence, Yaw and Pitch of Robotic Eyes

    This paper presents a new teleoperation system, called stereo gaze-contingent steering (SGCS), able to seamlessly control the vergence, yaw, and pitch of the eyes of a humanoid robot (here an iCub robot) from the actual gaze direction of a remote pilot. The video streams captured by the cameras embedded in the mobile eyes of the iCub are fed into an HTC Vive® Head-Mounted Display equipped with an SMI® binocular eye-tracker. SGCS achieves an effective coupling between the eye-tracked gaze of the pilot and the robot's eye movements. SGCS both ensures a faithful reproduction of the pilot's eye movements, which is a prerequisite for the robot's gaze patterns to be readable by its interlocutor, and preserves the pilot's oculomotor visual cues, which avoids fatigue and sickness due to sensorimotor conflicts. We assess the precision of this servo-control by asking several pilots to gaze towards known objects positioned in the remote environment. We show that vergence can be controlled with a precision similar to that of the eyes' azimuth and elevation. This system opens the way for robot-mediated human interactions in the personal space, notably when objects in the shared working space are involved.
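The coupling that SGCS performs relies on the geometry linking a fixation point to yaw, pitch, and vergence. The sketch below is a simplified illustration, not the paper's servo-control. It assumes a head frame centred between the eyes (x to the right, y up, z forward, in metres), ideal eyes rotating about their own centres, and a hypothetical interocular baseline of 0.068 m. Yaw and vergence are taken as the mean and the difference of the two eyes' azimuths.

```python
import math
from typing import Tuple

BASELINE_M = 0.068  # hypothetical interocular distance, in metres


def gaze_angles(x: float, y: float, z: float,
                baseline: float = BASELINE_M) -> Tuple[float, float, float]:
    """Return (yaw, pitch, vergence) in degrees for a fixation point (x, y, z).

    The left eye sits at (-baseline/2, 0, 0) and the right eye at (+baseline/2, 0, 0).
    """
    az_left = math.atan2(x + baseline / 2, z)   # left-eye azimuth (positive = rightwards)
    az_right = math.atan2(x - baseline / 2, z)  # right-eye azimuth
    yaw = (az_left + az_right) / 2              # conjugate component (version)
    vergence = az_left - az_right               # disjunctive component, > 0 when converging
    pitch = math.atan2(y, math.hypot(x, z))     # common elevation of both eyes
    return math.degrees(yaw), math.degrees(pitch), math.degrees(vergence)


def depth_from_vergence(vergence_deg: float, baseline: float = BASELINE_M) -> float:
    """Distance of a straight-ahead fixation point producing this vergence angle."""
    return (baseline / 2) / math.tan(math.radians(vergence_deg) / 2)


for depth in (0.3, 0.6, 1.2):
    _, _, verg = gaze_angles(0.0, 0.0, depth)
    print(f"{depth:.1f} m -> vergence {verg:.2f} deg, "
          f"recovered depth {depth_from_vergence(verg):.2f} m")
```

Vergence shrinks roughly as baseline divided by distance. A fixed angular error therefore maps to a growing depth error for farther targets, which is one reason vergence precision is worth assessing separately from azimuth and elevation.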

    Bringing Human Robot Interaction towards Trust and Social Engineering

    Robots started their journey in books and movies; nowadays, they are becoming an important part of our daily lives, from industrial robots, through entertainment robots, to social robots in fields like healthcare or education. An important aspect of social robotics is the human counterpart; therefore, there is an interaction between humans and robots. Interactions among humans are often taken for granted because we learn how to interact with each other from childhood. In robotics, this interaction is still very immature, yet it is critical for the successful incorporation of robots into society. Human-robot interaction (HRI) is the domain that works on improving these interactions. HRI encompasses many aspects, and a significant one is trust. Trust is the assumption that somebody or something is good and reliable, and it is critical for a developed society. Therefore, in a society in which robots can take part, the trust they can generate will be essential for cohabitation. A downside of trust is overtrusting an entity, in other words, an insufficient alignment between the projected trust and the expectation of morally correct behaviour. This effect could negatively influence and damage the interactions between agents. Among humans, overtrust is usually exploited by scammers, conmen, or social engineers, who take advantage of people's overtrust to manipulate them into performing actions that may not be beneficial for the victims. This thesis tries to shed light on how trust towards robots develops, and on how this trust could become overtrust and be exploited by social engineering techniques. More precisely, the following experiments have been carried out: (i) Treasure Hunt, in which the robot followed a social engineering framework: it gathered personal information from the participants, improved trust and rapport with them, and, at the end, exploited that trust by manipulating participants into performing a risky action.
(ii) Wicked Professor, in which a very human-like robot tried to enforce its authority to make participants obey socially inappropriate requests. Most of the participants realized that the requests were morally wrong, but eventually they succumbed to the robot's authority while holding the robot morally responsible. (iii) Detective iCub, in which it was evaluated whether the robot could be endowed with the ability to detect when its human partner was lying. Deception detection is an essential skill for social engineers and for professionals in the domains of education, healthcare, and security. The robot achieved 75% accuracy in lie detection. Slight differences were also found in the behaviour exhibited by participants when interacting with a human or a robot interrogator. Lastly, this thesis approaches the topic of privacy, a fundamental human value. With the integration of robotics and technology into our society, privacy will be affected in ways we are not used to. Robots have sensors able to record and gather all kinds of data, and this information may be transmitted via the internet without the user's knowledge. This is an important aspect to consider, since a violation of privacy can heavily impact trust. In summary, this thesis shows that robots are able to establish and improve trust during an interaction, to take advantage of overtrust, and to misuse it by applying different types of social engineering techniques, such as manipulation and authority. Moreover, robots can be enabled to pick up different human cues to detect deception, which can help both social engineers and professionals in the human sector. Nevertheless, it is of the utmost importance to make roboticists, programmers, entrepreneurs, lawyers, psychologists, and other sectors involved aware that social robots can be highly beneficial for humans but could also be exploited for malicious purposes.

    Towards adaptive and autonomous humanoid robots: from vision to actions

    Although robotics research has seen advances over the last decades, robots are still not in widespread use outside industrial applications. Yet a range of proposed scenarios have robots working together with, helping, and coexisting with humans in daily life. All these scenarios share a clear need to deal with more unstructured, changing environments. I herein present a system that aims to overcome the limitations of highly complex robotic systems in terms of autonomy and adaptation. The main focus of the research is to investigate the use of visual feedback to improve the reaching and grasping capabilities of complex robots. To this end, computer vision and machine learning techniques are combined. From a robot vision point of view, combining domain knowledge from image processing with machine learning techniques can expand the capabilities of robots. I present a novel framework called Cartesian Genetic Programming for Image Processing (CGP-IP). CGP-IP can be trained to detect objects in the incoming camera streams and has been successfully demonstrated on many different problem domains. The approach is fast, scalable, and robust, yet requires only very small training sets (it was tested with 5 to 10 images per experiment). Additionally, it generates human-readable programs that can be further customized and tuned. While CGP-IP is a supervised-learning technique, I show an integration on the iCub that allows for the autonomous learning of object detection and identification. Finally, this dissertation includes two proofs of concept that integrate the motion and action sides. First, reactive reaching and grasping is shown. It allows the robot to avoid obstacles detected in the visual stream while reaching for the intended target object. Furthermore, the integration enables us to use the robot in non-static environments, i.e. the reaching is adapted on the fly from the visual feedback received, e.g. when an obstacle is moved into the trajectory. The second integration highlights the capabilities of these frameworks by improving visual detection through object manipulation actions.
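The sketch below makes the CGP-IP description more concrete by showing the core of Cartesian Genetic Programming. A genome is a fixed-length list of feed-forward nodes, each applying a function to earlier values. It can be decoded into a human-readable expression and varied by point mutation. This is a toy illustration under stated assumptions, not CGP-IP itself. Images are small lists of grey values. The function set (ADD, SUB, MAX, MIN, THRESH) is hypothetical and far smaller than CGP-IP's image-processing library. The evolutionary loop and the fitness evaluation on training images are omitted.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]
Node = Tuple[str, int, int]  # (function name, index of input a, index of input b)


def pixelwise(f: Callable[[float, float], float]) -> Callable[[Image, Image], Image]:
    """Lift a scalar function to an element-wise operation on two images."""
    def op(a: Image, b: Image) -> Image:
        return [[f(pa, pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    return op


# Hypothetical toy function set; THRESH ignores its second input.
FUNCTIONS: Dict[str, Callable[[Image, Image], Image]] = {
    "ADD": pixelwise(lambda p, q: p + q),
    "SUB": pixelwise(lambda p, q: p - q),
    "MAX": pixelwise(max),
    "MIN": pixelwise(min),
    "THRESH": pixelwise(lambda p, _q: 1.0 if p > 0.5 else 0.0),
}


def evaluate(genome: List[Node], output: int, inputs: List[Image]) -> Image:
    """Run the genome. Indices below len(inputs) refer to program inputs and
    higher indices to earlier nodes, so the graph is feed-forward.
    For simplicity every node is evaluated, including inactive ones."""
    values: List[Image] = list(inputs)
    for name, a, b in genome:
        values.append(FUNCTIONS[name](values[a], values[b]))
    return values[output]


def to_expression(genome: List[Node], index: int, n_inputs: int) -> str:
    """Decode only the nodes reachable from `index` into readable text."""
    if index < n_inputs:
        return f"in{index}"
    name, a, b = genome[index - n_inputs]
    if name == "THRESH":
        return f"THRESH({to_expression(genome, a, n_inputs)})"
    return f"{name}({to_expression(genome, a, n_inputs)}, {to_expression(genome, b, n_inputs)})"


def mutate(genome: List[Node], n_inputs: int, rng: random.Random,
           rate: float = 0.1) -> List[Node]:
    """Point mutation: each gene is resampled with probability `rate`,
    keeping every connection pointing to an earlier position."""
    mutated: List[Node] = []
    for i, (name, a, b) in enumerate(genome):
        limit = n_inputs + i
        if rng.random() < rate:
            name = rng.choice(sorted(FUNCTIONS))
        if rng.random() < rate:
            a = rng.randrange(limit)
        if rng.random() < rate:
            b = rng.randrange(limit)
        mutated.append((name, a, b))
    return mutated


# Two hypothetical 2x2 input channels; the program marks pixels where in0 clearly exceeds in1.
red = [[0.9, 0.2], [0.8, 0.1]]
green = [[0.1, 0.1], [0.7, 0.0]]
genome: List[Node] = [("SUB", 0, 1), ("THRESH", 2, 2)]
print(to_expression(genome, 3, n_inputs=2))
print(evaluate(genome, 3, [red, green]))
```

The decoded expression is the kind of short, readable program that can be inspected and hand-tuned after evolution.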

    Predicting extraversion from non-verbal features during a face-to-face human-robot interaction

    In this paper we present a system for the automatic prediction of extraversion during the first thin slices of human-robot interaction (HRI). This work is based on the hypothesis that personality traits and attitudes towards robots appear in the behavioural response of humans during HRI. We propose a set of four non-verbal movement features that characterize human behavior during interaction. We focus our study on predicting Extraversion using these features, extracted from a dataset of 39 healthy adults interacting with the humanoid robot iCub. Our analysis shows that it is possible to predict the Extraversion of a human to a good level (64%) from a thin slice of interaction, relying only on non-verbal movement features. Our results are comparable to the state of the art obtained in human-human interaction (HHI) [23].

    Cultural differences in speed adaptation in human-robot interaction tasks

    In social interactions, human movement is a rich source of information for all those who take part in the collaboration. In fact, a variety of intuitive messages are communicated through motion and continuously inform the partners about how the actions will unfold. A similar exchange of implicit information could support movement coordination in the context of Human-Robot Interaction (HRI). In this work, we investigate how implicit signaling in an interaction with a humanoid robot can lead to emergent coordination in the form of automatic speed adaptation. In particular, we assess whether different cultures, specifically Japanese and Italian, differ in their impact on motor resonance and synchronization in HRI. Japanese people show a higher general acceptance of robots than people from Western cultures. Since acceptance, or rather affiliation, is tightly connected to imitation and mimicry, we hypothesize a higher degree of speed imitation for Japanese participants than for Italians. In the experimental studies undertaken in both Japan and Italy, we observe that cultural differences do not affect the natural predisposition of subjects to adapt to the robot.
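One way to quantify the speed adaptation described above is to regress a participant's movement speed on the robot's speed across trials. A slope near 1 means the participant closely follows the robot's speed, and a slope near 0 means no adaptation. This index is a hypothetical illustration, not necessarily the measure used in the study, and the speeds are made-up values. Comparing such an index between Japanese and Italian participants is one way the cultural hypothesis could be tested.

```python
from statistics import mean
from typing import Sequence


def adaptation_index(robot_speeds: Sequence[float], human_speeds: Sequence[float]) -> float:
    """Least-squares slope of human speed regressed on robot speed (hypothetical index)."""
    if len(robot_speeds) != len(human_speeds) or len(robot_speeds) < 2:
        raise ValueError("need two equally long sequences with at least two trials")
    mx, my = mean(robot_speeds), mean(human_speeds)
    sxy = sum((x - mx) * (y - my) for x, y in zip(robot_speeds, human_speeds))
    sxx = sum((x - mx) ** 2 for x in robot_speeds)
    if sxx == 0:
        raise ValueError("robot speed must vary across trials")
    return sxy / sxx


# Hypothetical trials: the robot moves at three speeds and the participant partly follows.
robot = [0.20, 0.30, 0.40]
participant = [0.25, 0.30, 0.35]
print(f"adaptation index: {adaptation_index(robot, participant):.2f}")
```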

    Learning What To Say And What To Do: A Model For Grounding Language And Actions

    Automation is becoming increasingly important in today's society, with robots performing many repetitive tasks in industry and even entering our households in the form of vacuum cleaners and lawn mowers. When performing regular tasks outside the controlled environments of industry, robots tend to perform poorly. In particular, in situations where robots have to interact with humans, a problem arises: how can a robot understand what the human means? While much work has been done in the past on visual perception and object classification, understanding what action a verb translates into has remained an unexplored area. Solving this challenge would enable robots to execute commands given in natural language, and also to verbalise what actions they are performing when prompted. This work studies how a robot can learn the meaning behind the sentences humans use and how it translates into its perception and the real world, but also how to translate its actions into sentences humans understand. To achieve this, we propose a novel bidirectional machine learning model, along with a data collection module that can be used by non-technical users. The main idea behind this model is the ability to generalise to novel concepts, composing new sentences and actions from what it learned previously. Humans show this ability to generalise from a young age, and it is a desirable feature for this model. By using humans' natural teaching instincts to teach the robot, together with this generalisation ability, we hope to obtain a model that allows people everywhere to teach the robot to perform the actions they desire. We validate the model in a number of tasks, using the iCub and Pepper robots physically interacting with objects in order to complete a natural language command. We test different actions, including motor actions and emotional displays, while using both transitive and intransitive verbs in the natural language commands.
The main contribution of this thesis is the development of a Bidirectional Learning Algorithm, applied to a Multiple Timescale Recurrent Neural Network (MTRNN), enabling these models to link action and language bidirectionally. A second contribution is the extension of Multiple Timescale architectures to Long Short-Term Memory (LSTM) models, increasing the capabilities of these models. Finally, the third contribution is in the form of data collection modules, with the development of an easy-to-use module based on physical interaction and speech to provide the iCub and Pepper robots with the data to be learned.
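The defining feature of a Multiple Timescale Recurrent Neural Network (MTRNN) is that its units are leaky integrators with different time constants. Fast units (small τ) follow the input closely, while slow units (large τ) retain context over longer spans. The sketch below shows this commonly used update rule in plain Python: u(t+1) = (1 − 1/τ)·u(t) + (1/τ)·(W_rec·tanh(u(t)) + W_in·x(t)). The time constants and network size are hypothetical. Neither the thesis's Bidirectional Learning Algorithm nor its LSTM extension is shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mtrnn_step(u: Sequence[float], x: Sequence[float], w_rec: Matrix, w_in: Matrix,
               taus: Sequence[float]) -> List[float]:
    """One leaky-integrator update of the internal states `u` given input `x`."""
    y = [math.tanh(ui) for ui in u]  # unit activations
    new_u = []
    for i, tau in enumerate(taus):
        drive = sum(w * yj for w, yj in zip(w_rec[i], y))
        drive += sum(w * xk for w, xk in zip(w_in[i], x))
        # Larger tau -> the previous state dominates -> slower dynamics.
        new_u.append((1.0 - 1.0 / tau) * u[i] + drive / tau)
    return new_u


def zeros(rows: int, cols: int) -> Matrix:
    """Helper returning a rows x cols matrix of zeros."""
    return [[0.0] * cols for _ in range(rows)]


# Hypothetical network: two fast units (tau = 2) and two slow units (tau = 50).
taus = [2.0, 2.0, 50.0, 50.0]
state = [1.0, 1.0, 1.0, 1.0]
# With all weights at zero, each unit simply decays at its own timescale.
w_rec, w_in = zeros(4, 4), zeros(4, 1)
for _ in range(10):
    state = mtrnn_step(state, [0.0], w_rec, w_in, taus)
print([round(s, 3) for s in state])
```

After ten steps, the fast units have almost returned to zero while the slow units still hold most of their initial state. This separation of timescales is what lets slow layers carry sequence-level context while fast layers handle moment-to-moment detail.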
