
    A Review of Verbal and Non-Verbal Human-Robot Interactive Communication

    In this paper, an overview of human-robot interactive communication is presented, covering verbal as well as non-verbal aspects of human-robot interaction. Following a historical introduction and a motivation towards fluid human-robot communication, ten desiderata are proposed, which provide an organizational axis for both recent and future research on human-robot communication. The ten desiderata are then examined in detail, culminating in a unifying discussion and a forward-looking conclusion.

    Integration of a voice recognition system in a social robot

    Human-Robot Interaction (HRI) is one of the main fields in the study and research of robotics. Within this field, dialog systems and interaction by voice play a very important role. When speaking about natural human-robot dialog, we assume that the robot has the capability to accurately recognize the utterance that the human wants to transmit verbally, and even its semantic meaning, but this is not always achieved. In this paper we describe the steps and requirements that we went through in order to endow the personal social robot Maggie, developed at the University Carlos III of Madrid, with the capability of understanding natural language spoken by any human. We have analyzed the different possibilities offered by current software/hardware alternatives by testing them in real environments. We have obtained accurate data on speech recognition capabilities in different environments, using the most modern audio acquisition systems and analyzing less typical parameters such as user age, sex, intonation, volume and language. Finally, we propose a new model to classify recognition results as accepted or rejected, based on a second ASR opinion. This new approach takes into account the pre-calculated success rate in noise intervals for each recognition framework, decreasing the false positive and false negative rates. The funds have been provided by the Spanish Government through the project 'Peer to Peer Robot-Human Interaction' (R2H) of MEC (Ministry of Science and Education), and the project "A new approach to social robotics" (AROS) of MICINN (Ministry of Science and Innovation). The research leading to these results has received funding from the RoboCity2030-II-CM project (S2009/DPI-1559), funded by Programas de Actividades I+D en la Comunidad de Madrid and cofunded by Structural Funds of the EU.
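
    As a rough illustration of the accept/reject idea described above, the sketch below combines a second recognizer's opinion with pre-computed success rates per noise interval; the engine names, noise intervals and rates are invented for illustration and are not the values reported in the paper.

        # Hypothetical accept/reject decision using two ASR opinions and
        # pre-calculated success rates per ambient-noise interval (dB).
        SUCCESS_RATE = {
            "asr_primary":   {(0, 40): 0.92, (40, 60): 0.80, (60, 120): 0.55},
            "asr_secondary": {(0, 40): 0.88, (40, 60): 0.75, (60, 120): 0.60},
        }

        def rate_for(engine: str, noise_db: float) -> float:
            """Look up the pre-computed success rate for the measured noise level."""
            for (lo, hi), rate in SUCCESS_RATE[engine].items():
                if lo <= noise_db < hi:
                    return rate
            return 0.0

        def accept(hyp_primary: str, hyp_secondary: str, noise_db: float,
                   threshold: float = 0.7) -> bool:
            """Accept when both engines agree; otherwise only when the primary
            engine is the most reliable one at this noise level and clears a threshold."""
            if hyp_primary == hyp_secondary:
                return True
            best = max(("asr_primary", "asr_secondary"),
                       key=lambda e: rate_for(e, noise_db))
            return best == "asr_primary" and rate_for(best, noise_db) >= threshold

        # Example: noisy room, engines disagree -> the result is rejected.
        print(accept("turn left", "turn right", noise_db=65.0))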

    Desenvolvimento de um sistema de diálogo para interação com robôs (Development of a dialogue system for interaction with robots)

    Master's in Computer and Telematics Engineering. Service robots operate in the same environment as humans and perform actions that a human usually performs. These robots must be able to operate autonomously in unknown and dynamic environments, to maneuver among several people, and to know how to deal with them. By complying with these requirements, they are able to successfully address humans and fulfill their requests whenever assistance is needed in a certain task. Natural language communication, including speech, which is the most natural way of communication between humans, becomes relevant in the field of Human-Robot Interaction (HRI). By endowing service robots with intuitive spoken interfaces, the specification of the tasks required by humans is facilitated. However, this is a complicated goal to achieve due to the resources involved in creating a sufficiently intuitive spoken interface and because of the difficulty of deploying it on different robots. The main objective of this thesis is the definition, implementation and evaluation of a dialogue system that can be easily integrated into any robotic platform and that functions as a flexible base for the creation of any conversational scenario in the Portuguese language. The system must meet the basic requirements for intuitive and natural communication, namely the characteristics of human-human conversations. A system was developed that functions as a base for future work on Spoken Dialog Systems. The system adopts a client-server architecture, where the client runs on the robot and captures what the user says. The client takes advantage of dialogue management services external to the robot; these are executed by the server, which processes the captured audio and returns a response appropriate to the context of the dialogue. The development was based on a critical analysis of the state of the art, so that the system would remain as faithful as possible to what has already been done. Through the evaluation phase of the system, carried out with a few volunteers from both the interaction and the developer point of view, it was concluded that the main objective was accomplished: a base system was created that is flexible enough to explore different contexts of conversation, such as interacting with children or providing information in a university environment.
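
    A minimal sketch (purely illustrative) of the client-server split described above: the robot-side client would POST captured audio to a dialogue server, which runs recognition and dialogue management and returns a context-appropriate reply. The endpoint, port, JSON fields and placeholder functions are assumptions, not the protocol actually used in the thesis.

        # Hypothetical dialogue server: receives audio, runs ASR + dialogue
        # management, and returns a reply appropriate to the dialogue context.
        import json
        from http.server import BaseHTTPRequestHandler, HTTPServer

        def recognize(audio_bytes: bytes) -> str:
            """Placeholder ASR: a real server would call a Portuguese recognizer here."""
            return "que horas sao"

        def dialogue_manager(utterance: str, context: dict) -> str:
            """Placeholder dialogue policy keyed on the conversational context."""
            if "horas" in utterance:
                return "São três da tarde."
            return "Desculpe, pode repetir?"

        class DialogueHandler(BaseHTTPRequestHandler):
            def do_POST(self):
                audio = self.rfile.read(int(self.headers["Content-Length"]))
                reply = dialogue_manager(recognize(audio), context={})
                body = json.dumps({"reply": reply}).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)

        # The robot-side client would record an utterance, POST the raw audio to
        # http://<server>:8080/ and speak the "reply" field of the JSON response.
        if __name__ == "__main__":
            HTTPServer(("", 8080), DialogueHandler).serve_forever()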

    Voice Enabled Interface for Interactive Tour Guide Robots

    This paper considers design methodologies for developing voice-enabled interfaces for tour-guide robots to be deployed at the Robotics Exposition of the Swiss National Exhibition (Expo.02). Human-robot voice communication presents new challenges for the design of fully autonomous mobile robots, in that interaction must be robot-initiated and must take place within a dynamic, adverse environment. We approached these general problems for a voice-enabled interface tailored to the limited computational resources of one on-board processor, integrating smart speech signal acquisition, automatic speech recognition and synthesis, as well as a dialogue system, into the multi-modal, multi-sensor interface of the expo tour-guide robot. We also focus on particular issues that need to be addressed in voice-based interaction when planning specific tasks and research experiments for Expo.02, where tour-guide robots will interact with hundreds of thousands of visitors during six months, seven days a week, ten hours per day.
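
    A toy sketch of the kind of sequential, single-processor voice-interaction loop such an interface implies (acquisition, recognition, dialogue, synthesis); all module names and behaviours below are placeholders rather than the Expo.02 system's actual components.

        # Hypothetical robot-initiated voice loop: acquire -> recognize ->
        # decide -> synthesize, kept sequential for a single on-board processor.
        def acquire_utterance() -> bytes:
            """Smart acquisition would do beamforming / noise gating here."""
            return b"raw-audio"

        def recognize(audio: bytes) -> str:
            """On-board ASR restricted to a small, exhibition-specific vocabulary."""
            return "tell me about this exhibit"

        def dialogue_step(utterance: str) -> str:
            """Robot-initiated dialogue: pick the next prompt or presentation."""
            return "This exhibit shows the tour-guide robot in action."

        def synthesize(text: str) -> None:
            print("ROBOT SAYS:", text)

        def interaction_loop(max_turns: int = 3) -> None:
            for _ in range(max_turns):
                synthesize(dialogue_step(recognize(acquire_utterance())))

        interaction_loop()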

    A Software Testbed for Assessing Human-Robot Verbal Interaction

    Verbal interaction provides a natural and social-style interaction mode by which robots can communicate with the general public, who are likely unknowledgeable about robotics. This interaction mechanism is also very important for a broad range of users such as hands/eyes-busy users, motor-impaired users, users with vision impairment and users working in hostile environments. Verbal interaction is very popular in robotics, especially in personal assistive robots, which are used to help elderly people, and in entertainment robots. Several research efforts have sought to endow robots with verbal interaction as a high-level faculty. However, the language usage of many of them was simple, and they may not be considered full speech dialogue systems providing natural language understanding. In this thesis, we investigate a testbed platform that can be deployed to enable human-robot verbal interaction. The proposed approach encompasses a design pattern-based user interface and a user-independent automatic speech recognizer with a modified grammar module in the context of human-robot interaction. The user interface is used to simulate the robot's response to multiple users' voice commands. The performance of the proposed testbed has been evaluated quantitatively using a set of evaluation metrics such as word correct rate, recognition time, and success and false action rates. The conducted experiments show the promising features of the system. The results obtained could be refined even further by training the system on more voice commands, and the whole system could be ported to real robotic platforms such as Peoplebot to endow them with natural language understanding.
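
    The evaluation metrics named above can be made concrete with a small sketch; the word-correct-rate function below uses a simplified positional comparison rather than a full alignment, and the sample commands and actions are invented.

        # Hypothetical computation of word correct rate and success / false
        # action rates over a set of voice commands.
        def word_correct_rate(reference: str, hypothesis: str) -> float:
            """Fraction of reference words matched at the same position (simplified)."""
            ref, hyp = reference.split(), hypothesis.split()
            correct = sum(r == h for r, h in zip(ref, hyp))
            return correct / len(ref) if ref else 0.0

        def action_rates(expected_actions, executed_actions):
            """Success rate: commands that triggered the intended action.
            False action rate: commands that triggered a different action."""
            pairs = list(zip(expected_actions, executed_actions))
            success = sum(e == x for e, x in pairs) / len(pairs)
            false_action = sum(e != x and x is not None for e, x in pairs) / len(pairs)
            return success, false_action

        print(word_correct_rate("move to the door", "move to door"))
        print(action_rates(["go", "stop", "turn"], ["go", "turn", None]))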

    VISION-BASED URBAN NAVIGATION PROCEDURES FOR VERBALLY INSTRUCTED ROBOTS

    The work presented in this thesis is part of a project in instruction-based learning (IBL) for mobile robots, in which a robot is designed that can be instructed by its users through unconstrained natural language. The robot uses vision guidance to follow route instructions in a miniature town model. The aim of the work presented here was to determine the functional vocabulary of the robot in the form of "primitive procedures". In contrast to previous work in the field of instructable robots, this was done following a "user-centred" approach where the main concern was to create primitive procedures that can be directly associated with natural language instructions. To achieve this, a corpus of human-to-human natural language instructions was collected and analysed. A set of primitive actions was found with which the collected corpus could be represented. These primitive actions were then implemented as robot-executable procedures. Natural language instructions are under-specified when they are to be executed by a robot, because instructors omit information that they consider "commonsense" and rely on the listener's sensory-motor capabilities to determine the details of the task execution. In this thesis the under-specification problem is solved by determining the missing information, either during the learning of new routes or during their execution by the robot. During learning, the missing information is determined by imitating the commonsense approach human listeners take to achieve the same purpose. During execution, missing information, such as the location of road layout features mentioned in route instructions, is determined from the robot's view by using image template matching. The original contribution of this thesis, in both these methods, lies in the fact that they are driven by the natural language examples found in the corpus collected for the IBL project. During the testing phase, a high success rate of primitive calls, when these were considered individually, showed that the under-specification problem has, overall, been solved. A novel method for testing the primitive procedures, as part of complete route descriptions, is also proposed in this thesis. This was done by comparing the performance of human subjects when driving the robot, following route descriptions, with the performance of the robot when executing the same route descriptions. The results obtained from this comparison clearly indicated where errors occur between the time a human speaker gives a route description and the time the task is executed by a human listener or by the robot. Finally, a software speed controller is proposed in this thesis to control the wheel speeds of the robot used in this project. The controller employs PI (Proportional and Integral) and PID (Proportional, Integral and Differential) control and provides a good alternative to expensive hardware.
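
    As an illustration of the software speed controller mentioned at the end of the abstract, the following is a generic discrete PID loop with a crude first-order wheel model; the gains, time step and model are illustrative and not taken from the thesis.

        # Generic discrete PID controller for a wheel speed setpoint.
        class PID:
            def __init__(self, kp: float, ki: float, kd: float, dt: float):
                self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
                self.integral = 0.0
                self.prev_error = 0.0

            def update(self, setpoint: float, measured: float) -> float:
                """Return the motor command for one control step."""
                error = setpoint - measured
                self.integral += error * self.dt
                derivative = (error - self.prev_error) / self.dt
                self.prev_error = error
                return self.kp * error + self.ki * self.integral + self.kd * derivative

        # Toy simulation: drive a wheel towards 1.0 rad/s with a first-order response.
        pid = PID(kp=0.8, ki=0.5, kd=0.05, dt=0.05)
        speed = 0.0
        for _ in range(50):
            command = pid.update(setpoint=1.0, measured=speed)
            speed += 0.05 * (command - speed)   # crude wheel/motor model
        print(round(speed, 3))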

    Error handling in multimodal voice-enabled interfaces of tour-guide robots using graphical models

    Mobile service robots are going to play an increasing role in human society. Voice-enabled interaction with service robots becomes very important if such robots are to be deployed in real-world environments and accepted by the vast majority of potential human users. The research presented in this thesis addresses the problem of speech recognition integration in an interactive voice-enabled interface of a service robot, in particular a tour-guide robot. The task of a tour-guide robot is to engage visitors to mass exhibitions (users) in dialogue, providing the services it is designed for (e.g. exhibit presentations) within a limited time. In managing tour-guide dialogues, extracting the user goal (intention) for requesting a particular service at each dialogue state is the key issue. In mass exhibition conditions, speech recognition errors are inevitable because of noisy speech and uncooperative users with no prior experience in robotics. These errors can jeopardize user goal identification, and wrongly identified user goals can lead to communication failures. Therefore, to reduce the risk of such failures, methods for detecting and compensating for communication failures in human-robot dialogue are needed. During the short-term interaction with visitors, the interpretation of the user goal at each dialogue state can be improved by combining speech recognition in the speech modality with information from other available robot modalities. The methods presented in this thesis exploit probabilistic models for fusing information from speech and auxiliary modalities of the robot for user goal identification and communication failure detection. To compensate for detected communication failures, we investigate multimodal methods for recovery. To model the process of modality fusion, taking into account the uncertainties in the information extracted from each input modality during human-robot interaction, we use the probabilistic framework of Bayesian networks. Bayesian networks are graphical models that represent a joint probability function over a set of random variables. They are used to model the dependencies among variables associated with the user goals, modality-related events (e.g. the event of user presence that is inferred from the laser scanner modality of the robot), and observed modality features providing evidence in favor of these modality events. Bayesian networks are used to calculate posterior probabilities over the possible user goals at each dialogue state. These probabilities serve as a basis for deciding if the user goal is valid, i.e. if it can be mapped into a tour-guide service (e.g. exhibit presentation) or is undefined – signaling a possible communication failure. The Bayesian network can also be used to elicit probabilities over the modality events, revealing information about the possible cause of a communication failure. Introducing new user goal aspects (e.g. new modality events and related features) that provide auxiliary information for detecting communication failures makes the design process cumbersome, calling for a systematic approach to Bayesian network modelling. Generally, introducing new variables for user goal identification in the Bayesian networks can lead to complex and computationally expensive models. In order to make the design process more systematic and modular, we adapt principles from the theory of grounding in human communication.
    When people communicate, they resolve understanding problems in a collaborative joint effort of providing evidence of common shared knowledge (grounding). We use Bayesian network topologies, tailored to limited computational resources, to model a state-based grounding model fusing information from three different input modalities (laser, video and speech) to infer possible grounding states. These grounding states are associated with modality events showing whether the user is present in range for communication, whether the user is attending to the interaction, whether the speech modality is reliable, and whether the user goal is valid. The state-based grounding model is used to compute probabilities that intermediary grounding states have been reached. This serves as a basis for detecting whether the user has reached the final grounding state, or whether a repair dialogue sequence is needed. In the case of a repair dialogue sequence, the tour-guide robot can exploit the multiple available modalities along with speech. For example, if the user has failed to reach the grounding state related to her/his presence in range for communication, the robot can use its move modality to search for and attract the attention of visitors. When speech recognition is detected to be unreliable, the robot can offer the alternative use of the buttons modality in the repair sequence. Given the probability of each grounding state, and the dialogue sequence that can be executed in the next dialogue state, a tour-guide robot has different preferences on the possible dialogue continuation. If the possible dialogue sequences at each dialogue state are defined as actions, the introduced principle of maximum expected utility (MEU) provides an explicit way of action selection, based on the action utility, given the evidence about the user goal at each dialogue state. Decision networks, constructed as graphical models based on Bayesian networks, are proposed to perform MEU-based decisions, incorporating the utility of the actions to be chosen at each dialogue state by the tour-guide robot. These action utilities are defined taking into account the tour-guide task requirements. The proposed graphical models for user goal identification and dialogue error handling in human-robot dialogue are evaluated in experiments with multimodal data. These data were collected during the operation of the tour-guide robot RoboX at the Autonomous Systems Lab of EPFL and at the Swiss National Exhibition in 2002 (Expo.02). The evaluation experiments use component- and system-level metrics for technical (objective) and user-based (subjective) evaluation. On the component level, the technical evaluation is done by calculating accuracies, as objective measures of the performance of the grounding model, and the resulting performance of the user goal identification in dialogue. The benefit of the proposed error handling framework is demonstrated by comparing the accuracy of a baseline interactive system, employing only speech recognition for user goal identification, with that of a system equipped with multimodal grounding models for error handling.
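
    The two central ideas, fusing modality evidence into a posterior over user goals and selecting the next dialogue action by maximum expected utility, can be sketched with a toy naive-Bayes stand-in for the Bayesian network; every probability and utility value below is invented for illustration.

        # Toy fusion of laser / video / speech evidence into a posterior over
        # user goals, followed by MEU-based choice of the next dialogue action.
        GOALS = ["exhibit_presentation", "undefined"]
        PRIOR = {"exhibit_presentation": 0.6, "undefined": 0.4}

        # P(observation | goal) per modality, assuming conditional independence.
        LIKELIHOOD = {
            "laser_in_range":  {"exhibit_presentation": 0.9, "undefined": 0.5},
            "video_attending": {"exhibit_presentation": 0.8, "undefined": 0.3},
            "speech_request":  {"exhibit_presentation": 0.7, "undefined": 0.2},
        }

        def posterior(observations):
            scores = dict(PRIOR)
            for goal in GOALS:
                for obs in observations:
                    scores[goal] *= LIKELIHOOD[obs][goal]
            z = sum(scores.values())
            return {g: s / z for g, s in scores.items()}

        # Utility of each dialogue action given the true user goal.
        UTILITY = {
            "present_exhibit": {"exhibit_presentation": 10, "undefined": -5},
            "repair_dialogue": {"exhibit_presentation": -1, "undefined": 4},
        }

        def meu_action(post):
            """Pick the action with maximum expected utility under the posterior."""
            return max(UTILITY, key=lambda a: sum(post[g] * UTILITY[a][g] for g in GOALS))

        post = posterior(["laser_in_range", "speech_request"])
        print(post, meu_action(post))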

    MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS

    This thesis presents a theoretical framework for the design of user-programmable robots. The objective of the work is to investigate multi-modal unconstrained natural instructions given to robots in order to design a learning robot. A corpus-centred approach is used to design an agent that can reason, learn and interact with a human in a natural, unconstrained way. The corpus-centred design approach is formalised and developed in detail. It requires the developer to record humans during interaction and to analyse the recordings to find instruction primitives, which are then implemented in a robot. The focus of this work has been on how to combine speech and gesture using rules extracted from the analysis of a corpus. A multi-modal integration algorithm is presented that uses timing and semantics to group, match and unify gesture and language. The algorithm always achieves correct pairings on the corpus and initiates questions to the user in cases of ambiguity or missing information. The domain of card games has been investigated because of its variety of games, which are rich in rules and contain sequences. A further focus of the work is on the translation of rule-based instructions; most multi-modal interfaces to date have only considered sequential instructions. A combination of frame-based reasoning, a knowledge base organised as an ontology, and a problem-solver engine is used to store these rules. Understanding rule instructions, which contain conditional and imaginary situations, requires an agent with complex reasoning capabilities. A test system of the agent implementation is also described. Tests that confirm the implementation by playing back the corpus are presented. Furthermore, deployment test results with the implemented agent and human subjects are presented and discussed. The tests showed that the rate of errors due to sentences not being defined in the grammar does not decrease at an acceptable rate when new grammar is introduced. This was particularly the case for complex verbal rule instructions, which can be expressed in a large variety of ways.
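
    A small sketch of what timing-based pairing of speech and gesture could look like: a deictic word is matched to the gesture whose time span is closest, and ties or missing gestures return None so the agent can ask a clarification question. The data structures, threshold and examples are hypothetical, not the algorithm from the thesis.

        # Hypothetical pairing of deictic words with pointing gestures by timing.
        from dataclasses import dataclass

        @dataclass
        class Gesture:
            target: str          # object pointed at, e.g. a card
            start: float         # seconds
            end: float

        @dataclass
        class Deictic:
            word: str            # e.g. "this", "that"
            time: float

        def pair(deictic: Deictic, gestures: list[Gesture], max_gap: float = 1.0):
            """Return the gesture temporally closest to the deictic word, or None
            (ambiguous / missing) so the agent can ask the user a question."""
            def distance(g: Gesture) -> float:
                if g.start <= deictic.time <= g.end:
                    return 0.0
                return min(abs(deictic.time - g.start), abs(deictic.time - g.end))
            candidates = sorted(gestures, key=distance)
            if not candidates or distance(candidates[0]) > max_gap:
                return None
            if len(candidates) > 1 and distance(candidates[0]) == distance(candidates[1]):
                return None   # tie: initiate a clarification question
            return candidates[0]

        g1 = Gesture("ace_of_spades", 2.0, 2.8)
        g2 = Gesture("queen_of_hearts", 5.1, 5.9)
        print(pair(Deictic("this", 2.5), [g1, g2]).target)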

    Modeling Human-Robot-Interaction based on generic Interaction Patterns

    Peltason J. Modeling Human-Robot-Interaction based on generic Interaction Patterns. Bielefeld: Bielefeld University; 2014