
    Multimodal Interaction Management for Tour-Guide Robots Using Bayesian Networks

    In this paper, we propose a Bayesian network framework for managing interactivity between a tour-guide robot and visitors in mass exhibition conditions, through robust interpretation of multi-modal signals. We report on methods and experiments interpreting speech and laser scanner signals in the spoken dialogue management system of the autonomous tour-guide robot RoboX, successfully deployed at the Swiss National Exhibition (Expo.02). A correct interpretation of a user's (visitor's) goal or intention at each dialogue state is a key issue for successful voice-enabled communication between robots and visitors. We introduce a Bayesian network approach for combining noisy speech recognition results with noise-independent data from a laser scanner, in order to infer the visitor's goal under the uncertainty intrinsic to these two modalities. We demonstrate the effectiveness of the approach by simulation based on real observations collected during experiments with the tour-guide robot RoboX at Expo.02.
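
    As a rough illustration of the fusion idea described above (a minimal sketch, not the authors' actual model), the following Python snippet combines a noisy speech recognition result with a laser-scanner presence reading in a naive-Bayes fashion to obtain a posterior over visitor goals; the goal names and all probabilities are invented for illustration.

        # Naive-Bayes-style fusion of two modalities: a noisy speech
        # recognition result and a laser-scanner presence reading.
        # All numbers are illustrative assumptions.

        GOALS = ["next_exhibit", "repeat", "quit"]

        # Prior over goals at the current dialogue state (assumed).
        prior = {"next_exhibit": 0.5, "repeat": 0.3, "quit": 0.2}

        # P(recognizer output | true goal), modelling recognition noise.
        p_speech = {
            "next_exhibit": {"next_exhibit": 0.7, "repeat": 0.2, "quit": 0.1},
            "repeat":       {"next_exhibit": 0.2, "repeat": 0.7, "quit": 0.1},
            "quit":         {"next_exhibit": 0.1, "repeat": 0.1, "quit": 0.8},
        }

        # P(laser detects a person | true goal): a visitor who wants to
        # quit is more likely to have already stepped out of range.
        p_present = {"next_exhibit": 0.95, "repeat": 0.95, "quit": 0.4}

        def goal_posterior(heard: str, present: bool) -> dict:
            """Posterior over goals, assuming the two modalities are
            conditionally independent given the goal."""
            scores = {}
            for g in GOALS:
                p_laser = p_present[g] if present else 1.0 - p_present[g]
                scores[g] = prior[g] * p_speech[g][heard] * p_laser
            z = sum(scores.values())
            return {g: s / z for g, s in scores.items()}

        # The recognizer heard "quit", but the scanner still sees the visitor:
        print(goal_posterior("quit", present=True))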

    Development of a Voice-Controlled Human-Robot Interface

    The goal of this thesis is to develop a voice-controlled human-robot interface (HRI) which allows a person to control and communicate with a robot. Dragon NaturallySpeaking, a commercially available automatic speech recognition engine, was chosen for the development of the proposed HRI. The Dragon software is used to create custom commands (or macros) that must satisfy three tasks: (a) directly controlling the robot with voice, (b) writing a robot program with voice, and (c) developing an HRI which allows the human and robot to communicate with each other using speech. The key is to generate keystrokes upon recognizing speech, using one of three macro types: step-by-step, macro recorder, and advanced scripting. Experiments were conducted in three phases to test the functionality of the developed macros in accomplishing all three tasks. The results showed that the advanced scripting macro is the only type that works. It is also the most suitable for the task because it is quick and easy to create and can be used to develop flexible and natural voice commands. Since the output of a macro is a series of keystrokes, which forms the syntax of the robot program, macros developed with the Dragon software can be used to communicate with virtually any robot by adjusting the output keystrokes.
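
    The essence of the approach is a mapping from recognized voice commands to the keystrokes that form the robot program's syntax. The Python sketch below illustrates that mapping only; the thesis realizes it as Dragon advanced-scripting macros, and the command names and keystroke strings here are hypothetical.

        # Hypothetical command-to-keystroke table: each recognized voice
        # command emits the keystrokes for one line of the robot program.
        COMMAND_TO_KEYSTROKES = {
            "move forward": "MOVE 100,0,0{Enter}",
            "turn left":    "TURN -90{Enter}",
            "open gripper": "GRIP OPEN{Enter}",
        }

        def emit_keystrokes(recognized: str) -> str:
            """Return the keystrokes to send to the robot programming window."""
            try:
                return COMMAND_TO_KEYSTROKES[recognized.lower()]
            except KeyError:
                raise ValueError(f"no macro defined for {recognized!r}")

        print(emit_keystrokes("Turn Left"))  # -> TURN -90{Enter}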

    On developing a voice-enabled interface for interactive tour-guide robots

    This paper considers design methodologies for developing voice-enabled interfaces for tour-guide robots deployed at the Robotics Exposition of the Swiss National Exhibition (Expo.02). Human–robot voice communication presents new challenges for the design of fully autonomous mobile robots, in that interaction must be robot-initiated and conducted within a dynamic, adverse environment. We approached these general problems for a voice-enabled interface, tailored to the limited computational resources of one on-board processor, by integrating smart speech signal acquisition, automatic speech recognition and synthesis, as well as a dialogue system, into the multi-modal, multi-sensor interface of the expo tour-guide robot. We also focus on particular issues that needed to be addressed in voice-based interaction when planning specific tasks and research experiments for Expo.02, where tour-guide robots had to interact with hundreds of thousands of visitors over 6 months, 7 days a week, 10 hours per day.
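
    The following sketch shows, in schematic Python, the kind of robot-initiated interaction loop such an interface implies: the robot speaks first, then recognizes the visitor's reply against a small per-state vocabulary. The structure and all stub components are assumptions for illustration, not the deployed RoboX software.

        def acquire_audio():
            return "audio frames"    # stub: noise-robust signal acquisition

        def recognize(audio, vocabulary):
            return "yes"             # stub: ASR limited to a per-state vocabulary

        def synthesize(text):
            print(f"ROBOT: {text}")  # stub: speech synthesis

        def dialogue_manager():
            """Robot-initiated dialogue: the robot always speaks first,
            then waits for a constrained answer from the visitor."""
            answer = yield ("Would you like a tour?", ["yes", "no"])
            if answer == "yes":
                yield ("Follow me to the first exhibit.", [])
            else:
                yield ("Enjoy the exhibition!", [])

        dm = dialogue_manager()
        prompt, vocab = next(dm)        # robot opens the conversation
        synthesize(prompt)
        answer = recognize(acquire_audio(), vocab)
        prompt, _ = dm.send(answer)     # dialogue advances on the reply
        synthesize(prompt)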

    Is a robot an appliance, teammate, or friend? Age-related differences in expectations of and attitudes toward personal home-based robots

    Future advances in technology may allow home-based robots to perform complex collaborative activities with individuals of different ages. Two studies were conducted to understand the expectations of and attitudes toward home-based robots held by younger and older adults. One study involved questionnaires sent to 2500 younger adults (aged 18-28) and 2500 older adults (aged 65-86) in the Atlanta Metropolitan area. One hundred and eighty questionnaires were completed and returned by individuals in the targeted age groups. For the questionnaire, participants were asked to imagine a robot in their home and then to answer questions about how well various characteristics matched their imagined robot. Participants' technology and robot experience, demographic information, and health information were also collected. In conjunction with the questionnaire study, twelve younger adults (aged 19-26) and twenty-four older adults in two sub-age groups (younger-older, aged 65-75, and older-older, aged 77-85) were interviewed about their expectations of and attitudes toward a robot in their home. They were asked to imagine a robot in their home and answer numerous questions about the tasks their envisioned robot would perform, the appearance of the robot, and other general questions about their interaction with the robot. The results of the studies suggest that individuals have many different ideas about what a robot in the home would be like. Mostly, they want a robot to perform mundane or repetitive tasks, such as cleaning, and picture a robot as a time-saving device. However, individuals are willing to have a robot perform other types of tasks if they see benefits of having the robot perform them. The ability of the robot to perform tasks efficiently, with minimal effort on the part of the human, appears to be more important in determining acceptance of the robot than its social ability or appearance. Overall, individuals both younger and older seem to be very open to the idea of a robot in their home as long as it is useful and not too difficult to use.

    Ph.D. thesis. Committee Chair: Fisk, Arthur D.; Committee Members: Corso, Gregory; Essa, Irfan A.; Roberts, James S.; Rogers, Wendy A.; Van Ittersum, Koert

    Semantic Object Modeling through Multimodal Interaction (Semantische Objektmodellierung mittels multimodaler Interaktion)

    A concept for interactive semantic object modeling is proposed. The flexible and extensible object representation enables the modeling of functional and semantic object information through the representation of properties that map human concepts and categories and that link objects to actions and to sensor-measurable attributes. The interactive modeling system allows semantic object models to be created intuitively.
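
    A minimal Python sketch of the kind of extensible object representation described here, linking an object to human-level categories, to the actions it affords, and to sensor-measurable attributes; the field names are illustrative assumptions, not the thesis's actual schema.

        from dataclasses import dataclass, field

        @dataclass
        class SemanticObject:
            name: str
            categories: list[str] = field(default_factory=list)   # human concepts
            actions: list[str] = field(default_factory=list)      # afforded actions
            attributes: dict[str, object] = field(default_factory=dict)  # sensor-derived

        cup = SemanticObject(
            name="cup",
            categories=["container", "tableware"],
            actions=["grasp", "fill", "pour"],
            attributes={"color": "red", "height_mm": 95, "graspable": True},
        )
        print(cup.categories, cup.actions)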

    Error handling in multimodal voice-enabled interfaces of tour-guide robots using graphical models

    Mobile service robots are going to play an increasing role in human society. Voice-enabled interaction with service robots becomes very important if such robots are to be deployed in real-world environments and accepted by the vast majority of potential human users.

    The research presented in this thesis addresses the problem of speech recognition integration in an interactive voice-enabled interface of a service robot, in particular a tour-guide robot. The task of a tour-guide robot is to engage visitors to mass exhibitions (users) in dialogue, providing the services it is designed for (e.g. exhibit presentations) within a limited time. In managing tour-guide dialogues, extracting the user goal (intention) for requesting a particular service at each dialogue state is the key issue. In mass exhibition conditions, speech recognition errors are inevitable because of noisy speech and uncooperative users with no prior experience of robots. Such errors can jeopardize user goal identification, and wrongly identified user goals can lead to communication failures. Therefore, to reduce the risk of such failures, methods for detecting and compensating for communication failures in human-robot dialogue are needed.

    During the short-term interaction with visitors, the interpretation of the user goal at each dialogue state can be improved by combining speech recognition in the speech modality with information from the robot's other available modalities. The methods presented in this thesis exploit probabilistic models for fusing information from speech and auxiliary modalities of the robot for user goal identification and communication failure detection. To compensate for detected communication failures, we investigate multimodal methods for recovery from communication failures.

    To model the process of modality fusion, taking into account the uncertainties in the information extracted from each input modality during human-robot interaction, we use the probabilistic framework of Bayesian networks. Bayesian networks are graphical models that represent a joint probability function over a set of random variables. They are used to model the dependencies among variables associated with the user goals, modality-related events (e.g. the event of user presence, inferred from the robot's laser scanner modality), and observed modality features providing evidence in favor of these modality events. Bayesian networks are used to calculate posterior probabilities over the possible user goals at each dialogue state. These probabilities serve as a basis for deciding whether the user goal is valid, i.e. whether it can be mapped to a tour-guide service (e.g. exhibit presentation), or is undefined, signaling a possible communication failure. The Bayesian network can also be used to elicit probabilities over the modality events, revealing information about the possible cause of a communication failure.

    Introducing new user goal aspects (e.g. new modality events and related features) that provide auxiliary information for detecting communication failures makes the design process cumbersome, calling for a systematic approach to Bayesian network modelling. Generally, introducing new variables for user goal identification in the Bayesian networks can lead to complex and computationally expensive models. In order to make the design process more systematic and modular, we adapt principles from the theory of grounding in human communication.
    When people communicate, they resolve understanding problems in a collaborative joint effort of providing evidence of common shared knowledge (grounding). We use Bayesian network topologies, tailored to limited computational resources, to model a state-based grounding model that fuses information from three different input modalities (laser, video and speech) to infer possible grounding states. These grounding states are associated with modality events showing whether the user is present in range for communication, whether the user is attending to the interaction, whether the speech modality is reliable, and whether the user goal is valid. The state-based grounding model is used to compute probabilities that intermediary grounding states have been reached. This serves as a basis for detecting whether the user has reached the final grounding state or whether a repair dialogue sequence is needed.

    In the case of a repair dialogue sequence, the tour-guide robot can exploit the multiple available modalities along with speech. For example, if the user has failed to reach the grounding state related to her/his presence in range for communication, the robot can use its move modality to search for and attract the attention of visitors. When speech recognition is detected to be unreliable, the robot can offer the alternative use of the buttons modality in the repair sequence.

    Given the probability of each grounding state and the dialogue sequence that can be executed in the next dialogue state, a tour-guide robot has different preferences over the possible dialogue continuations. If the possible dialogue sequences at each dialogue state are defined as actions, the principle of maximum expected utility (MEU) provides an explicit way of selecting an action, based on its utility, given the evidence about the user goal at each dialogue state. Decision networks, constructed as graphical models based on Bayesian networks, are proposed to perform MEU-based decisions, incorporating the utility of the actions to be chosen at each dialogue state by the tour-guide robot. These action utilities are defined taking into account the tour-guide task requirements.

    The proposed graphical models for user goal identification and dialogue error handling in human-robot dialogue are evaluated in experiments with multimodal data. These data were collected during the operation of the tour-guide robot RoboX at the Autonomous Systems Lab of EPFL and at the Swiss National Exhibition in 2002 (Expo.02). The evaluation experiments use component- and system-level metrics for technical (objective) and user-based (subjective) evaluation. On the component level, the technical evaluation is done by calculating accuracies as objective measures of the performance of the grounding model and of the resulting user goal identification in dialogue. The benefit of the proposed error handling framework is demonstrated by comparing the accuracy of a baseline interactive system, employing only speech recognition for user goal identification, with that of a system equipped with multimodal grounding models for error handling.
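
    As a rough illustration of MEU-based action selection (a minimal sketch with invented utilities, not the thesis's decision networks), the snippet below picks the dialogue continuation that maximizes expected utility under a goal posterior such as the grounding model would produce.

        # Posterior over user goals at the current dialogue state, e.g.
        # the output of the multimodal Bayesian grounding model.
        p_goal = {"present_exhibit": 0.55, "move_on": 0.15, "undefined": 0.30}

        # Utility of each candidate dialogue action given the true goal
        # (illustrative assumptions reflecting tour-guide task requirements).
        utility = {
            "give_presentation": {"present_exhibit": 10, "move_on": -5, "undefined": -8},
            "move_to_next":      {"present_exhibit": -5, "move_on": 10, "undefined": -2},
            "repair_dialogue":   {"present_exhibit": 1,  "move_on": 1,  "undefined": 8},
        }

        def best_action(p_goal, utility):
            """Pick the action maximizing expected utility under the posterior."""
            eu = {a: sum(p_goal[g] * u[g] for g in p_goal) for a, u in utility.items()}
            return max(eu, key=eu.get), eu

        action, eu = best_action(p_goal, utility)
        print(action, eu)  # with 30% mass on an undefined goal, repair wins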