388 research outputs found

    Bayesian networks for spoken dialogue management in multimodal systems of tour-guide robots

    Get PDF
    In this paper, we propose a method based on Bayesian networks for interpretation of multimodal signals used in the spoken dialogue between a tour-guide robot and visitors in mass exhibition conditions. We report on experiments interpreting speech and laser scanner signals in the dialogue management system of the autonomous tour-guide robot RoboX, successfully deployed at the Swiss National Exhibition (Expo.02). A correct interpretation of a users (visitors) goal or intention at each dialogue state is a key issue for successful voice-enabled communication between tour-guide robots and visitors. To infer the visitors goal under the uncertainty intrinsic to these two modalities, we introduce Bayesian networks for combining noisy speech recognition with data from a laser scanner, which is independent of acoustic noise. Experiments with real data, collected during the operation of RoboX at Expo.02 demonstrate the effectiveness of the approach

    Low-level grounding in a multimodal mobile service robot conversational system using graphical models

    Get PDF
    The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue providing an access to the services it is designed for. In managing such interaction, inferring the user goal (intention) from the request for a service at each dialogue turn is the key issue. In service robot deployment conditions speech recognition limitations with noisy speech input and inexperienced users may jeopardize user goal identification. In this paper, we introduce a grounding state-based model motivated by reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. In order to handle the speech input as sufficiently grounded (correctly understood) by the robot, four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate probability that each grounding state has been reached. These probabilities serve as a base for detecting whether the user is attending to the conversation, as well as for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specially designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated comparing a conversational system for the mobile service robot RoboX employing only speech recognition for user goal identification, and a system equipped with multimodal grounding. The evaluation experiments use component and system level metrics for technical (objective) and user-based (subjective) evaluation with multimodal data collected during the conversations of the robot RoboX with user

    Error handling in multimodal voice-enabled interfaces of tour-guide robots using graphical models

    Get PDF
    Mobile service robots are going to play an increasing role in the society of humans. Voice-enabled interaction with service robots becomes very important, if such robots are to be deployed in real-world environments and accepted by the vast majority of potential human users. The research presented in this thesis addresses the problem of speech recognition integration in an interactive voice-enabled interface of a service robot, in particular a tour-guide robot. The task of a tour-guide robot is to engage visitors to mass exhibitions (users) in dialogue providing the services it is designed for (e.g. exhibit presentations) within a limited time. In managing tour-guide dialogues, extracting the user goal (intention) for requesting a particular service at each dialogue state is the key issue. In mass exhibition conditions speech recognition errors are inevitable because of noisy speech and uncooperative users of robots with no prior experience in robotics. They can jeopardize the user goal identification. Wrongly identified user goals can lead to communication failures. Therefore, to reduce the risk of such failures, methods for detecting and compensating for communication failures in human-robot dialogue are needed. During the short-term interaction with visitors, the interpretation of the user goal at each dialogue state can be improved by combining speech recognition in the speech modality with information from other available robot modalities. The methods presented in this thesis exploit probabilistic models for fusing information from speech and auxiliary modalities of the robot for user goal identification and communication failure detection. To compensate for the detected communication failures we investigate multimodal methods for recovery from communication failures. To model the process of modality fusion, taking into account the uncertainties in the information extracted from each input modality during human-robot interaction, we use the probabilistic framework of Bayesian networks. Bayesian networks are graphical models that represent a joint probability function over a set of random variables. They are used to model the dependencies among variables associated with the user goals, modality related events (e.g. the event of user presence that is inferred from the laser scanner modality of the robot), and observed modality features providing evidence in favor of these modality events. Bayesian networks are used to calculate posterior probabilities over the possible user goals at each dialogue state. These probabilities serve as a base in deciding if the user goal is valid, i.e. if it can be mapped into a tour-guide service (e.g. exhibit presentation) or is undefined – signaling a possible communication failure. The Bayesian network can be also used to elicit probabilities over the modality events revealing information about the possible cause for a communication failure. Introducing new user goal aspects (e.g. new modality events and related features) that provide auxiliary information for detecting communication failures makes the design process cumbersome, calling for a systematic approach in the Bayesian network modelling. Generally, introducing new variables for user goal identification in the Bayesian networks can lead to complex and computationally expensive models. In order to make the design process more systematic and modular, we adapt principles from the theory of grounding in human communication. When people communicate, they resolve understanding problems in a collaborative joint effort of providing evidence of common shared knowledge (grounding). We use Bayesian network topologies, tailored to limited computational resources, to model a state-based grounding model fusing information from three different input modalities (laser, video and speech) to infer possible grounding states. These grounding states are associated with modality events showing if the user is present in range for communication, if the user is attending to the interaction, whether the speech modality is reliable, and if the user goal is valid. The state-based grounding model is used to compute probabilities that intermediary grounding states have been reached. This serves as a base for detecting if the the user has reached the final grounding state, or wether a repair dialogue sequence is needed. In the case of a repair dialogue sequence, the tour-guide robot can exploit the multiple available modalities along with speech. For example, if the user has failed to reach the grounding state related to her/his presence in range for communication, the robot can use its move modality to search and attract the attention of the visitors. In the case when speech recognition is detected to be unreliable, the robot can offer the alternative use of the buttons modality in the repair sequence. Given the probability of each grounding state, and the dialogue sequence that can be executed in the next dialogue state, a tour-guide robot has different preferences on the possible dialogue continuation. If the possible dialogue sequences at each dialogue state are defined as actions, the introduced principle of maximum expected utility (MEU) provides an explicit way of action selection, based on the action utility, given the evidence about the user goal at each dialogue state. Decision networks, constructed as graphical models based on Bayesian networks are proposed to perform MEU-based decisions, incorporating the utility of the actions to be chosen at each dialogue state by the tour-guide robot. These action utilities are defined taking into account the tour-guide task requirements. The proposed graphical models for user goal identification and dialogue error handling in human-robot dialogue are evaluated in experiments with multimodal data. These data were collected during the operation of the tour-guide robot RoboX at the Autonomous System Lab of EPFL and at the Swiss National Exhibition in 2002 (Expo.02). The evaluation experiments use component and system level metrics for technical (objective) and user-based (subjective) evaluation. On the component level, the technical evaluation is done by calculating accuracies, as objective measures of the performance of the grounding model, and the resulting performance of the user goal identification in dialogue. The benefit of the proposed error handling framework is demonstrated comparing the accuracy of a baseline interactive system, employing only speech recognition for user goal identification, and a system equipped with multimodal grounding models for error handling

    Multimodal Interaction Management for Tour-Guide Robots Using Bayesian Networks

    Get PDF
    In this paper, we propose a Bayesian network framework for managing interactivity between a tour-guide robot and visitors in mass exhibition conditions, through robust interpretation of multi-modal signals. We report on methods and experiments interpreting speech and laser scanner signals in the spoken dialogue management system of the autonomous tour-guide robot RoboX, successfully deployed at the Swiss National Exhibition (Expo.02). A correct interpretation of a users (visitors) goal or intention at each dialogue state is a key issue for successful speech-based interaction in voice-enabled communication between robots and visitors. We introduce a Bayesian network approach for combining noisy speech recognition results with noise-independent data from a laser scanner, in order to infer the visitors goal under the uncertainty intrinsic to these two modalities. We demonstrate the effectiveness of the approach by simulation based on real observations during experiments with the tour-guide robot RoboX at Expo.02

    Low-level grounding in a multimodal mobile service robot conversational system using graphical models

    Get PDF
    The main task of a service robot with a voice-enabled communication interface is to engage a user in dialogue providing an access to the services it is designed for. In managing such interaction, inferring the user goal (intention) from the request for a service at each dialogue turn is the key issue. In service robot deployment conditions speech recognition limitations with noisy speech input and inexperienced users may jeopardize user goal identification. In this paper, we introduce a grounding state-based model motivated by reducing the risk of communication failure due to incorrect user goal identification. The model exploits the multiple modalities available in the service robot system to provide evidence for reaching grounding states. In order to handle the speech input as sufficiently grounded (correctly understood) by the robot, four proposed states have to be reached. Bayesian networks combining speech and non-speech modalities during user goal identification are used to estimate probability that each grounding state has been reached. These probabilities serve as a base for detecting whether the user is attending to the conversation, as well as for deciding on an alternative input modality (e.g., buttons) when the speech modality is unreliable. The Bayesian networks used in the grounding model are specially designed for modularity and computationally efficient inference. The potential of the proposed model is demonstrated comparing a conversational system for the mobile service robot RoboX employing only speech recognition for user goal identification, and a system equipped with multimodal grounding.The evaluation experiments use component and system level metrics for technical (objective) and user-based (subjective) evaluation with multimodal data collected during the conversations of the robot RoboX with users

    Four Mode Based Dialogue Management with Modified POMDP Model

    Get PDF
    This thesis proposes a method to manage the interaction between the user and the system dynamically, through speech or text input which updates the user goals, select system actions and calculate rewards for each system response at each time-stamp. The main focus is made on the dialog manager, which decides how to continue the dialogue. We have used POMDP technique, as it maintains a belief distribution on the dialogue states based on the observations over the dialogue even in a noisy environment. Four contextual control modes are introduced in dialogue management for decision-making mechanism, and to keep track of machine behaviour for each dialogue state. The result obtained proves that our proposed framework has overcome the limitations of prior POMDP methods, and exactly understands the actual intention of the users within the available time, providing very interactive conversation between the user and the computer

    Acquiring and Maintaining Knowledge by Natural Multimodal Dialog

    Get PDF

    Machine Learning for Interactive Systems: Challenges and Future Trends

    Get PDF
    National audienceMachine learning has been introduced more than 40 years ago in interactive systems through speech recognition or computer vision. Since that, machine learning gained in interest in the scientific community involved in human- machine interaction and raised in the abstraction scale. It moved from fundamental signal processing to language understanding and generation, emotion and mood recogni- tion and even dialogue management or robotics control. So far, existing machine learning techniques have often been considered as a solution to some problems raised by inter- active systems. Yet, interaction is also the source of new challenges for machine learning and offers new interesting practical but also theoretical problems to solve. In this paper, we address these challenges and describe why research in machine learning and interactive systems should converge in the future

    Towards a framework for socially interactive robots

    Get PDF
    250 p.En las últimas décadas, la investigación en el campo de la robótica social ha crecido considerablemente. El desarrollo de diferentes tipos de robots y sus roles dentro de la sociedad se están expandiendo poco a poco. Los robots dotados de habilidades sociales pretenden ser utilizados para diferentes aplicaciones; por ejemplo, como profesores interactivos y asistentes educativos, para apoyar el manejo de la diabetes en niños, para ayudar a personas mayores con necesidades especiales, como actores interactivos en el teatro o incluso como asistentes en hoteles y centros comerciales.El equipo de investigación RSAIT ha estado trabajando en varias áreas de la robótica, en particular,en arquitecturas de control, exploración y navegación de robots, aprendizaje automático y visión por computador. El trabajo presentado en este trabajo de investigación tiene como objetivo añadir una nueva capa al desarrollo anterior, la capa de interacción humano-robot que se centra en las capacidades sociales que un robot debe mostrar al interactuar con personas, como expresar y percibir emociones, mostrar un alto nivel de diálogo, aprender modelos de otros agentes, establecer y mantener relaciones sociales, usar medios naturales de comunicación (mirada, gestos, etc.),mostrar personalidad y carácter distintivos y aprender competencias sociales.En esta tesis doctoral, tratamos de aportar nuestro grano de arena a las preguntas básicas que surgen cuando pensamos en robots sociales: (1) ¿Cómo nos comunicamos (u operamos) los humanos con los robots sociales?; y (2) ¿Cómo actúan los robots sociales con nosotros? En esa línea, el trabajo se ha desarrollado en dos fases: en la primera, nos hemos centrado en explorar desde un punto de vista práctico varias formas que los humanos utilizan para comunicarse con los robots de una maneranatural. En la segunda además, hemos investigado cómo los robots sociales deben actuar con el usuario.Con respecto a la primera fase, hemos desarrollado tres interfaces de usuario naturales que pretenden hacer que la interacción con los robots sociales sea más natural. Para probar tales interfaces se han desarrollado dos aplicaciones de diferente uso: robots guía y un sistema de controlde robot humanoides con fines de entretenimiento. Trabajar en esas aplicaciones nos ha permitido dotar a nuestros robots con algunas habilidades básicas, como la navegación, la comunicación entre robots y el reconocimiento de voz y las capacidades de comprensión.Por otro lado, en la segunda fase nos hemos centrado en la identificación y el desarrollo de los módulos básicos de comportamiento que este tipo de robots necesitan para ser socialmente creíbles y confiables mientras actúan como agentes sociales. Se ha desarrollado una arquitectura(framework) para robots socialmente interactivos que permite a los robots expresar diferentes tipos de emociones y mostrar un lenguaje corporal natural similar al humano según la tarea a realizar y lascondiciones ambientales.La validación de los diferentes estados de desarrollo de nuestros robots sociales se ha realizado mediante representaciones públicas. La exposición de nuestros robots al público en esas actuaciones se ha convertido en una herramienta esencial para medir cualitativamente la aceptación social de los prototipos que estamos desarrollando. De la misma manera que los robots necesitan un cuerpo físico para interactuar con el entorno y convertirse en inteligentes, los robots sociales necesitan participar socialmente en tareas reales para las que han sido desarrollados, para así poder mejorar su sociabilida

    Designing Embodied Interactive Software Agents for E-Learning: Principles, Components, and Roles

    Get PDF
    Embodied interactive software agents are complex autonomous, adaptive, and social software systems with a digital embodiment that enables them to act on and react to other entities (users, objects, and other agents) in their environment through bodily actions, which include the use of verbal and non-verbal communicative behaviors in face-to-face interactions with the user. These agents have been developed for various roles in different application domains, in which they perform tasks that have been assigned to them by their developers or delegated to them by their users or by other agents. In computer-assisted learning, embodied interactive pedagogical software agents have the general task to promote human learning by working with students (and other agents) in computer-based learning environments, among them e-learning platforms based on Internet technologies, such as the Virtual Linguistics Campus (www.linguistics-online.com). In these environments, pedagogical agents provide contextualized, qualified, personalized, and timely assistance, cooperation, instruction, motivation, and services for both individual learners and groups of learners. This thesis develops a comprehensive, multidisciplinary, and user-oriented view of the design of embodied interactive pedagogical software agents, which integrates theoretical and practical insights from various academic and other fields. The research intends to contribute to the scientific understanding of issues, methods, theories, and technologies that are involved in the design, implementation, and evaluation of embodied interactive software agents for different roles in e-learning and other areas. For developers, the thesis provides sixteen basic principles (Added Value, Perceptible Qualities, Balanced Design, Coherence, Consistency, Completeness, Comprehensibility, Individuality, Variability, Communicative Ability, Modularity, Teamwork, Participatory Design, Role Awareness, Cultural Awareness, and Relationship Building) plus a large number of specific guidelines for the design of embodied interactive software agents and their components. Furthermore, it offers critical reviews of theories, concepts, approaches, and technologies from different areas and disciplines that are relevant to agent design. Finally, it discusses three pedagogical agent roles (virtual native speaker, coach, and peer) in the scenario of the linguistic fieldwork classes on the Virtual Linguistics Campus and presents detailed considerations for the design of an agent for one of these roles (the virtual native speaker)
    • …
    corecore