
    Model-based Bayesian Reinforcement Learning for Dialogue Management

    Reinforcement learning methods are increasingly used to optimise dialogue policies from experience. Most current techniques are model-free: they directly estimate the utility of various actions, without an explicit model of the interaction dynamics. In this paper, we investigate an alternative strategy grounded in model-based Bayesian reinforcement learning. Bayesian inference is used to maintain a posterior distribution over the model parameters, reflecting the model uncertainty. This parameter distribution is gradually refined as more data is collected and is simultaneously used to plan the agent's actions. Within this learning framework, we carried out experiments with two alternative formalisations of the transition model: one encoded with standard multinomial distributions, and one structured with probabilistic rules. We demonstrate the potential of our approach with empirical results on a user simulator constructed from Wizard-of-Oz data in a human-robot interaction scenario. The results illustrate, in particular, the benefits of capturing prior domain knowledge with high-level rules.
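    A minimal sketch of the Dirichlet-multinomial machinery behind the first model variant described above: because the Dirichlet prior is conjugate to the multinomial transition likelihood, the posterior update reduces to a count increment. Planning by value iteration on a model sampled from the posterior (Thompson-sampling style) is an illustrative stand-in for the paper's planner; the state/action sizes and reward table are assumptions, not taken from the paper.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 3  # illustrative sizes

class DirichletTransitionModel:
    def __init__(self, prior=1.0):
        # alpha[s, a, s'] holds Dirichlet pseudo-counts for P(s' | s, a)
        self.alpha = np.full((N_STATES, N_ACTIONS, N_STATES), prior)

    def update(self, s, a, s_next):
        # Bayesian posterior update: add one observed transition
        self.alpha[s, a, s_next] += 1.0

    def sample_model(self, rng):
        # Draw one concrete transition matrix from the current posterior
        T = np.empty_like(self.alpha)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                T[s, a] = rng.dirichlet(self.alpha[s, a])
        return T

def plan_greedy(T, R, gamma=0.95, iters=100):
    # Value iteration on the sampled model; R[s, a] is a known reward table
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = R + gamma * T @ V    # (S, A, S') @ (S,) -> Q with shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)      # greedy action per state

rng = np.random.default_rng(0)
model = DirichletTransitionModel()
model.update(0, 1, 2)            # one observed transition: s=0, a=1 -> s'=2
policy = plan_greedy(model.sample_model(rng),
                     R=rng.random((N_STATES, N_ACTIONS)))
```

    As more transitions are observed, the pseudo-counts concentrate and the sampled models, and hence the planned actions, converge toward the true dynamics.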

    Nonstrict hierarchical reinforcement learning for interactive systems and robots

    Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques to approximate the true value function of a policy, or by hierarchically decomposing a learning task into subtasks. We present a novel approach for dialogue policy optimization that combines the benefits of hierarchical control and function approximation, and that allows flexible transitions between dialogue subtasks to give human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space to allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation in order to generalize decision making to situations unseen in training. Our proposed approach is evaluated in an interactive conversational robot that learns to play quiz games. Experimental results, using simulation and real users, provide evidence that our approach can lead to more flexible (natural) interactions than strict hierarchical control and that it is preferred by human users.
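    A minimal sketch of the two extensions named above, one agent at a time: a linear Q-function per subtask plus a subtask transition function that lets the user pull the dialogue into another subtask mid-interaction. The feature size, subtask names, and trigger words are hypothetical illustrations, not the paper's design.

```python
import numpy as np

class SubtaskAgent:
    def __init__(self, n_actions, n_features, alpha=0.05, gamma=0.99):
        # One weight vector per action: Q(s, a) = w_a . phi(s)
        self.w = np.zeros((n_actions, n_features))
        self.alpha, self.gamma = alpha, gamma

    def act(self, phi, eps=0.1):
        # Epsilon-greedy action selection over the linear Q-values
        if np.random.random() < eps:
            return np.random.randint(len(self.w))
        return int(np.argmax(self.w @ phi))

    def update(self, phi, a, r, phi_next):
        # One-step Q-learning update with linear function approximation
        target = r + self.gamma * np.max(self.w @ phi_next)
        self.w[a] += self.alpha * (target - self.w[a] @ phi) * phi

def subtask_transition(current, user_utterance):
    # Nonstrict control: the user can switch subdialogues at any point,
    # rather than waiting for the current subtask to terminate
    # (trigger words here are hypothetical)
    if "score" in user_utterance:
        return "quiz_scoring"
    if "question" in user_utterance:
        return "quiz_asking"
    return current
```

    The linear representation is what allows decisions to generalize to feature vectors never seen in training, while the transition function is what makes the hierarchy nonstrict.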

    An Asynchronous Simulation Framework for Multi-User Interactive Collaboration: Application to Robot-Assisted Surgery

    The field of surgery is continually evolving, as there is always room for improvement in the post-operative health of the patient as well as the comfort of the Operating Room (OR) team. While the success of surgery is contingent upon the skills of the surgeon and the OR team, the use of specialized robots has been shown to improve surgery-related outcomes in some cases. These outcomes are currently measured using a wide variety of metrics, including patient pain and recovery, the surgeon's comfort, the duration of the operation, and the cost of the procedure. Additional research is needed to better understand the optimal criteria for benchmarking surgical performance. Presently, surgeons are trained to perform robot-assisted surgeries using interactive simulators. However, in the absence of well-defined performance standards, these simulators focus primarily on the simulation of the operative scene and not on the complexities associated with the multiple inputs of a real-world surgical procedure. Because interactive simulators are typically designed for specific robots that perform a small number of tasks controlled by a single user, they are inflexible in their portability to different robots and in the inclusion of multiple operators (e.g., nurses, medical assistants). Additionally, while most simulators provide high-quality visuals, simplification techniques are often employed to avoid stability issues in physics computation, contact dynamics, and multi-manual interaction. This study addresses the limitations of existing simulators by outlining the specifications required to develop techniques that mimic real-world interactions and collaboration. Moreover, it focuses on the inclusion of distributed control, shared task allocation, and assistive feedback, provided through machine learning and through secondary and tertiary operators, alongside the primary human operator.
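    A minimal sketch of the asynchronous, multi-operator pattern the framework argues for: a physics loop that steps at a fixed rate while any number of operators (surgeon, assistant, assistive ML agent) inject commands concurrently through a shared queue. The queue-based interface, the 100 Hz rate, and the toy state are assumptions for illustration only.

```python
import asyncio

class SharedSimulation:
    def __init__(self):
        self.state = {"tool_pose": [0.0, 0.0, 0.0]}
        self.commands = asyncio.Queue()

    async def physics_loop(self, hz=100):
        dt = 1.0 / hz
        while True:
            # Drain all commands that arrived since the last step so no
            # single operator can block the physics update
            while not self.commands.empty():
                operator, delta = self.commands.get_nowait()
                self.state["tool_pose"] = [
                    p + d for p, d in zip(self.state["tool_pose"], delta)
                ]
            # ... contact dynamics and constraint solving would go here ...
            await asyncio.sleep(dt)

async def operator(sim, name, delta, period):
    # Each operator runs on its own clock, decoupled from the physics rate
    while True:
        await sim.commands.put((name, delta))
        await asyncio.sleep(period)

async def main():
    sim = SharedSimulation()
    await asyncio.gather(
        sim.physics_loop(),
        operator(sim, "surgeon", [0.001, 0.0, 0.0], period=0.01),
        operator(sim, "assistant", [0.0, 0.0005, 0.0], period=0.05),
    )

# asyncio.run(main())  # runs indefinitely; commented out for illustration
```

    Decoupling operator input rates from the physics rate is what lets the same loop accommodate a primary operator, secondary operators, and automated assistance without restructuring the simulator.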

    How Can Physiological Computing Benefit Human-Robot Interaction?

    As systems grow more automated, the human operator is all too often overlooked. Although human-robot interaction (HRI) can be quite demanding in terms of cognitive resources, the mental states (MS) of operators are not yet taken into account by existing systems. Since human operators are not infallible, this omission can lead to hazardous situations. The growing number of neurophysiology and machine learning tools now allows for efficient monitoring of operators' MS. Sending feedback on MS in a closed-loop solution is therefore within reach. Involving a consistent automated planning technique to handle such a process could be a significant asset. This perspective article provides the reader with a synthesis of the relevant literature, with a view to implementing systems that adapt to the operator's MS to improve the safety and performance of human-robot operations. First, the need for this approach is detailed with regard to remote operation, an example of HRI. Then, several MS identified as crucial for this type of HRI are defined, along with relevant electrophysiological markers. A focus is placed on primary degraded MS linked to time-on-task and task demands, as well as on collateral MS linked to system outputs (i.e., feedback and alarms). Lastly, the principle of symbiotic HRI is detailed, and one solution is proposed for including the operator state vector in the system, using a mixed-initiative decisional framework to drive the interaction.
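    A minimal sketch of the closed loop the article advocates: classify the operator's MS from electrophysiological features, then let a decision layer adapt the interaction. The feature names, the binary fatigue label, the toy data, and the adaptation rules are all assumptions; a real system would use validated markers and a properly trained classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training set: rows of [theta_power, alpha_power, pupil_diameter]
X = np.array([[0.2, 0.5, 3.1], [0.8, 0.3, 2.4],
              [0.3, 0.6, 3.0], [0.9, 0.2, 2.2]])
y = np.array([0, 1, 0, 1])  # 0 = nominal, 1 = degraded (e.g., mental fatigue)

ms_classifier = LogisticRegression().fit(X, y)

def adapt_interaction(features):
    # The operator state vector feeds the mixed-initiative decision layer:
    # when a degraded MS is likely, shift initiative toward the robot
    # (thresholds and action labels are hypothetical)
    p_degraded = ms_classifier.predict_proba([features])[0, 1]
    if p_degraded > 0.7:
        return "increase_autonomy"   # robot takes over low-level control
    if p_degraded > 0.4:
        return "simplify_feedback"   # reduce alarm load on the operator
    return "nominal_operation"

print(adapt_interaction([0.85, 0.25, 2.3]))
```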

    Context-Aware Shared Control of a Robot Mobility Aid for the Elderly Blind

    This paper describes the use of a Bayesian network to provide context-aware shared control of a robot mobility aid for the frail blind. The robot mobility aid, PAM-AID, is a “smart walker” that aims to help frail and elderly blind users walk safely indoors. The Bayesian network combines user input with high-level information derived from the sensors to provide a context-aware estimate of the user’s current navigation goals. This context-aware action selection mechanism facilitates the use of a very simple, low-bandwidth user interface, which is critical for this elderly user group. The PAM-AID systems have been evaluated through a series of field trials involving over 30 potential users.
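    A minimal sketch of the fusion idea: a discrete Bayesian update that combines a prior over the user's navigation goal with likelihoods from the user interface and from high-level sensor features. The goal set, the conditional probability tables, and the naive-Bayes factorization are illustrative assumptions, not the PAM-AID network itself.

```python
GOALS = ["follow_corridor", "turn_left", "stop"]

# P(user_input | goal): e.g., a lever nudge left weakly implies a left turn
P_INPUT = {
    "nudge_left": {"follow_corridor": 0.2, "turn_left": 0.7, "stop": 0.1},
    "no_input":   {"follow_corridor": 0.6, "turn_left": 0.2, "stop": 0.2},
}

# P(sensor_feature | goal): a detected left opening supports turning left
P_SENSOR = {
    "left_opening": {"follow_corridor": 0.3, "turn_left": 0.6, "stop": 0.1},
    "clear_ahead":  {"follow_corridor": 0.7, "turn_left": 0.2, "stop": 0.1},
}

def posterior_goal(prior, user_input, sensor_feature):
    # Naive-Bayes fusion: prior times both conditional likelihoods,
    # then renormalize over the goal set
    post = {g: prior[g] * P_INPUT[user_input][g] * P_SENSOR[sensor_feature][g]
            for g in GOALS}
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

prior = {g: 1.0 / len(GOALS) for g in GOALS}
print(posterior_goal(prior, "nudge_left", "left_opening"))
```

    Because the sensors disambiguate a coarse user signal, the interface itself can stay simple, which is the point made above about low-bandwidth interaction.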

    Spatial and Temporal Learning in Robotic Pick-and-Place Domains via Demonstrations and Observations

    Traditional methods for Learning from Demonstration require users to train the robot through the entire process, or to provide feedback throughout a given task. These methods have proved successful in a selection of robotic domains; however, many are limited by the user's ability to demonstrate the task effectively. In many cases, noisy demonstrations or a failure to understand the underlying model prevent these methods from working with a wider range of non-expert users. My insight is that in many mobile pick-and-place domains, teaching is done at too fine-grained a level. In many such tasks, users are concerned solely with the end goal. This implies that the complexity and time associated with training and teaching robots through the entirety of the task are unnecessary. The robotic agent needs to know (1) a probable search location from which to retrieve the task's objects and (2) how to arrange the items to complete the task. This thesis develops new techniques for obtaining such data from high-level spatial and temporal observations and demonstrations, which can later be applied in new, unseen environments. This thesis makes the following contributions: (1) This work is built on a crowd robotics platform and, as such, we contribute the development of efficient data-streaming techniques to further these capabilities; by doing so, users can more easily interact with robots on a number of platforms. (2) New algorithms that learn pick-and-place tasks from a large corpus of goal templates; my work contributes algorithms that produce a metric ranking the appropriate frame of reference for each item based solely on spatial demonstrations. (3) An algorithm that enhances the above templates with ordering constraints using coarse and noisy temporal information; this method eliminates the need for a user to explicitly specify such constraints and searches for an optimal ordering and placement of items. (4) A novel algorithm that learns probable search locations of objects based solely on sparsely made temporal observations; for this, we introduce persistence models of objects customized to a user's environment.
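    A minimal sketch of contribution (2) above, ranking the reference frame for an item from spatial demonstrations alone. The heuristic here, that a consistent placement relative to the correct frame yields a tight positional cluster (low variance), is an illustrative stand-in for the thesis's metric; the frame names and data are hypothetical.

```python
import numpy as np

def rank_reference_frames(demos):
    """demos maps frame name -> (n_demos, 2) array of the item's
    position expressed in that candidate reference frame."""
    scores = {}
    for frame, positions in demos.items():
        # Low total variance across demonstrations suggests the item is
        # placed consistently relative to this frame
        scores[frame] = float(np.var(positions, axis=0).sum())
    return sorted(scores, key=scores.get)  # best (lowest variance) first

demos = {
    "table": np.array([[0.10, 0.52], [0.48, 0.11], [0.83, 0.67]]),
    "plate": np.array([[0.12, 0.01], [0.11, 0.02], [0.13, 0.01]]),
}
print(rank_reference_frames(demos))  # ['plate', 'table']
```

    Here the item scatters widely in table coordinates but clusters tightly relative to the plate, so the plate is ranked as the appropriate frame of reference.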