418 research outputs found

    Enabling Robots to Communicate their Objectives

    The overarching goal of this work is to efficiently enable end-users to correctly anticipate a robot's behavior in novel situations. Since a robot's behavior is often a direct result of its underlying objective function, our insight is that end-users need to have an accurate mental model of this objective function in order to understand and predict what the robot will do. While people naturally develop such a mental model over time through observing the robot act, this familiarization process may be lengthy. Our approach reduces this time by having the robot model how people infer objectives from observed behavior, and then it selects those behaviors that are maximally informative. The problem of computing a posterior over objectives from observed behavior is known as Inverse Reinforcement Learning (IRL), and has been applied to robots learning human objectives. We consider the problem where the roles of human and robot are swapped. Our main contribution is to recognize that unlike robots, humans will not be exact in their IRL inference. We thus introduce two factors to define candidate approximate-inference models for human learning in this setting, and analyze them in a user study in the autonomous driving domain. We show that certain approximate-inference models lead to the robot generating example behaviors that better enable users to anticipate what it will do in novel situations. Our results also suggest, however, that additional research is needed in modeling how humans extrapolate from examples of robot behavior. Comment: RSS 201

    Coherent Soft Imitation Learning

    Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward. Such methods enable agents to learn complex tasks from humans that are difficult to capture with hand-designed reward functions. Choosing BC or IRL for imitation depends on the quality and state-action coverage of the demonstrations, as well as additional access to the Markov decision process. Hybrid strategies that combine BC and IRL are not common, as initial policy optimization against inaccurate rewards diminishes the benefit of pretraining the policy with BC. This work derives an imitation method that captures the strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement learning setting, we show that the behaviour-cloned policy can be used as both a shaped reward and a critic hypothesis space by inverting the regularized policy update. This coherency facilitates fine-tuning cloned policies using the reward estimate and additional interactions with the environment. This approach conveniently achieves imitation learning through initial behaviour cloning, followed by refinement via RL with online or offline data sources. The simplicity of the approach enables graceful scaling to high-dimensional and vision-based tasks, with stable learning and minimal hyperparameter tuning, in contrast to adversarial approaches. Comment: 51 pages, 47 figures. DeepMind internship repor

    Improved Performance of d<sub>31</sub>-Mode Needle-actuating Transducer with PMN-PT Piezocrystal

    Prototypes of a PZT-based ultrasound needle-actuating device have shown the ability to reduce needle penetration force and enhance needle visibility with color Doppler imaging during needle insertion for tissue biopsy and regional anesthesia. However, the demand for smaller, lighter devices and the need for high performance transducers have motivated investigation of a different configuration of needle-actuation transducer, utilizing the d<sub>31</sub>-mode of PZT4 piezoceramic, and exploration of further improvement in its performance using relaxor-type piezocrystal. This paper outlines the development of the d<sub>31</sub>-mode needle actuation transducer design from simulation to fabrication and demonstration. Full characterization was performed on transducers for performance comparison. The performance of the proposed smaller, lighter d<sub>31</sub>-mode transducer is comparable with that of previous d<sub>33</sub>-mode transducers. Furthermore, it has been found to be much more efficient when using PMN-PT piezocrystal rather than piezoceramic.

    Near-field propagation of tsunamis from megathrust earthquakes

    We investigate controls on tsunami generation and propagation in the near-field of great megathrust earthquakes using a series of numerical simulations of subduction and tsunamigenesis on the Sumatran forearc. The Sunda megathrust here is advanced in its seismic cycle and may be ready for another great earthquake. We calculate the seafloor displacements and tsunami wave heights for about 100 complex earthquake ruptures whose synthesis was informed by reference to geodetic and stress accumulation studies. Remarkably, results show that, for any near-field location: (1) the timing of tsunami inundation is independent of slip-distribution on the earthquake or even of its magnitude, and (2) the maximum wave height is directly proportional to the vertical coseismic displacement experienced at that location. Both observations are explained by the dominance of long wavelength crustal flexure in near-field tsunamigenesis. The results show, for the first time, that a single estimate of vertical coseismic displacement might provide a reliable short-term forecast of the maximum height of tsunami waves.