Enabling Robots to Communicate their Objectives
The overarching goal of this work is to efficiently enable end-users to
correctly anticipate a robot's behavior in novel situations. Since a robot's
behavior is often a direct result of its underlying objective function, our
insight is that end-users need to have an accurate mental model of this
objective function in order to understand and predict what the robot will do.
While people naturally develop such a mental model over time through observing
the robot act, this familiarization process may be lengthy. Our approach
reduces this time by having the robot model how people infer objectives from
observed behavior, and then it selects those behaviors that are maximally
informative. The problem of computing a posterior over objectives from observed
behavior is known as Inverse Reinforcement Learning (IRL), and has been applied
to robots learning human objectives. We consider the problem where the roles of
human and robot are swapped. Our main contribution is to recognize that unlike
robots, humans will not be exact in their IRL inference. We thus introduce two
factors to define candidate approximate-inference models for human learning in
this setting, and analyze them in a user study in the autonomous driving
domain. We show that certain approximate-inference models lead to the robot
generating example behaviors that better enable users to anticipate what it
will do in novel situations. Our results also suggest, however, that additional
research is needed in modeling how humans extrapolate from examples of robot
behavior. Comment: RSS 201
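The posterior over objectives mentioned in this abstract can be illustrated with a minimal sketch of exact Bayesian IRL over a finite set of candidates. It is not the paper's method: the abstract argues that humans are approximate in this inference, and its approximate-inference models are not reproduced here. The Boltzmann-rational likelihood, the uniform prior, the linear reward, the `beta` parameter, and the two-feature driving example are all assumptions made for illustration.

```python
import math

def trajectory_reward(theta, features):
    # Assumed linear objective: reward is the dot product of weights and features.
    return sum(w * f for w, f in zip(theta, features))

def irl_posterior(thetas, observed_idx, trajectories, beta=1.0):
    """Posterior over candidate objectives given one observed trajectory.

    Assumes a uniform prior and a Boltzmann-rational likelihood:
    P(trajectory | theta) is proportional to exp(beta * reward).
    """
    unnormalized = []
    for theta in thetas:
        scores = [beta * trajectory_reward(theta, f) for f in trajectories]
        top = max(scores)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        unnormalized.append(math.exp(scores[observed_idx] - log_z))
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical example: features are (speed, distance kept from other cars).
thetas = [(1.0, 0.0), (0.0, 1.0)]        # "values speed" vs. "values distance"
trajectories = [(1.0, 0.2), (0.3, 1.0)]  # fast-but-close vs. slow-but-far
posterior = irl_posterior(thetas, observed_idx=1, trajectories=trajectories, beta=5.0)
```

Observing the slow-but-far trajectory shifts most of the posterior mass to the objective that values distance. A robot choosing informative examples would, under these assumptions, prefer demonstrations that make this shift as large as possible.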
Coherent Soft Imitation Learning
Imitation learning methods seek to learn from an expert either through
behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL)
of the reward. Such methods enable agents to learn complex tasks from humans
that are difficult to capture with hand-designed reward functions. Choosing BC
or IRL for imitation depends on the quality and state-action coverage of the
demonstrations, as well as additional access to the Markov decision process.
Hybrid strategies that combine BC and IRL are not common, as initial policy
optimization against inaccurate rewards diminishes the benefit of pretraining
the policy with BC. This work derives an imitation method that captures the
strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement
learning setting, we show that the behaviour-cloned policy can be used as both
a shaped reward and a critic hypothesis space by inverting the regularized
policy update. This coherency facilitates fine-tuning cloned policies using the
reward estimate and additional interactions with the environment. This approach
conveniently achieves imitation learning through initial behaviour cloning,
followed by refinement via RL with online or offline data sources. The
simplicity of the approach enables graceful scaling to high-dimensional and
vision-based tasks, with stable learning and minimal hyperparameter tuning, in
contrast to adversarial approaches. Comment: 51 pages, 47 figures. DeepMind internship repor
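The idea of inverting a regularized policy update can be sketched for a single state with discrete actions. This is a minimal illustration, not the paper's implementation. It assumes the standard KL-regularized update, in which the policy is proportional to a prior policy times exp(Q / alpha). The function names, the prior, the Q-values, and the temperature `alpha` are all hypothetical.

```python
import math

def soft_policy(prior, q_values, alpha):
    """Regularized update for one state: pi(a) proportional to prior(a) * exp(Q(a) / alpha)."""
    logits = [math.log(p) + q / alpha for p, q in zip(prior, q_values)]
    top = max(logits)  # subtract the max for numerical stability
    z = sum(math.exp(l - top) for l in logits)
    return [math.exp(l - top) / z for l in logits]

def inverted_reward(policy, prior, alpha):
    """Invert the update: alpha * log(pi / prior) equals Q(a) minus a per-state constant."""
    return [alpha * (math.log(pi) - math.log(p)) for pi, p in zip(policy, prior)]

# Hypothetical numbers for one state with three actions.
prior = [0.25, 0.25, 0.5]
q_values = [1.0, 2.0, 0.5]
alpha = 0.5

policy = soft_policy(prior, q_values, alpha)
shaped = inverted_reward(policy, prior, alpha)
```

The recovered values differ from `q_values` only by a constant shared across actions, so differences between actions are preserved. This is the sense in which a log-ratio of policies can act as a shaped reward under these assumptions.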
Improved Performance of d31-Mode Needle-actuating Transducer with PMN-PT Piezocrystal
Prototypes of a PZT-based ultrasound needle-actuating device have shown the ability to reduce needle penetration force and enhance needle visibility with color Doppler imaging during needle insertion for tissue biopsy and regional anesthesia. However, the demand for smaller, lighter devices and the need for high-performance transducers have motivated investigation of a different configuration of needle-actuation transducer, utilizing the d31-mode of PZT4 piezoceramic, and exploration of further improvement in its performance using relaxor-type piezocrystal. This paper outlines the development of the d31-mode needle actuation transducer design from simulation to fabrication and demonstration. Full characterization was performed on transducers for performance comparison. The performance of the proposed smaller, lighter d31-mode transducer is comparable with that of previous -mode transducers. Furthermore, it has been found to be much more efficient when using PMN-PT piezocrystal rather than piezoceramic.
Optimizing for Robot Transparency
As robots become more capable and commonplace, it becomes increasingly important that they are transparent to humans. People need to have accurate mental models of a robot, so that they can anticipate what it will do, know when and where not to rely on it, and understand why it failed. This helps engineers ensure safety and robustness of the robot systems they develop, and enables human end-users to interact more safely and seamlessly with robots. This thesis introduces a framework for producing robot behavior that increases transparency. Our key insight is that a robot's actions do not just influence the physical world; they also inevitably influence a human observer's mental model of the robot. We attempt to model the latter (how humans might make inferences about a robot's objectives, policy, and capabilities from observations of its behavior) so that we can then present examples of robot behavior that optimally bring the human's understanding closer to the true robot model. In this way, our framework casts transparency as an optimization problem. Part I introduces our framework of optimizing for robot transparency, and applies it in three ways: communicating a robot's objectives, which situations it can handle, and why it is incapable of performing a task. Part II investigates how transparency is useful not just for safe and seamless interaction, but also for learning. When humans teach a robot, giving human teachers transparency regarding what the robot has learned so far makes it easier for them to select informative teaching examples.
Near-field propagation of tsunamis from megathrust earthquakes
We investigate controls on tsunami generation and propagation in the near-field of great megathrust earthquakes using a series of numerical simulations of subduction and tsunamigenesis on the Sumatran forearc. The Sunda megathrust here is advanced in its seismic cycle and may be ready for another great earthquake. We calculate the seafloor displacements and tsunami wave heights for about 100 complex earthquake ruptures whose synthesis was informed by reference to geodetic and stress accumulation studies. Remarkably, results show that, for any near-field location: (1) the timing of tsunami inundation is independent of the earthquake's slip distribution or even of its magnitude, and (2) the maximum wave height is directly proportional to the vertical coseismic displacement experienced at that location. Both observations are explained by the dominance of long wavelength crustal flexure in near-field tsunamigenesis. The results show, for the first time, that a single estimate of vertical coseismic displacement might provide a reliable short-term forecast of the maximum height of tsunami waves.
Development of City Destination Attractiveness Index: A China Case
This study aims to develop a comprehensive assessment model, the city destination attractiveness index (CDAI), and to validate it by assessing the attractiveness of selected city destinations in China. The results will complement the theoretical body of knowledge on destination attractiveness evaluation. In addition, by measuring and matching the differences between a destination’s reality and a visitor’s perception, the index can work as a decision-making instrument for DMOs and help improve tourists’ satisfaction.