72 research outputs found
Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping
The distributional perspective on reinforcement learning (RL) has given rise
to a series of successful Q-learning algorithms, resulting in state-of-the-art
performance in arcade game environments. However, it has not yet been analyzed
how these findings from a discrete setting translate to complex practical
applications characterized by noisy, high-dimensional and continuous
state-action spaces. In this work, we propose Quantile QT-Opt (Q2-Opt), a
distributional variant of the recently introduced distributed Q-learning
algorithm for continuous domains, and examine its behaviour in a series of
simulated and real vision-based robotic grasping tasks. The absence of an actor
in Q2-Opt allows us to directly draw a parallel to the previous discrete
experiments in the literature without the additional complexities induced by an
actor-critic architecture. We demonstrate that Q2-Opt achieves a superior
vision-based object grasping success rate, while also being more sample
efficient. The distributional formulation also allows us to experiment with
various risk distortion metrics that give us an indication of how robots can
concretely manage risk in practice using a Deep RL control policy. As an
additional contribution, we perform batch RL experiments in our virtual
environment and compare them with the latest findings from discrete settings.
Surprisingly, we find that the previous batch RL findings from the literature
obtained on arcade game environments do not generalise to our setup.
Comment: Camera-ready version for RSS 2020. Contains 8 pages, 7 figures.
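The risk-aware action choice described in the abstract above can be illustrated with a minimal sketch. It assumes the critic returns a list of equally weighted quantile estimates of the return for each action, and it uses CVaR as one example of a risk measure; the abstract does not say which measures the paper evaluates. The names `cvar_from_quantiles` and `pick_action` are hypothetical, and the small finite candidate set stands in for whatever optimizer searches the continuous action space.

```python
import math


def cvar_from_quantiles(quantile_values, alpha):
    """Approximate CVaR at level alpha from equally weighted quantile estimates.

    Averages the lowest ceil(alpha * N) estimates; alpha = 1.0 gives the plain mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantile_values)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def pick_action(candidates, alpha):
    """Return the candidate action whose return distribution has the highest CVaR.

    `candidates` maps an action label to its list of quantile estimates
    (a hypothetical stand-in for the output of a distributional critic).
    """
    return max(candidates, key=lambda a: cvar_from_quantiles(candidates[a], alpha))


# Illustrative numbers only, not results from the paper.
candidates = {
    "safe": [0.4, 0.5, 0.5, 0.6],
    "risky": [0.0, 0.2, 1.0, 1.0],
}
risk_neutral_choice = pick_action(candidates, alpha=1.0)
risk_averse_choice = pick_action(candidates, alpha=0.5)
```

With these illustrative numbers, the risk-neutral setting prefers the action with the higher mean, while the risk-averse setting prefers the action whose worst outcomes are less severe.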
Non-Markov Policies to Reduce Sequential Failures in Robot Bin Picking
A new generation of automated bin picking systems using deep learning is
evolving to support increasing demand for e-commerce. To accommodate a wide
variety of products, many automated systems include multiple gripper types
and/or tool changers. However, for some objects, sequential grasp failures are
common: when a computed grasp fails to lift and remove the object, the bin is
often left unchanged; as the sensor input is consistent, the system retries the
same grasp over and over, resulting in a significant reduction in mean
successful picks per hour (MPPH). Based on an empirical study of sequential
failures, we characterize a class of "sequential failure objects" (SFOs) --
objects prone to sequential failures based on a novel taxonomy. We then propose
three non-Markov picking policies that incorporate memory of past failures to
modify subsequent actions. Simulation experiments on SFO models and the EGAD
dataset suggest that the non-Markov policies significantly outperform the
Markov policy in terms of the sequential failure rate and MPPH. In physical
experiments on 50 heaps of 12 SFOs, the most effective non-Markov policy
increased MPPH over the Dex-Net Markov policy by 107%.
Comment: 2020 IEEE International Conference on Automation Science and
Engineering (CASE)
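The idea of using memory of past failures to modify later actions can be sketched as follows. This is not one of the three policies from the paper, which the abstract does not specify; it is a hypothetical illustration in which grasps are 2D points with a quality score, candidates near a previously failed grasp are skipped, and the memory is cleared after a success because the bin contents have then changed. The class name, the `exclusion_radius` parameter and the data layout are all assumptions.

```python
import math


class FailureMemoryPolicy:
    """Hypothetical non-Markov grasp selection that avoids recently failed grasps."""

    def __init__(self, exclusion_radius):
        self.exclusion_radius = exclusion_radius
        self.failed = []  # positions of grasps that failed since the last success

    def select(self, candidates):
        """Pick the highest-quality candidate outside every failure's exclusion zone.

        `candidates` is a list of ((x, y), quality) pairs. If every candidate is
        excluded, fall back to the overall best rather than returning nothing.
        """
        allowed = [
            c for c in candidates
            if all(math.dist(c[0], f) > self.exclusion_radius for f in self.failed)
        ]
        pool = allowed or candidates
        return max(pool, key=lambda c: c[1])[0]

    def report(self, grasp, success):
        """Record the outcome: remember failures, forget them after a success."""
        if success:
            self.failed.clear()
        else:
            self.failed.append(grasp)


# Illustrative candidates: two nearly identical high-quality grasps and one other.
candidates = [((0.0, 0.0), 0.9), ((0.01, 0.0), 0.85), ((0.2, 0.1), 0.6)]
policy = FailureMemoryPolicy(exclusion_radius=0.05)
first = policy.select(candidates)
policy.report(first, success=False)
second = policy.select(candidates)
```

A memoryless policy given the same unchanged sensor input would return `first` again; here the second attempt moves to a different region of the bin.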
Learning Risk-Aware Quadrupedal Locomotion using Distributional Reinforcement Learning
Deployment in hazardous environments requires robots to understand the risks
associated with their actions and movements to prevent accidents. Despite their
importance, these risks are not explicitly modeled by currently deployed
locomotion controllers for legged robots. In this work, we propose a
risk-sensitive locomotion training method employing distributional reinforcement
learning to consider safety explicitly. Instead of relying on a value
expectation, we estimate the complete value distribution to account for
uncertainty in the robot's interaction with the environment. The value
distribution is consumed by a risk metric to extract risk-sensitive value
estimates. These are integrated into Proximal Policy Optimization (PPO) to
derive our method, Distributional Proximal Policy Optimization (DPPO). The risk
preference, ranging from risk-averse to risk-seeking, can be controlled by a
single parameter, which makes it possible to adjust the robot's behavior dynamically.
Importantly, our approach removes the need for additional reward function
tuning to achieve risk sensitivity. We show emergent risk-sensitive locomotion
behavior in simulation and on the quadrupedal robot ANYmal.
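A single parameter that moves a value estimate from risk-averse to risk-seeking can be sketched with a distortion function applied to the quantile levels of the value distribution. The abstract does not name the risk metric, so the Wang transform below is an assumed example, and the function names are hypothetical. In this sketch's convention, `beta > 0` puts more weight on low-return quantiles (risk-averse), `beta < 0` on high-return quantiles (risk-seeking), and `beta = 0` recovers the mean.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def wang_distortion(tau, beta):
    """Wang transform g(tau) = Phi(Phi^{-1}(tau) + beta), with g(0) = 0 and g(1) = 1."""
    if tau <= 0.0:
        return 0.0
    if tau >= 1.0:
        return 1.0
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(tau) + beta)


def risk_sensitive_value(quantile_values, beta):
    """Reweight equally spaced quantile estimates by the distorted probability mass.

    The i-th smallest of N estimates receives weight g(i/N) - g((i-1)/N).
    """
    ordered = sorted(quantile_values)
    n = len(ordered)
    value = 0.0
    previous = 0.0
    for i, q in enumerate(ordered, start=1):
        current = wang_distortion(i / n, beta)
        value += (current - previous) * q
        previous = current
    return value


# Illustrative quantile estimates of the return, not data from the paper.
quantiles = [-1.0, 0.0, 1.0, 2.0]
neutral = risk_sensitive_value(quantiles, beta=0.0)
averse = risk_sensitive_value(quantiles, beta=0.5)
seeking = risk_sensitive_value(quantiles, beta=-0.5)
```

A scalar produced this way could replace the expected value wherever a policy-gradient update consumes a value estimate; how DPPO integrates it into PPO is not detailed in the abstract.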
All the Feels: A dexterous hand with large area sensing
High cost and lack of reliability have precluded the widespread adoption of
dexterous hands in robotics. Furthermore, the lack of a viable tactile sensor
capable of sensing over the entire area of the hand impedes the rich, low-level
feedback that would improve learning of dexterous manipulation skills. This
paper introduces an inexpensive, modular, robust, and scalable platform, the
DManus, aimed at resolving these challenges while satisfying the large-scale
data collection capabilities demanded by deep robot learning paradigms. Studies
on human manipulation point to the criticality of low-level tactile feedback in
performing everyday dexterous tasks. The DManus comes with ReSkin sensing on
the entire surface of the palm as well as the fingertips. We demonstrate the
effectiveness of the fully integrated system in a tactile-aware task: bin
picking and sorting. Code, documentation, design files, detailed assembly
instructions, trained models, task videos, and all supplementary materials
required to recreate the setup can be found on
http://roboticsbenchmarks.org/platforms/dmanus
Comment: 6 pages + references and appendix, 7 figures. Submitted to ICRA 202