Learning Dexterous In-Hand Manipulation
We use reinforcement learning (RL) to learn dexterous in-hand manipulation
policies which can perform vision-based object reorientation on a physical
Shadow Dexterous Hand. The training is performed in a simulated environment in
which we randomize many of the physical properties of the system, such as friction
coefficients and an object's appearance. Our policies transfer to the physical
robot despite being trained entirely in simulation. Our method does not rely on
any human demonstrations, but many behaviors found in human manipulation emerge
naturally, including finger gaiting, multi-finger coordination, and the
controlled use of gravity. Our results were obtained using the same distributed
RL system that was used to train OpenAI Five. We also include a video of our
results: https://youtu.be/jwSbzNHGflM
Comment: Making OpenAI the first author. We wish this paper to be cited as
"Learning Dexterous In-Hand Manipulation" by OpenAI et al. We are replicating
the approach from the physics community: arXiv:1812.0648
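A minimal sketch of the randomization pattern this abstract describes: resampling physical and visual simulator parameters before every episode. The parameter names, ranges, and the env.reset hook are illustrative assumptions, not OpenAI's actual configuration.

```python
import random

def sample_randomized_sim_params():
    """Draw one set of simulator parameters for the next episode.
    All names and ranges here are illustrative assumptions."""
    return {
        "friction_coefficient": random.uniform(0.5, 1.5),   # scale around nominal
        "object_mass_kg": random.uniform(0.03, 0.3),
        "actuator_gain": random.uniform(0.8, 1.2),
        "object_rgb": [random.random() for _ in range(3)],  # appearance randomization
    }

# Training-loop skeleton: a fresh environment parameterization per episode
# forces the policy to be robust across the whole distribution, which is
# what enables transfer to the physical robot without real-robot data.
for episode in range(3):
    params = sample_randomized_sim_params()
    print(f"episode {episode}: {params}")
    # env.reset(sim_params=params)  # hypothetical simulator hook
```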
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo
This paper presents an upgraded, real-world-application-oriented version of
gym-gazebo, the Robot Operating System (ROS) and Gazebo based Reinforcement
Learning (RL) toolkit, which complies with OpenAI Gym. The paper discusses
the new ROS 2 based software architecture and summarizes the results obtained
using Proximal Policy Optimization (PPO). Ultimately, the output of this work
presents a benchmarking system for robotics that allows different techniques
and algorithms to be compared using the same virtual conditions. We have
evaluated environments with different levels of complexity of the Modular
Articulated Robotic Arm (MARA), reaching accuracies in the millimeter scale.
The converged results show the feasibility and usefulness of the gym-gazebo2
toolkit, as well as its potential and applicability in industrial use cases
using modular robots.
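Because the toolkit complies with OpenAI Gym, driving it should follow the standard gym.Env loop. A minimal sketch, assuming the package registers a MARA environment; the import name and the "MARA-v0" id are assumptions for illustration.

```python
# A classic OpenAI Gym control loop against a Gym-compliant environment.
import gym
# import gym_gazebo2  # assumed: registers the MARA environments with Gym

env = gym.make("MARA-v0")                    # assumed environment id
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()       # stand-in for a trained PPO policy
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```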
GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning
Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively
large number of training samples for learning complex tasks. Many recent works
on speeding up Deep RL have focused on distributed training and simulation.
While distributed training is often done on the GPU, simulation is not. In this
work, we propose using GPU-accelerated RL simulations as an alternative to CPU
ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising
speed-ups in learning various continuous-control locomotion tasks. With one
GPU and CPU core, we are able to train the Humanoid running task in less than
20 minutes, using 10-1000x fewer CPU cores than previous works. We also
demonstrate the scalability of our simulator to multi-GPU settings to train
more challenging locomotion tasks.
Comment: Accepted and to appear at the Conference on Robot Learning (CoRL) 201
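A minimal sketch of the batched-simulation idea, with NumPy standing in for the GPU physics engine: all environments advance in one tensor operation instead of one process per environment. The dynamics, sizes, and reward are placeholders, not NVIDIA Flex calls.

```python
import numpy as np

num_envs, obs_dim, act_dim = 4096, 44, 17   # illustrative sizes

states = np.zeros((num_envs, obs_dim), dtype=np.float32)
W = np.random.randn(act_dim, obs_dim).astype(np.float32)  # placeholder dynamics

def batched_step(states, actions):
    """One simulator tick for all environments at once (toy dynamics)."""
    next_states = states + 0.01 * np.tanh(actions @ W)
    rewards = -np.linalg.norm(next_states, axis=1)
    return next_states, rewards

actions = np.random.randn(num_envs, act_dim).astype(np.float32)
states, rewards = batched_step(states, actions)
print(states.shape, rewards.shape)   # (4096, 44) (4096,)
```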
Practical Robot Learning from Demonstrations using Deep End-to-End Training
Robots need to learn behaviors in intuitive and practical ways for widespread
deployment in human environments. To learn a robot behavior end-to-end, we
train a variant of the ResNet that maps eye-in-hand camera images to
end-effector velocities. In our setup, a human teacher demonstrates the task
via joystick. We show that a simple servoing task can be learned in less than
an hour, including data collection, model training, and deployment time.
Moreover, 16 minutes of demonstrations were enough for the robot to learn the
task.
Comment: Presented in RSS 2019 Workshop: "Emerging paradigms for robotic
manipulation: from the lab to the productive world"
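A minimal behavior-cloning sketch of this setup in PyTorch: a small CNN regressor stands in for the paper's ResNet variant, mapping camera images to 6-DoF end-effector velocities and trained on (image, velocity) pairs such as those recorded during joystick teleoperation. All shapes and the data here are illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for the ResNet variant: images in, velocities out.
model = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 6),                       # 6-DoF end-effector velocity
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Fake demonstration batch standing in for recorded joystick data.
images = torch.randn(8, 3, 224, 224)
velocities = torch.randn(8, 6)

loss = nn.functional.mse_loss(model(images), velocities)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```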
Reinforcement Learning without Ground-Truth State
To perform robot manipulation tasks, a low-dimensional state of the
environment typically needs to be estimated. However, designing a state
estimator can sometimes be difficult, especially in environments with
deformable objects. An alternative is to learn an end-to-end policy that maps
directly from high-dimensional sensor inputs to actions. However, if this
policy is trained with reinforcement learning, then without a state estimator,
it is hard to specify a reward function based on high-dimensional observations.
To meet this challenge, we propose a simple indicator reward function for
goal-conditioned reinforcement learning: we only give a positive reward when
the robot's observation exactly matches a target goal observation. We show that
by relabeling the original goal with the achieved goal to obtain positive
rewards (Andrychowicz et al., 2017), we can learn with the indicator reward
function even in continuous state spaces. We propose two methods to further
speed up convergence with indicator rewards: reward balancing and reward
filtering. We show comparable performance between our method and an oracle
which uses the ground-truth state for computing rewards. We show that our
method can perform complex tasks in continuous state spaces such as rope
manipulation from RGB-D images, without knowledge of the ground-truth state.
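A minimal sketch of the indicator reward plus hindsight relabeling described above: reward is 1 only on an exact observation match, and relabeling the goal with the achieved observation manufactures the positive examples that make this workable in continuous spaces. Observations are toy arrays here rather than images.

```python
import numpy as np

def indicator_reward(achieved_obs, goal_obs):
    """Positive reward only when the observation exactly matches the goal."""
    return 1.0 if np.array_equal(achieved_obs, goal_obs) else 0.0

# One recorded transition (toy observations standing in for images).
achieved = np.array([0.31, 0.74])
original_goal = np.array([0.90, 0.10])

print(indicator_reward(achieved, original_goal))   # 0.0: exact match is rare
# Hindsight relabeling (Andrychowicz et al., 2017): pretend the achieved
# observation was the goal all along, so the same transition earns reward 1.
print(indicator_reward(achieved, achieved))        # 1.0
```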
How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?
Recently, reinforcement learning (RL) algorithms have demonstrated remarkable
success in learning complicated behaviors from minimally processed input.
However, most of this success is limited to simulation. While there are
promising successes in applying RL algorithms directly on real systems, their
performance on more complex systems remains bottlenecked by the relative data
inefficiency of RL algorithms. Domain randomization is a promising direction of
research that has demonstrated impressive results using RL algorithms to
control real robots. At a high level, domain randomization works by training a
policy on a distribution of environmental conditions in simulation. If the
environments are diverse enough, then the policy trained on this distribution
will plausibly generalize to the real world. A human-specified design choice in
domain randomization is the form and parameters of the distribution of
simulated environments. It is unclear how best to pick the form and
parameters of this distribution, and prior work uses hand-tuned distributions.
This extended abstract demonstrates that the choice of the distribution plays a
major role in the performance of the trained policies in the real world and
that the parameters of this distribution can be optimized to maximize the
performance of the trained policies in the real world.
Comment: 2-page extended abstract
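A minimal sketch of treating the randomization distribution itself as an optimizable object: candidate (low, high) friction ranges are scored by the performance of policies trained under them. train_and_evaluate is a placeholder for the expensive train-then-deploy inner loop, and its objective is faked purely for illustration.

```python
import random

def train_and_evaluate(friction_low, friction_high):
    """Placeholder: train a policy under Uniform(low, high) friction and
    return its measured real-world performance (faked here)."""
    width = friction_high - friction_low
    return -abs(width - 0.6) + random.gauss(0, 0.05)

# Random search over the distribution's parameters.
best_score, best_range = float("-inf"), None
for _ in range(20):
    low = random.uniform(0.2, 1.0)
    high = low + random.uniform(0.1, 1.0)
    score = train_and_evaluate(low, high)
    if score > best_score:
        best_score, best_range = score, (low, high)

print(f"best friction range: {best_range}, score: {best_score:.3f}")
```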
Multi-Task Reinforcement Learning based Mobile Manipulation Control for Dynamic Object Tracking and Grasping
Agile control of a mobile manipulator is challenging because of the high
complexity arising from the coupling of the robotic system and the unstructured working
environment. Tracking and grasping a dynamic object with a random trajectory is
even harder. In this paper, a multi-task reinforcement learning-based mobile
manipulation control framework is proposed to achieve general dynamic object
tracking and grasping. Several basic types of dynamic trajectories are chosen
as the task training set. To improve the policy generalization in practice,
random noise and dynamics randomization are introduced during the training
process. Extensive experiments show that our trained policy can adapt to unseen
random dynamic trajectories with about 0.1 m tracking error and a 75% grasping
success rate on dynamic objects. The trained policy can also be successfully
deployed on a real mobile manipulator.
Comment: 6 pages, 7 figures, submitted to IROS202
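A minimal sketch of the multi-task training set described above: each episode samples one basic dynamic-trajectory type for the target object and perturbs it with noise, so a single policy must cover all of them. The trajectory forms are illustrative assumptions, not the paper's actual task set.

```python
import math
import random

def sample_target_trajectory():
    """Pick one basic trajectory type (illustrative forms)."""
    kind = random.choice(["line", "circle", "sine"])
    if kind == "line":
        return kind, lambda t: (0.1 * t, 0.0)
    if kind == "circle":
        return kind, lambda t: (math.cos(t), math.sin(t))
    return kind, lambda t: (0.1 * t, 0.2 * math.sin(2 * t))

kind, traj = sample_target_trajectory()
# Random noise on the target positions, as during the training process.
noisy = [(x + random.gauss(0, 0.01), y + random.gauss(0, 0.01))
         for x, y in (traj(t / 10) for t in range(5))]
print(kind, noisy[0])
```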
Curriculum goal masking for continuous deep reinforcement learning
Deep reinforcement learning has recently gained a focus on problems where
policy or value functions are independent of goals. Evidence exists that the
sampling of goals has a strong effect on the learning performance, but there is
a lack of general mechanisms that focus on optimizing the goal sampling
process. In this work, we present a simple and general goal masking method that
also allows us to estimate a goal's difficulty level and thus realize a
curriculum learning approach for deep RL. Our results indicate that focusing on
goals with a medium difficulty level is appropriate for deep deterministic
policy gradient (DDPG) methods, while an "aim for the stars and reach the
moon" strategy, where hard goals are sampled much more often than simple goals,
leads to the best learning performance in cases where DDPG is combined with
hindsight experience replay (HER). We demonstrate that the approach
significantly outperforms standard goal sampling for different robotic object
manipulation problems.
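A minimal sketch of difficulty-based goal sampling, assuming each goal carries an estimated success rate: a "medium" bucket suits plain DDPG, while weighting hard goals heavily mirrors the "aim for the stars and reach the moon" strategy that pairs well with HER. Thresholds and weights are illustrative.

```python
import random

# Toy goal pool; success_rate stands in for an estimated difficulty level.
goals = [{"id": i, "success_rate": random.random()} for i in range(100)]

def sample_goal(goals, strategy="medium"):
    if strategy == "hard-heavy":
        # Hard goals (low success rate) sampled much more often than easy ones.
        weights = [(1.0 - g["success_rate"]) ** 2 for g in goals]
        return random.choices(goals, weights=weights, k=1)[0]
    # "medium": mask out goals that are currently too easy or too hard.
    pool = [g for g in goals if 0.3 <= g["success_rate"] <= 0.7]
    return random.choice(pool or goals)

print(sample_goal(goals, "medium"))
print(sample_goal(goals, "hard-heavy"))
```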
From Video Game to Real Robot: The Transfer between Action Spaces
Deep reinforcement learning has proven to be successful for learning tasks in
simulated environments, but applying the same techniques to robots in the
real-world domain is more challenging, as they require hours of training. To address this,
transfer learning can be used to train the policy first in a simulated
environment and then transfer it to a physical agent. As the simulation never
matches reality perfectly, the physics, visuals and action spaces by necessity
differ between these environments to some degree. In this work, we study how
general video games can be directly used instead of fine-tuned simulations for
the sim-to-real transfer. In particular, we study how the agent can learn the
new action space autonomously when the game actions do not match the robot
actions. Our results show that the different action space can be learned by
re-training only part of the neural network, and we obtain above a 90% mean
success rate in simulation and robot experiments.
Comment: The two first authors contributed equally. Accepted by ICASSP 202
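A minimal PyTorch sketch of re-training only part of the network for a new action space: the feature backbone learned in the game is frozen, and only a fresh action head is optimized against robot actions. Layer sizes and the supervision signal are illustrative assumptions.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU())   # trained in the game
for p in backbone.parameters():
    p.requires_grad = False                                # keep game features

robot_head = nn.Linear(128, 4)                             # new robot action space
optimizer = torch.optim.Adam(robot_head.parameters(), lr=1e-3)

# Toy re-training step on robot-domain data.
obs = torch.randn(32, 64)
target_actions = torch.randn(32, 4)
loss = nn.functional.mse_loss(robot_head(backbone(obs)), target_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```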
Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics
Training data is the key ingredient for deep learning approaches, but
difficult to obtain for the specialized domains often encountered in robotics.
We describe a synthesis pipeline capable of producing training data for
cluttered scene perception tasks such as semantic segmentation, object
detection, and correspondence or pose estimation. Our approach arranges object
meshes in physically realistic, dense scenes using physics simulation. The
arranged scenes are rendered using high-quality rasterization with randomized
appearance and material parameters. Noise and other transformations introduced
by the camera sensors are simulated. Our pipeline can be run online during
training of a deep neural network, yielding applications in life-long learning
and in iterative render-and-compare approaches. We demonstrate the usability of our pipeline by
learning semantic segmentation on the challenging YCB-Video dataset without
actually using any training frames, where our method achieves performance
comparable to a conventionally trained model. Additionally, we show successful
application in a real-world regrasping system.
Comment: Accepted for ICRA 202
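A minimal sketch of the online-synthesis pattern the abstract describes: a renderer produces freshly randomized (image, segmentation) pairs on demand, so the network never trains on a fixed dataset. render_scene is a placeholder for the physics-based arrangement and rasterization pipeline; Stillleben's actual API is not shown here.

```python
import numpy as np

def render_scene(rng):
    """Placeholder: arrange meshes physically, randomize materials, render."""
    image = rng.random((128, 128, 3), dtype=np.float32)   # rendered RGB
    mask = rng.integers(0, 21, size=(128, 128))           # e.g. 21 YCB classes
    return image, mask

rng = np.random.default_rng(0)
for step in range(3):                 # would run alongside network training
    image, mask = render_scene(rng)   # fresh randomized sample each step
    print(step, image.shape, mask.shape)
```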