451 research outputs found

    Learning Dexterous In-Hand Manipulation

    Full text link
    We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object's appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflMComment: Making OpenAI the first author. We wish this paper to be cited as "Learning Dexterous In-Hand Manipulation" by OpenAI et al. We are replicating the approach from the physics community: arXiv:1812.0648

    gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo

    Full text link
    This paper presents an upgraded, real world application oriented version of gym-gazebo, the Robot Operating System (ROS) and Gazebo based Reinforcement Learning (RL) toolkit, which complies with OpenAI Gym. The content discusses the new ROS 2 based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, the output of this work presents a benchmarking system for robotics that allows different techniques and algorithms to be compared using the same virtual conditions. We have evaluated environments with different levels of complexity of the Modular Articulated Robotic Arm (MARA), reaching accuracies in the millimeter scale. The converged results show the feasibility and usefulness of the gym-gazebo 2 toolkit, its potential and applicability in industrial use cases, using modular robots

    GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning

    Full text link
    Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups of learning various continuous-control, locomotion tasks. With one GPU and CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10-1000x fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks.Comment: Accepted and to appear at the Conference on Robot Learning (CoRL) 201

    Practical Robot Learning from Demonstrations using Deep End-to-End Training

    Full text link
    Robots need to learn behaviors in intuitive and practical ways for widespread deployment in human environments. To learn a robot behavior end-to-end, we train a variant of the ResNet that maps eye-in-hand camera images to end-effector velocities. In our setup, a human teacher demonstrates the task via joystick. We show that a simple servoing task can be learned in less than an hour including data collection, model training and deployment time. Moreover, 16 minutes of demonstrations were enough for the robot to learn the task.Comment: Presented in RSS 2019 Workshop: "Emerging paradigms for robotic manipulation: from the lab to the productive world

    Reinforcement Learning without Ground-Truth State

    Full text link
    To perform robot manipulation tasks, a low-dimensional state of the environment typically needs to be estimated. However, designing a state estimator can sometimes be difficult, especially in environments with deformable objects. An alternative is to learn an end-to-end policy that maps directly from high-dimensional sensor inputs to actions. However, if this policy is trained with reinforcement learning, then without a state estimator, it is hard to specify a reward function based on high-dimensional observations. To meet this challenge, we propose a simple indicator reward function for goal-conditioned reinforcement learning: we only give a positive reward when the robot's observation exactly matches a target goal observation. We show that by relabeling the original goal with the achieved goal to obtain positive rewards (Andrychowicz et al., 2017), we can learn with the indicator reward function even in continuous state spaces. We propose two methods to further speed up convergence with indicator rewards: reward balancing and reward filtering. We show comparable performance between our method and an oracle which uses the ground-truth state for computing rewards. We show that our method can perform complex tasks in continuous state spaces such as rope manipulation from RGB-D images, without knowledge of the ground-truth state

    How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

    Full text link
    Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly on real systems, their performance on more complex systems remains bottle-necked by the relative data inefficiency of RL algorithms. Domain randomization is a promising direction of research that has demonstrated impressive results using RL algorithms to control real robots. At a high level, domain randomization works by training a policy on a distribution of environmental conditions in simulation. If the environments are diverse enough, then the policy trained on this distribution will plausibly generalize to the real world. A human-specified design choice in domain randomization is the form and parameters of the distribution of simulated environments. It is unclear how to the best pick the form and parameters of this distribution and prior work uses hand-tuned distributions. This extended abstract demonstrates that the choice of the distribution plays a major role in the performance of the trained policies in the real world and that the parameter of this distribution can be optimized to maximize the performance of the trained policies in the real worldComment: 2-page extended abstrac

    Multi-Task Reinforcement Learning based Mobile Manipulation Control for Dynamic Object Tracking and Grasping

    Full text link
    Agile control of mobile manipulator is challenging because of the high complexity coupled by the robotic system and the unstructured working environment. Tracking and grasping a dynamic object with a random trajectory is even harder. In this paper, a multi-task reinforcement learning-based mobile manipulation control framework is proposed to achieve general dynamic object tracking and grasping. Several basic types of dynamic trajectories are chosen as the task training set. To improve the policy generalization in practice, random noise and dynamics randomization are introduced during the training process. Extensive experiments show that our policy trained can adapt to unseen random dynamic trajectories with about 0.1m tracking error and 75\% grasping success rate of dynamic objects. The trained policy can also be successfully deployed on a real mobile manipulator.Comment: 6 pages, 7 figures, submitted to IROS202

    Curriculum goal masking for continuous deep reinforcement learning

    Full text link
    Deep reinforcement learning has recently gained a focus on problems where policy or value functions are independent of goals. Evidence exists that the sampling of goals has a strong effect on the learning performance, but there is a lack of general mechanisms that focus on optimizing the goal sampling process. In this work, we present a simple and general goal masking method that also allows us to estimate a goal's difficulty level and thus realize a curriculum learning approach for deep RL. Our results indicate that focusing on goals with a medium difficulty level is appropriate for deep deterministic policy gradient (DDPG) methods, while an "aim for the stars and reach the moon-strategy", where hard goals are sampled much more often than simple goals, leads to the best learning performance in cases where DDPG is combined with for hindsight experience replay (HER). We demonstrate that the approach significantly outperforms standard goal sampling for different robotic object manipulation problems

    From Video Game to Real Robot: The Transfer between Action Spaces

    Full text link
    Deep reinforcement learning has proven to be successful for learning tasks in simulated environments, but applying same techniques for robots in real-world domain is more challenging, as they require hours of training. To address this, transfer learning can be used to train the policy first in a simulated environment and then transfer it to physical agent. As the simulation never matches reality perfectly, the physics, visuals and action spaces by necessity differ between these environments to some degree. In this work, we study how general video games can be directly used instead of fine-tuned simulations for the sim-to-real transfer. Especially, we study how the agent can learn the new action space autonomously, when the game actions do not match the robot actions. Our results show that the different action space can be learned by re-training only part of neural network and we obtain above 90% mean success rate in simulation and robot experiments.Comment: Two first authors contributed equally. Accepted by ICASSP 202

    Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics

    Full text link
    Training data is the key ingredient for deep learning approaches, but difficult to obtain for the specialized domains often encountered in robotics. We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks such as semantic segmentation, object detection, and correspondence or pose estimation. Our approach arranges object meshes in physically realistic, dense scenes using physics simulation. The arranged scenes are rendered using high-quality rasterization with randomized appearance and material parameters. Noise and other transformations introduced by the camera sensors are simulated. Our pipeline can be run online during training of a deep neural network, yielding applications in life-long learning and in iterative render-and-compare approaches. We demonstrate the usability by learning semantic segmentation on the challenging YCB-Video dataset without actually using any training frames, where our method achieves performance comparable to a conventionally trained model. Additionally, we show successful application in a real-world regrasping system.Comment: Accepted for ICRA 202
    corecore