
    Hindsight policy gradients

    A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enable sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency. Comment: Accepted to ICLR 201
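
The hindsight idea in this abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm: it only shows goal relabeling under a sparse reward, assuming a hypothetical bit-flipping task in which each action flips one bit. The names `sparse_reward`, `rollout`, and `relabel` are introduced here for illustration.

```python
import random
from typing import List, Tuple

State = Tuple[int, ...]
Transition = Tuple[State, int, State]  # (state, action, next_state)


def sparse_reward(state: State, goal: State) -> float:
    """Sparse reward: 1.0 only when the state matches the goal exactly."""
    return 1.0 if state == goal else 0.0


def rollout(start: State, horizon: int, rng: random.Random) -> List[Transition]:
    """Run a random policy; each action flips the bit at that index."""
    trajectory = []
    state = start
    for _ in range(horizon):
        action = rng.randrange(len(state))
        next_state = tuple(b ^ 1 if i == action else b for i, b in enumerate(state))
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


def relabel(trajectory: List[Transition], goal: State):
    """Score a trajectory as if `goal` had been the intended goal."""
    return [(s, a, s2, goal, sparse_reward(s2, goal)) for s, a, s2 in trajectory]


rng = random.Random(0)
n_bits = 8
start = tuple([0] * n_bits)
intended_goal = tuple([1] * n_bits)  # unreachable in 3 steps (assumed setup)
trajectory = rollout(start, horizon=3, rng=rng)

# Scored against the intended goal, the episode carries no learning signal.
original = relabel(trajectory, intended_goal)

# Scored against the state actually reached, the same episode is a success.
achieved_goal = trajectory[-1][2]
hindsight = relabel(trajectory, achieved_goal)

print("return for intended goal:", sum(r for *_, r in original))
print("return for achieved goal:", sum(r for *_, r in hindsight))
```

The intended goal cannot be reached within the short horizon, so every reward for it is zero, while the relabeled copy of the same trajectory always ends with a non-zero reward. Using such relabeled data inside a policy gradient estimator requires additional corrections that this sketch does not attempt.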

    Reinforcement Learning in Sparse-Reward Environments with Hindsight Policy Gradients

    A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.

    Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation

    Robotic systems are increasingly capable of automating and carrying out complex tasks, particularly by relying on recent advances in intelligent systems, deep learning and artificial intelligence. However, as robots and humans interact more closely, the interpretability, or explainability, of robot decision-making processes grows in importance for the human. Successful interaction and collaboration will only take place through mutual understanding of the underlying representations of the environment and the task at hand. This is currently a challenge in deep learning systems. We present a hierarchical deep reinforcement learning system consisting of a low-level agent that handles the large action/state space of a robotic system efficiently by following the directives of a high-level agent, which learns the high-level dynamics of the environment and task. This high-level agent forms a representation of the world and the task at hand that is interpretable for a human operator. The method, which we call Dot-to-Dot, is tested on a MuJoCo-based model of the Fetch Robotics Manipulator, as well as on a Shadow Hand. Results show efficient learning of complex action/state spaces by the low-level agent, and an interpretable representation of the task and decision-making process learned by the high-level agent.

    Overcoming Exploration in Reinforcement Learning with Demonstrations

    Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy. Comment: 8 pages, ICRA 201

    Autonomous guidewire navigation in a two dimensional vascular phantom

    The treatment of cerebro- and cardiovascular diseases requires complex and challenging navigation of a catheter. Previous attempts to automate catheter navigation do not generalize. Deep Reinforcement Learning methods show promising results and may be the key to automating catheter navigation through the tortuous vascular tree. This work investigates Deep Reinforcement Learning for guidewire manipulation in a complex and rigid 2D vascular model. The neural network trained by Deep Deterministic Policy Gradients with Hindsight Experience Replay performs well on the low-level control task; however, the high-level control of the path planning must be improved further.

    Learning cloth manipulation with demonstrations

    Recent advances in Deep Reinforcement Learning and in the computational capabilities of GPUs have led to a variety of research on the learning side of robotics. The main aim is to build autonomous robots that can learn to solve a task on their own with minimal engineering on the planning, vision, or control side. Efforts have been made to learn the manipulation of rigid objects with the help of human demonstrations, specifically in tasks such as stacking multiple blocks on top of each other or inserting a pin into a hole. These Deep RL algorithms successfully learn to complete tasks involving the manipulation of rigid objects, but autonomous manipulation of textile objects such as clothes through Deep RL algorithms has not yet been studied in the community. The main objectives of this work are: 1) implementing state-of-the-art Deep RL algorithms for rigid object manipulation and gaining a deep understanding of how these algorithms work, 2) creating an open-source simulation environment for simulating textile objects such as clothes, and 3) designing Deep RL algorithms for learning autonomous manipulation of textile objects through demonstrations. Peer Reviewed. Preprin