
    Overcoming Exploration in Reinforcement Learning with Demonstrations

    Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward becomes exponentially more difficult as the task horizon or action dimensionality grows. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control, such as stacking blocks with a robot arm. Our method, which builds on Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order-of-magnitude speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
    Comment: 8 pages, ICRA 2018
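    As a rough sketch of the recipe this abstract describes (not the authors' exact implementation), the PyTorch snippet below augments a DDPG-style actor update with a behavior-cloning loss on demonstration transitions, gated by a Q-filter so the demonstrator is imitated only where the critic scores its action at least as highly as the policy's. All networks, batch data, and the loss weight are illustrative placeholders; the critic update and HER relabeling are omitted.

```python
# Minimal sketch: DDPG actor update with an auxiliary behavior-cloning
# loss on demonstrations. Shapes and data are placeholders.
import torch
import torch.nn as nn

obs_dim, act_dim = 10, 4

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)  # actor-only update shown

# One batch from the agent's replay buffer, one from the demonstration buffer.
obs_rl = torch.randn(128, obs_dim)           # placeholder experience
obs_demo = torch.randn(32, obs_dim)
act_demo = torch.rand(32, act_dim) * 2 - 1   # demonstrator actions in [-1, 1]

# Standard DDPG policy-gradient term: maximize Q(s, pi(s)).
q_pi = critic(torch.cat([obs_rl, actor(obs_rl)], dim=-1))
loss_pg = -q_pi.mean()

# Behavior-cloning term, applied only where the critic rates the
# demonstrator's action at least as highly as the policy's (a "Q-filter").
with torch.no_grad():
    q_demo = critic(torch.cat([obs_demo, act_demo], dim=-1))
    q_pi_demo = critic(torch.cat([obs_demo, actor(obs_demo)], dim=-1))
    mask = (q_demo >= q_pi_demo).float()
sq_err = (actor(obs_demo) - act_demo).pow(2).sum(dim=-1, keepdim=True)
loss_bc = (mask * sq_err).mean()

loss = loss_pg + 1.0 * loss_bc   # the BC weight is a tunable hyperparameter
opt.zero_grad()
loss.backward()
opt.step()
```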

    Learning cloth manipulation with demonstrations

    Recent advances in Deep Reinforcement Learning and the computational capabilities of GPUs have led to a variety of research on the learning side of robotics. The main aim is to make autonomous robots capable of learning to solve a task on their own, with minimal engineering effort on the planning, vision, or control side. Efforts have been made to learn the manipulation of rigid objects with the help of human demonstrations, specifically in tasks such as stacking multiple blocks on top of each other, inserting a pin into a hole, etc. These Deep RL algorithms successfully learn to complete tasks involving the manipulation of rigid objects, but autonomous manipulation of textile objects such as clothes through Deep RL remains largely unstudied in the community. The main objectives of this work are: 1) implementing state-of-the-art Deep RL algorithms for rigid-object manipulation and gaining a deep understanding of how these algorithms work; 2) creating an open-source simulation environment for textile objects such as clothes; 3) designing Deep RL algorithms for learning autonomous manipulation of textile objects from demonstrations.
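    To make objective 2) concrete, here is a hypothetical skeleton of such a simulation environment following the Gym reset/step convention, with a crude relaxation step standing in for a real mass-spring or FEM cloth solver. The class name, dynamics, and reward are placeholders, not the environment built in this work.

```python
# Hypothetical cloth-manipulation environment skeleton (Gym-style API).
import numpy as np

class ClothEnv:
    """Cloth as an N x N grid of particles; actions drag one corner."""

    def __init__(self, n: int = 8):
        self.n = n
        self.positions = None

    def reset(self) -> np.ndarray:
        # Flat cloth resting on the plane z = 0.
        xs, ys = np.meshgrid(np.linspace(0, 1, self.n),
                             np.linspace(0, 1, self.n))
        self.positions = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)
        return self.positions.ravel()

    def step(self, action: np.ndarray):
        # Displace the (0, 0) corner by `action` (a 3-D vector), then let
        # interior particles relax toward their neighbors -- a crude
        # stand-in for a real cloth solver.
        self.positions[0, 0] += action
        for _ in range(5):
            interior = self.positions[1:-1, 1:-1]
            neighbors = (self.positions[:-2, 1:-1] + self.positions[2:, 1:-1]
                         + self.positions[1:-1, :-2] + self.positions[1:-1, 2:])
            self.positions[1:-1, 1:-1] = 0.5 * interior + 0.125 * neighbors
        # Sparse reward: 1 once the dragged corner reaches a target height.
        reward = float(self.positions[0, 0, 2] > 0.5)
        done = reward > 0
        return self.positions.ravel(), reward, done, {}

env = ClothEnv()
obs = env.reset()
obs, reward, done, info = env.step(np.array([0.0, 0.0, 0.6]))
```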

    Generative Exploration and Exploitation

    Sparse rewards are one of the biggest challenges in reinforcement learning (RL). In this paper, we propose a novel method called Generative Exploration and Exploitation (GENE) to overcome sparse rewards. GENE automatically generates start states that encourage the agent to explore the environment and to exploit received reward signals. GENE can adaptively trade off between exploration and exploitation according to the varying distribution of states experienced by the agent as learning progresses. GENE relies on no prior knowledge about the environment and can be combined with any RL algorithm, whether on-policy or off-policy, single-agent or multi-agent. Empirically, we demonstrate that GENE significantly outperforms existing methods on three tasks with only binary rewards: Maze, Maze Ant, and Cooperative Navigation. Ablation studies verify the emergence of progressive exploration and automatic reversing.
    Comment: AAAI'20
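    As an illustrative sketch only (GENE's actual generative model and selection rule differ), the snippet below captures the general idea of restarting episodes from under-explored regions: fit a crude histogram density over visited states and sample start states with probability inversely proportional to that density. Every name and the 2-D toy state space are placeholders.

```python
# Toy start-state generation from the agent's own experience.
import numpy as np

rng = np.random.default_rng(0)

# Suppose the 2-D states visited so far cluster near the origin.
visited = rng.normal(loc=0.0, scale=0.3, size=(5000, 2))

# Histogram-based density estimate over the state space.
hist, xedges, yedges = np.histogram2d(visited[:, 0], visited[:, 1],
                                      bins=20, range=[[-2, 2], [-2, 2]])

def density(states: np.ndarray) -> np.ndarray:
    """Look up each state's bin count in the fitted histogram."""
    xi = np.clip(np.digitize(states[:, 0], xedges) - 1, 0, 19)
    yi = np.clip(np.digitize(states[:, 1], yedges) - 1, 0, 19)
    return hist[xi, yi]

def sample_start_states(k: int) -> np.ndarray:
    """Prefer visited states that lie in sparsely populated regions."""
    candidates = visited[rng.choice(len(visited), size=256, replace=False)]
    weights = 1.0 / (density(candidates) + 1.0)   # rarer -> more likely
    probs = weights / weights.sum()
    return candidates[rng.choice(256, size=k, p=probs)]

starts = sample_start_states(8)   # episode restarts for the next rollouts
```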