Search CORE

1,107 research outputs found

Learning Long Chain of Actions through Hierarchical Reinforcement Learning

Author: Anca Mihai
Publication date: 01/01/2024

Lifeworld Analysis

Authors: Agre P., Horswill I.
Publication date: 01/01/1997

We argue that the analysis of agent/environment interactions should be extended to include the conventions and invariants maintained by agents throughout their activity. We refer to this thicker notion of environment as a lifeworld and present a partial set of formal tools for describing structures of lifeworlds and the ways in which they computationally simplify activity. As one specific example, we apply the tools to the analysis of the Toast system and show how versions of the system with very different control structures in fact implement a common control structure together with different conventions for encoding task state in the positions or states of objects in the environment.
Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

Overcoming Exploration in Reinforcement Learning with Demonstrations

Authors: Abbeel Pieter, Andrychowicz Marcin, McGrew Bob, Nair Ashvin, Zaremba Wojciech
Publication date: 25/02/2018

Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL). Many tasks are natural to specify with a sparse reward, and manually shaping a reward function can result in suboptimal performance. However, finding a non-zero reward is exponentially more difficult with increasing task horizon or action dimensionality. This puts many real-world tasks out of practical reach of RL methods. In this work, we use demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm. Our method, which builds on top of Deep Deterministic Policy Gradients and Hindsight Experience Replay, provides an order of magnitude of speedup over RL on simulated robotics tasks. It is simple to implement and makes only the additional assumption that we can collect a small set of demonstrations. Furthermore, our method is able to solve tasks not solvable by either RL or behavior cloning alone, and often ends up outperforming the demonstrator policy.
Comment: 8 pages, ICRA 201

arXiv.org e-Print Archive

Crossref
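The Nair et al. abstract says the method builds on Deep Deterministic Policy Gradients and Hindsight Experience Replay (HER) and adds a small set of demonstrations. Below is a minimal Python sketch of two of those ingredients, not the authors' implementation: HER's "final" goal-relabeling strategy under a sparse goal-reaching reward, and a replay minibatch that mixes in a fixed fraction of demonstration transitions. The 2-D point-reaching setup, the 0.05 tolerance, the mixing scheme, and the names `sparse_reward`, `make_episode`, `her_relabel_final`, `sample_minibatch`, and `demo_fraction` are all hypothetical; the DDPG actor and critic updates are omitted.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Transition:
    state: Point       # toy assumption: the state is just the agent's 2-D position
    action: Point      # displacement applied this step
    next_state: Point  # position reached after the step
    goal: Point        # goal the transition is evaluated against
    reward: float


def sparse_reward(achieved: Point, goal: Point, tol: float = 0.05) -> float:
    """Sparse goal-reaching reward: 0 within `tol` of the goal, -1 otherwise."""
    dx, dy = achieved[0] - goal[0], achieved[1] - goal[1]
    return 0.0 if (dx * dx + dy * dy) ** 0.5 <= tol else -1.0


def make_episode(path: List[Point], goal: Point) -> List[Transition]:
    """Turn a sequence of visited positions into transitions toward `goal`."""
    return [
        Transition(s, (s2[0] - s[0], s2[1] - s[1]), s2, goal, sparse_reward(s2, goal))
        for s, s2 in zip(path, path[1:])
    ]


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """HER 'final' strategy: replay the episode as if the goal had been the
    position actually reached at its last step, recomputing each reward."""
    new_goal = episode[-1].next_state
    return [
        Transition(t.state, t.action, t.next_state, new_goal,
                   sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


def sample_minibatch(rl_buffer, demo_buffer, batch_size, demo_fraction, rng):
    """Hypothetical scheme: a fixed fraction of each minibatch comes from the
    demonstration buffer; the rest is drawn with replacement from the RL buffer."""
    n_demo = min(int(batch_size * demo_fraction), len(demo_buffer))
    batch = rng.sample(demo_buffer, n_demo)
    batch += rng.choices(rl_buffer, k=batch_size - n_demo)
    return batch


# Toy run: the agent heads right although the goal is straight up.
goal = (0.0, 1.0)
episode = make_episode([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)], goal)
relabeled = her_relabel_final(episode)
demo_buffer = make_episode([(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)], goal)  # a toy "demonstration"
rl_buffer = episode + relabeled

rng = random.Random(0)
batch = sample_minibatch(rl_buffer, demo_buffer, batch_size=8, demo_fraction=0.25, rng=rng)

rewards_before = [t.reward for t in episode]
rewards_after = [t.reward for t in relabeled]
print(rewards_before, rewards_after, len(batch))  # [-1.0, -1.0] [-1.0, 0.0] 8
```

In this toy episode the agent misses its goal, so every original reward is -1. Relabeling with the position it actually reached turns the last transition into a success (reward 0), which is the extra learning signal HER supplies when rewards are sparse; the demonstration transitions add successful experience that the agent might not find on its own.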