Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem
in reinforcement learning (RL). Many tasks are natural to specify with a sparse
reward, and manually shaping a reward function can result in suboptimal
performance. However, finding a non-zero reward is exponentially more difficult
with increasing task horizon or action dimensionality. This puts many
real-world tasks out of practical reach of RL methods. In this work, we use
demonstrations to overcome the exploration problem and successfully learn to
perform long-horizon, multi-step robotics tasks with continuous control such as
stacking blocks with a robot arm. Our method, which builds on top of Deep
Deterministic Policy Gradients and Hindsight Experience Replay, provides an
order-of-magnitude speedup over RL on simulated robotics tasks. It is simple
to implement and makes only the additional assumption that we can collect a
small set of demonstrations. Furthermore, our method is able to solve tasks not
solvable by either RL or behavior cloning alone, and often ends up
outperforming the demonstrator policy.
Comment: 8 pages, ICRA 201
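The core mechanism is easy to sketch: keep a fixed buffer of demonstration transitions alongside the agent's own experience and draw a fraction of every training batch from it, so the rare non-zero rewards found by the demonstrator keep appearing during learning. A minimal Python sketch follows; the class name and the demo_fraction parameter are illustrative assumptions, not names from the paper.

    import random
    from collections import deque

    class DemoMixedReplayBuffer:
        # Hypothetical replay buffer that mixes a fixed set of
        # demonstration transitions into every training batch.
        def __init__(self, demos, capacity=100_000, demo_fraction=0.1):
            self.demos = list(demos)             # (s, a, r, s2, done) tuples
            self.agent = deque(maxlen=capacity)  # agent-collected transitions
            self.demo_fraction = demo_fraction

        def add(self, transition):
            self.agent.append(transition)

        def sample(self, batch_size):
            # Reserve a slice of the batch for demonstration transitions.
            n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
            n_agent = min(batch_size - n_demo, len(self.agent))
            batch = random.sample(self.demos, n_demo)
            batch += random.sample(list(self.agent), n_agent)
            random.shuffle(batch)
            return batch

The published method layers more on top of this (it builds on DDPG and HER, and adds an auxiliary behavior-cloning loss on demonstration samples), but batch mixing is the part that most directly attacks the exploration problem.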
Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learned skills and their dynamics, and use it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in
isolation in simpler environments using existing multi-goal RL formulations,
analogous to options or macro-actions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learned and unrolled
forward during look-ahead search. Policy search benefits from temporal
abstraction during exploration, yet itself operates over low-level primitive
actions, so the resulting policies do not suffer from the suboptimality and
inflexibility caused by coarse skill chaining. We show that the proposed
exploration strategy results in effective learning of complex manipulation
policies faster than current state-of-the-art RL methods, and converges to
better policies than methods that use options or parametrized skills as
building blocks of the policy itself, as opposed to guiding exploration.
Comment: This is a pre-print of our paper, accepted at AAAI 201
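Concretely, look-ahead over coarse skill dynamics can be sketched as below. The function names (lookahead_plan, skill_dynamics) and the exhaustive enumeration are illustrative assumptions, since the paper's learned models and search are more elaborate; skill_dynamics(s, k) stands in for the learned model predicting the state reached after executing skill k to completion.

    import itertools
    import numpy as np

    def lookahead_plan(state, goal, skills, skill_dynamics, horizon=3):
        # Enumerate skill sequences, roll each forward through the learned
        # coarse dynamics, and keep the sequence whose predicted end state
        # lands closest to the goal. Returns the first skill of that sequence.
        best_first, best_dist = None, np.inf
        for seq in itertools.product(skills, repeat=horizon):
            s = np.asarray(state, dtype=float)
            for k in seq:
                s = skill_dynamics(s, k)  # one coarse transition per skill
            dist = float(np.linalg.norm(s - goal))
            if dist < best_dist:
                best_first, best_dist = seq[0], dist
        return best_first

The search is exponential in the horizon, which is tolerable here because it only guides exploration; per the abstract, the policy being learned still acts over primitive actions.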
Asymmetric Actor Critic for Image-Based Robot Learning
Deep reinforcement learning (RL) has proven a powerful technique in many
sequential decision-making domains. However, robotics poses many challenges for
RL; most notably, training on a physical system can be expensive and dangerous,
which has sparked significant interest in learning control policies using a
physics simulator. While several recent works have shown promising results in
transferring policies trained in simulation to the real world, they often do
not fully utilize the advantage of working with a simulator. In this work, we
exploit the full state observability in the simulator to train better policies
which take as input only partial observations (RGBD images). We do this by
employing an actor-critic training algorithm in which the critic is trained on
full states while the actor (or policy) gets rendered images as input. We show
experimentally on a range of simulated tasks that using these asymmetric inputs
significantly improves performance. Finally, we combine this method with domain
randomization and show real robot experiments for several tasks like picking,
pushing, and moving a block. We achieve this simulation to real world transfer
without training on any real world data.
Comment: Videos of experiments can be found at http://www.goo.gl/b57WT
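The asymmetry is purely architectural and easy to illustrate. Below is a minimal PyTorch sketch assuming a DDPG-style setup with a 4-channel RGBD image for the actor; layer sizes are arbitrary placeholders, not the paper's architecture.

    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        # Q-network trained on the simulator's full low-dimensional state,
        # which is only available at training time.
        def __init__(self, state_dim, action_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    class Actor(nn.Module):
        # Policy that only ever sees rendered images, so the same network
        # can later consume real camera frames.
        def __init__(self, action_dim, channels=4):  # RGBD = 4 channels
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.LazyLinear(action_dim),  # infers the flattened size
            )

        def forward(self, image):
            return torch.tanh(self.net(image))

Only the actor is kept at deployment, so nothing the real robot runs ever depends on the privileged simulator state.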
Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation
Robotic systems are ever more capable of automating and fulfilling complex
tasks, particularly thanks to recent advances in intelligent systems, deep
learning and artificial intelligence. However, as robots and humans interact
more closely, the interpretability, or explainability, of robot
decision-making processes for the human grows in importance. A
successful interaction and collaboration will only take place through mutual
understanding of underlying representations of the environment and the task at
hand. This is currently a challenge in deep learning systems. We present a
hierarchical deep reinforcement learning system consisting of a low-level
agent that efficiently handles the large action/state space of a robotic
system by following the directives of a high-level agent, which learns the
high-level dynamics of the environment and task. This high-level agent forms a
representation of the world and task at hand that is interpretable for a human
operator. The method, which we call Dot-to-Dot, is evaluated on MuJoCo-based
models of the Fetch Robotics Manipulator and a Shadow Hand. Results show
efficient learning of complex action/state spaces
by the low-level agent, and an interpretable representation of the task and
decision-making process learned by the high-level agent.
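The division of labor can be sketched as a simple control loop; the propose/act interfaces and the gym-style environment (old 4-tuple step API, reset returning an observation and a goal) are assumptions for illustration, not the paper's API. The list of subgoals the high-level agent emits is exactly the interpretable trace the abstract refers to.

    def hierarchical_episode(env, high_agent, low_agent, subgoal_steps=25):
        # The high-level agent emits an intermediate subgoal (a "dot");
        # the low-level goal-conditioned agent then acts over primitive
        # actions for up to subgoal_steps trying to reach it.
        obs, goal = env.reset()
        subgoals, done = [], False
        while not done:
            subgoal = high_agent.propose(obs, goal)
            subgoals.append(subgoal)
            for _ in range(subgoal_steps):
                action = low_agent.act(obs, subgoal)
                obs, reward, done, info = env.step(action)
                if done:
                    break
        return subgoals  # human-inspectable sequence of subgoals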
The Behavioral Paradox: Why Investor Irrationality Calls for Lighter and Simpler Financial Regulation
It is widely believed that behavioral economics justifies more intrusive regulation of financial markets, because people are not fully rational and need to be protected from their quirks. This Article challenges that belief. Firstly, insofar as people can be helped to make better choices, that goal can usually be achieved through light-touch regulations. Secondly, faulty perceptions about markets seem to be best corrected through market-based solutions. Thirdly, increasing regulation does not seem to solve problems caused by lack of market discipline, pricing inefficiencies, and financial innovation; better results may be achieved with freer markets and simpler rules. Fourthly, regulatory rule makers are subject to imperfect rationality, which tends to reduce the quality of regulatory intervention. Finally, regulatory complexity exacerbates the harmful effects of bounded rationality, whereas simple and stable rules give rise to positive learning effects.