Overcoming Exploration in Reinforcement Learning with Demonstrations
Exploration in environments with sparse rewards has been a persistent problem
in reinforcement learning (RL). Many tasks are natural to specify with a sparse
reward, and manually shaping a reward function can result in suboptimal
performance. However, finding a non-zero reward is exponentially more difficult
with increasing task horizon or action dimensionality. This puts many
real-world tasks out of practical reach of RL methods. In this work, we use
demonstrations to overcome the exploration problem and successfully learn to
perform long-horizon, multi-step robotics tasks with continuous control such as
stacking blocks with a robot arm. Our method, which builds on top of Deep
Deterministic Policy Gradients and Hindsight Experience Replay, provides an
order-of-magnitude speedup over RL on simulated robotics tasks. It is simple
to implement and makes only the additional assumption that we can collect a
small set of demonstrations. Furthermore, our method is able to solve tasks not
solvable by either RL or behavior cloning alone, and often ends up
outperforming the demonstrator policy.
Comment: 8 pages, ICRA 201
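The core mechanism is easy to sketch: keep a fixed buffer of demonstration transitions alongside the agent's own experience and draw a fraction of every training batch from it, so the rare non-zero rewards found by the demonstrator keep appearing during learning. A minimal Python sketch follows; the class name and the demo_fraction parameter are illustrative assumptions, not names from the paper.

    import random
    from collections import deque

    class DemoMixedReplayBuffer:
        # Hypothetical replay buffer that mixes a fixed set of
        # demonstration transitions into every training batch.
        def __init__(self, demos, capacity=100_000, demo_fraction=0.1):
            self.demos = list(demos)             # (s, a, r, s2, done) tuples
            self.agent = deque(maxlen=capacity)  # agent-collected transitions
            self.demo_fraction = demo_fraction

        def add(self, transition):
            self.agent.append(transition)

        def sample(self, batch_size):
            # Reserve a slice of the batch for demonstration transitions.
            n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
            n_agent = min(batch_size - n_demo, len(self.agent))
            batch = random.sample(self.demos, n_demo)
            batch += random.sample(list(self.agent), n_agent)
            random.shuffle(batch)
            return batch

The published method layers more on top of this (it builds on DDPG and HER, and adds an auxiliary behavior-cloning loss on demonstration samples), but batch mixing is the part that most directly attacks the exploration problem.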
Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learned skills and their dynamics, and use it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in
isolation in simpler environments using existing multi-goal RL formulations,
analogous to options or macro-actions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learned and unrolled
forward during look-ahead search. Policy search benefits from temporal
abstraction during exploration, yet itself operates over low-level primitive
actions, so the resulting policies do not suffer from the suboptimality and
inflexibility caused by coarse skill chaining. We show that the proposed
exploration strategy results in effective learning of complex manipulation
policies faster than current state-of-the-art RL methods, and converges to
better policies than methods that use options or parametrized skills as
building blocks of the policy itself, as opposed to guiding exploration.
Comment: This is a pre-print of our paper, accepted at AAAI 201
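Concretely, look-ahead over coarse skill dynamics can be sketched as below. The function names (lookahead_plan, skill_dynamics) and the exhaustive enumeration are illustrative assumptions, since the paper's learned models and search are more elaborate; skill_dynamics(s, k) stands in for the learned model predicting the state reached after executing skill k to completion.

    import itertools
    import numpy as np

    def lookahead_plan(state, goal, skills, skill_dynamics, horizon=3):
        # Enumerate skill sequences, roll each forward through the learned
        # coarse dynamics, and keep the sequence whose predicted end state
        # lands closest to the goal. Returns the first skill of that sequence.
        best_first, best_dist = None, np.inf
        for seq in itertools.product(skills, repeat=horizon):
            s = np.asarray(state, dtype=float)
            for k in seq:
                s = skill_dynamics(s, k)  # one coarse transition per skill
            dist = float(np.linalg.norm(s - goal))
            if dist < best_dist:
                best_first, best_dist = seq[0], dist
        return best_first

The search is exponential in the horizon, which is tolerable here because it only guides exploration; per the abstract, the policy being learned still acts over primitive actions.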
Asymmetric Actor Critic for Image-Based Robot Learning
Deep reinforcement learning (RL) has proven a powerful technique in many
sequential decision-making domains. However, robotics poses many challenges for
RL; most notably, training on a physical system can be expensive and dangerous,
which has sparked significant interest in learning control policies using a
physics simulator. While several recent works have shown promising results in
transferring policies trained in simulation to the real world, they often do
not fully utilize the advantage of working with a simulator. In this work, we
exploit the full state observability in the simulator to train better policies
which take as input only partial observations (RGBD images). We do this by
employing an actor-critic training algorithm in which the critic is trained on
full states while the actor (or policy) gets rendered images as input. We show
experimentally on a range of simulated tasks that using these asymmetric inputs
significantly improves performance. Finally, we combine this method with domain
randomization and show real robot experiments for several tasks like picking,
pushing, and moving a block. We achieve this simulation to real world transfer
without training on any real world data.
Comment: Videos of experiments can be found at http://www.goo.gl/b57WT
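The asymmetry is purely architectural and easy to illustrate. Below is a minimal PyTorch sketch assuming a DDPG-style setup with a 4-channel RGBD image for the actor; layer sizes are arbitrary placeholders, not the paper's architecture.

    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        # Q-network trained on the simulator's full low-dimensional state,
        # which is only available at training time.
        def __init__(self, state_dim, action_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    class Actor(nn.Module):
        # Policy that only ever sees rendered images, so the same network
        # can later consume real camera frames.
        def __init__(self, action_dim, channels=4):  # RGBD = 4 channels
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.LazyLinear(action_dim),  # infers the flattened size
            )

        def forward(self, image):
            return torch.tanh(self.net(image))

Only the actor is kept at deployment, so nothing the real robot runs ever depends on the privileged simulator state.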
Dot-to-Dot: Explainable Hierarchical Reinforcement Learning for Robotic Manipulation
Robotic systems are ever more capable of automating and fulfilling complex
tasks, particularly thanks to recent advances in intelligent systems, deep
learning and artificial intelligence. However, as robots and humans interact
more closely, the interpretability, or explainability, of robot
decision-making processes for the human grows in importance. A
successful interaction and collaboration will only take place through mutual
understanding of underlying representations of the environment and the task at
hand. This is currently a challenge in deep learning systems. We present a
hierarchical deep reinforcement learning system consisting of a low-level
agent that efficiently handles the large action/state space of a robotic
system by following the directives of a high-level agent, which learns the
high-level dynamics of the environment and task. This high-level agent forms a
representation of the world and task at hand that is interpretable for a human
operator. The method, which we call Dot-to-Dot, is evaluated on MuJoCo-based
models of the Fetch Robotics Manipulator and a Shadow Hand. Results show
efficient learning of complex action/state spaces
by the low-level agent, and an interpretable representation of the task and
decision-making process learned by the high-level agent.
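The division of labor can be sketched as a simple control loop; the propose/act interfaces and the gym-style environment (old 4-tuple step API, reset returning an observation and a goal) are assumptions for illustration, not the paper's API. The list of subgoals the high-level agent emits is exactly the interpretable trace the abstract refers to.

    def hierarchical_episode(env, high_agent, low_agent, subgoal_steps=25):
        # The high-level agent emits an intermediate subgoal (a "dot");
        # the low-level goal-conditioned agent then acts over primitive
        # actions for up to subgoal_steps trying to reach it.
        obs, goal = env.reset()
        subgoals, done = [], False
        while not done:
            subgoal = high_agent.propose(obs, goal)
            subgoals.append(subgoal)
            for _ in range(subgoal_steps):
                action = low_agent.act(obs, subgoal)
                obs, reward, done, info = env.step(action)
                if done:
                    break
        return subgoals  # human-inspectable sequence of subgoals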
The Behavioral Paradox: Why Investor Irrationality Calls for Lighter and Simpler Financial Regulation
It is widely believed that behavioral economics justifies more intrusive regulation of financial markets, because people are not fully rational and need to be protected from their quirks. This Article challenges that belief. Firstly, insofar as people can be helped to make better choices, that goal can usually be achieved through light-touch regulations. Secondly, faulty perceptions about markets seem to be best corrected through market-based solutions. Thirdly, increasing regulation does not seem to solve problems caused by lack of market discipline, pricing inefficiencies, and financial innovation; better results may be achieved with freer markets and simpler rules. Fourthly, regulatory rule makers are subject to imperfect rationality, which tends to reduce the quality of regulatory intervention. Finally, regulatory complexity exacerbates the harmful effects of bounded rationality, whereas simple and stable rules give rise to positive learning effects.