Deep Policies for Width-Based Planning in Pixel Domains
Width-based planning has demonstrated great success in recent years due to
its ability to scale independently of the size of the state space. For example,
Bandres et al. (2018) introduced a rollout version of the Iterated Width
algorithm whose performance compares well with humans and learning methods in
the pixel setting of the Atari games suite. In this setting, planning is done
on-line using the "screen" states, selecting actions by looking ahead into
the future. However, this algorithm is purely exploratory and does not leverage
past reward information. Furthermore, it requires the state to be factored into
features that need to be pre-defined for the particular task, e.g., the B-PROST
pixel features. In this work, we extend width-based planning by incorporating
an explicit policy in the action selection mechanism. Our method, called
π-IW, interleaves width-based planning and policy learning using the
state-actions visited by the planner. The policy estimate takes the form of a
neural network and is in turn used to guide the planning step, thus reinforcing
promising paths. Surprisingly, we observe that the representation learned by
the neural network can be used as a feature space for the width-based planner
without degrading its performance, thus removing the requirement of pre-defined
features for the planner. We compare π-IW with previous width-based methods
and with AlphaZero, a method that also interleaves planning and learning, in
simple environments, and show that π-IW has superior performance. We also
show that the π-IW algorithm outperforms previous width-based methods in the
pixel setting of the Atari games suite.
Comment: In Proceedings of the 29th International Conference on Automated Planning and Scheduling (ICAPS 2019). arXiv admin note: text overlap with arXiv:1806.0589
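
To make the planning side concrete, below is a minimal Python sketch of a width-1 novelty test with policy-guided rollouts, in the spirit of the method described above; it is an illustration, not the authors' implementation. The environment interface (env.reset, env.step returning state, reward, done), the policy function, and the feature map phi are hypothetical stand-ins; in the paper, the features can come from the policy network's learned representation.

    import numpy as np

    def novel(features, seen):
        # Under IW(1), a state is novel if it makes at least one
        # feature true for the first time during the search.
        new = set(features) - seen
        seen |= new
        return bool(new)

    def policy_guided_rollout(env, policy, phi, seen, max_depth=100):
        # Sample actions from the learned policy and prune states whose
        # features have all been seen before (width-1 novelty test).
        state = env.reset()
        trace = []
        for _ in range(max_depth):
            probs = policy(state)                      # action distribution
            action = np.random.choice(len(probs), p=probs)
            state, reward, done = env.step(action)
            trace.append((state, action, reward))
            if done or not novel(phi(state), seen):
                break                                  # nothing new: prune
        return trace

Rollouts that stop producing novel features are pruned, so the policy progressively concentrates the search on promising, still-informative paths.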
Model Learning for Look-ahead Exploration in Continuous Control
We propose an exploration method that incorporates look-ahead search over
basic learnt skills and their dynamics, and use it for reinforcement learning
(RL) of manipulation policies. Our skills are multi-goal policies learned in
isolation in simpler environments using existing multi-goal RL formulations,
analogous to options or macro-actions. Coarse skill dynamics, i.e., the state
transition caused by a (complete) skill execution, are learnt and are unrolled
forward during lookahead search. Policy search benefits from temporal
abstraction during exploration, though it operates over low-level primitive
actions, and thus the resulting policies do not suffer from suboptimality and
inflexibility caused by coarse skill chaining. We show that the proposed
exploration strategy results in effective learning of complex manipulation
policies faster than current state-of-the-art RL methods, and converges to
better policies than methods that use options or parametrized skills as
building blocks of the policy itself, as opposed to guiding exploration.
Comment: This is a pre-print of our paper which is accepted at AAAI 2019
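
As an illustration of the look-ahead idea, here is a minimal Python sketch that unrolls short skill sequences through a learned coarse dynamics model and picks the first skill of the best-scoring sequence. The names skill_model(s, k) (the state reached after executing skill k to completion) and value(s) are hypothetical stand-ins for the learned models the abstract refers to.

    import itertools

    def lookahead_over_skills(state, skill_model, value, n_skills, horizon=3):
        # Unroll every short skill sequence through the learned coarse
        # dynamics and return the first skill of the best-scoring sequence.
        best_score, best_first = float("-inf"), None
        for seq in itertools.product(range(n_skills), repeat=horizon):
            s = state
            for k in seq:
                s = skill_model(s, k)   # state after (complete) skill k
            score = value(s)
            if score > best_score:
                best_score, best_first = score, seq[0]
        return best_first

Because each model step covers a whole skill execution, a short horizon here corresponds to a long horizon in primitive actions, which is what makes the exploration temporally abstract.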
Combined Reinforcement Learning via Abstract Representations
In the quest for efficient and robust reinforcement learning methods, both
model-free and model-based approaches offer advantages. In this paper we
propose a new way of explicitly bridging both approaches via a shared
low-dimensional learned encoding of the environment, meant to capture
summarizing abstractions. We show that the modularity brought by this approach
leads to good generalization while being computationally efficient, with
planning happening in a smaller latent state space. In addition, this approach
recovers a sufficient low-dimensional representation of the environment, which
opens up new strategies for interpretable AI, exploration and transfer
learning.
Comment: Accepted to the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
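
A minimal PyTorch sketch of the shared-encoding idea follows: one encoder produces a low-dimensional latent state, and a transition model (the model-based part) and a Q-head (the model-free part) both operate on that latent. Layer sizes, the latent dimension, and the residual transition are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class AbstractAgent(nn.Module):
        # A shared encoder maps observations to a low-dimensional latent
        # state; a transition model (model-based) and a Q-head (model-free)
        # both operate on the same latent.
        def __init__(self, obs_dim, n_actions, latent_dim=4):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim, 64), nn.Tanh(),
                nn.Linear(64, latent_dim))
            self.transition = nn.Linear(latent_dim + n_actions, latent_dim)
            self.q_head = nn.Linear(latent_dim, n_actions)

        def forward(self, obs, action_onehot):
            z = self.encoder(obs)
            # Residual prediction of the next latent state.
            z_next = z + self.transition(torch.cat([z, action_onehot], dim=-1))
            return self.q_head(z), z_next

Planning can then roll the latent z forward with the transition model, while the Q-head provides model-free value estimates over the same compact representation.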
Deep learning for video game playing
In this article, we review recent deep learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards.