Deep Policies for Width-Based Planning in Pixel Domains
Width-based planning has demonstrated great success in recent years due to
its ability to scale independently of the size of the state space. For example,
Bandres et al. (2018) introduced a rollout version of the Iterated Width
algorithm whose performance compares well with humans and learning methods in
the pixel setting of the Atari games suite. In this setting, planning is done
on-line using the "screen" states and selecting actions by looking ahead into
the future. However, this algorithm is purely exploratory and does not leverage
past reward information. Furthermore, it requires the state to be factored into
features that need to be pre-defined for the particular task, e.g., the B-PROST
pixel features. In this work, we extend width-based planning by incorporating
an explicit policy in the action selection mechanism. Our method, called
π-IW, interleaves width-based planning and policy learning using the
state-actions visited by the planner. The policy estimate takes the form of a
neural network and is in turn used to guide the planning step, thus reinforcing
promising paths. Surprisingly, we observe that the representation learned by
the neural network can be used as a feature space for the width-based planner
without degrading its performance, thus removing the requirement of pre-defined
features for the planner. We compare π-IW with previous width-based methods
and with AlphaZero, a method that also interleaves planning and learning, in
simple environments, and show that π-IW has superior performance. We also
show that the π-IW algorithm outperforms previous width-based methods in the
pixel setting of the Atari games suite.
Comment: In Proceedings of the 29th International Conference on Automated
Planning and Scheduling (ICAPS 2019). arXiv admin note: text overlap with
arXiv:1806.0589
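As a rough illustration of the width-based idea this abstract builds on, the sketch below runs a breadth-first search that keeps a state only if it makes at least one feature atom true for the first time (the IW(1) novelty test). It is not π-IW and not the rollout variant of Bandres et al.; the grid domain and the names `iw1`, `successors`, and `features` are hypothetical and chosen only for this example.

```python
from collections import deque

def iw1(start, successors, features, is_goal):
    """Breadth-first search with IW(1) novelty pruning.

    A generated state is kept only if at least one of its feature atoms
    has not appeared in any previously kept state.
    """
    seen_atoms = set(features(start))
    queue = deque([(start, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            new_atoms = set(features(nxt)) - seen_atoms
            if not new_atoms:
                continue  # no novel atom: prune this state
            seen_atoms |= new_atoms
            queue.append((nxt, plan + [action]))
    return None  # goal not reached within the novelty bound

# Hypothetical toy domain: an agent on a 5x5 grid.
def successors(state):
    x, y = state
    moves = {
        "right": (x + 1, y),
        "left": (x - 1, y),
        "up": (x, y + 1),
        "down": (x, y - 1),
    }
    return [(a, s) for a, s in moves.items()
            if 0 <= s[0] < 5 and 0 <= s[1] < 5]

# Assumed feature set: one atom per coordinate value.
def features(state):
    x, y = state
    return [("x", x), ("y", y)]

plan = iw1((0, 0), successors, features, lambda s: s == (4, 0))
print(plan)
```

With these single-coordinate atoms the search reaches (4, 0), but a goal such as (4, 4), which needs two atoms to hold jointly, is pruned away and `iw1` returns `None`. This is one way to see why the choice of feature space matters for a width-based planner.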
Generalized Planning as Heuristic Search: A new planning search-space that leverages pointers over objects
Planning as heuristic search is one of the most successful approaches to
classical planning but unfortunately, it does not extend trivially to
Generalized Planning (GP). GP aims to compute algorithmic solutions that are
valid for a set of classical planning instances from a given domain, even if
these instances differ in the number of objects, the number of state variables,
their domain size, or their initial and goal configuration. The generalization
requirements of GP make it impractical to perform the state-space search that
is usually implemented by heuristic planners. This paper adapts the planning as
heuristic search paradigm to the generalization requirements of GP, and
presents the first native heuristic search approach to GP. First, the paper
introduces a new pointer-based solution space for GP that is independent of the
number of classical planning instances in a GP problem and the size of those
instances (i.e. the number of objects, state variables and their domain sizes).
Second, the paper defines a set of evaluation and heuristic functions for
guiding a combinatorial search in our new GP solution space. The computation of
these evaluation and heuristic functions does not require grounding states or
actions in advance. Therefore our GP as heuristic search approach can handle
large sets of state variables with large numerical domains, e.g., integers.
Lastly, the paper defines an upgraded version of our novel algorithm for GP
called Best-First Generalized Planning (BFGP), that implements a best-first
search in our pointer-based solution space, and that is guided by our
evaluation/heuristic functions for GP.
Comment: Under review in the Artificial Intelligence Journal (AIJ)
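To make the idea of searching a space of solutions rather than a space of states more concrete, here is a minimal sketch of a generic best-first search over candidate programs, where each candidate is scored against every instance at once. It is not BFGP and does not model pointers; the instruction set `OPS`, the evaluation function, and the integer instances are all assumptions made for this toy example.

```python
import heapq

# Hypothetical instruction set acting on a single integer.
OPS = {"+1": lambda v: v + 1, "*2": lambda v: v * 2}

def run(program, x):
    """Apply each instruction of the program to x in order."""
    for op in program:
        x = OPS[op](x)
    return x

def evaluate(program, instances):
    """Assumed evaluation function: total distance to the targets
    summed over all instances (0 means every instance is solved)."""
    return sum(abs(run(program, x) - target) for x, target in instances)

def best_first(instances, max_len=4):
    """Best-first search in program space, ordered by the evaluation."""
    frontier = [(evaluate((), instances), 0, ())]
    while frontier:
        cost, _, program = heapq.heappop(frontier)
        if cost == 0:
            return program  # one program that solves all instances
        if len(program) == max_len:
            continue
        for op in OPS:
            child = program + (op,)
            heapq.heappush(
                frontier, (evaluate(child, instances), len(child), child)
            )
    return None

# Instances that share the same underlying rule, 2 * (x + 1).
instances = [(1, 4), (2, 6), (5, 12)]
program = best_first(instances)
print(program)
```

The search nodes here are programs, so the size of the search space depends on the instruction set and the program length bound rather than on how many instances are supplied or how large their values are.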