Learning Generalized Reactive Policies using Deep Neural Networks
We present a new approach to learning for planning, where knowledge acquired
while solving a given set of planning problems is used to plan faster in
related, but new problem instances. We show that a deep neural network can be
used to learn and represent a generalized reactive policy (GRP) that
maps a problem instance and a state to an action, and that the learned GRPs
efficiently solve large classes of challenging problem instances. In contrast
to prior efforts in this direction, our approach significantly reduces the
dependence of learning on handcrafted domain knowledge or feature selection.
Instead, the GRP is trained from scratch using a set of successful execution
traces. We show that our approach can also be used to automatically learn a
heuristic function that can be used in directed search algorithms. We evaluate
our approach using an extensive suite of experiments on two challenging
planning problem domains and show that our approach facilitates learning
complex decision making policies and powerful heuristic functions with minimal
human input. Videos of our results are available at goo.gl/Hpy4e3.
Marvin: A Heuristic Search Planner with Online Macro-Action Learning
This paper describes Marvin, a planner that competed in the Fourth
International Planning Competition (IPC 4). Marvin uses
action-sequence-memoisation techniques to generate macro-actions, which are
then used during search for a solution plan. We provide an overview of its
architecture and search behaviour, detailing the algorithms used. We also
empirically demonstrate the effectiveness of its features in various planning
domains; in particular, the effects on performance due to the use of
macro-actions, the novel features of its search behaviour, and the native
support of ADL and Derived Predicates.
Anytime Point-Based Approximations for Large POMDPs
The Partially Observable Markov Decision Process has long been recognized as
a rich framework for real-world planning and control problems, especially in
robotics. However, exact solutions in this framework are typically
computationally intractable for all but the smallest problems. A well-known
technique for speeding up POMDP solving involves performing value backups at
specific belief points, rather than over the entire belief simplex. The
efficiency of this approach, however, depends greatly on the selection of
points. This paper presents a set of novel techniques for selecting informative
belief points which work well in practice. The point selection procedure is
combined with point-based value backups to form an effective anytime POMDP
algorithm called Point-Based Value Iteration (PBVI). The first aim of this
paper is to introduce this algorithm and present a theoretical analysis
justifying the choice of belief selection technique. The second aim of this
paper is to provide a thorough empirical comparison between PBVI and other
state-of-the-art POMDP methods, in particular the Perseus algorithm, in an
effort to highlight their similarities and differences. Evaluation is performed
using both standard POMDP domains and realistic robotic tasks.
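
To make the idea of a value backup at a single belief point concrete, here is a minimal sketch of the standard point-based backup operator. It is an illustration under stated assumptions, not the paper's implementation: the function name `point_based_backup` and the array layouts (`T[a, s, s']`, `Z[a, s', o]`, `R[a, s]`) are hypothetical choices made for this example, and the belief-selection techniques the paper contributes are not shown.

```python
import numpy as np

def point_based_backup(b, Gamma, T, Z, R, gamma):
    """Return the alpha vector produced by one backup at belief b.

    Assumed (hypothetical) layouts:
      b:     (S,)      belief over states
      Gamma: (K, S)    current set of alpha vectors
      T:     (A, S, S) T[a, s, s'] = P(s' | s, a)
      Z:     (A, S, O) Z[a, s', o] = P(o | s', a)
      R:     (A, S)    immediate reward
    """
    num_actions, num_obs = T.shape[0], Z.shape[2]
    best_value, best_alpha = -np.inf, None
    for a in range(num_actions):
        g_a = R[a].astype(float)  # astype returns a fresh copy
        for o in range(num_obs):
            # g[k, s] = sum_{s'} T[a, s, s'] * Z[a, s', o] * Gamma[k, s']
            g = ((T[a] * Z[a][:, o][None, :]) @ Gamma.T).T
            # Keep only the projected vector that is best at this belief.
            k = int(np.argmax(g @ b))
            g_a = g_a + gamma * g[k]
        value = float(g_a @ b)
        if value > best_value:
            best_value, best_alpha = value, g_a
    return best_alpha
```

Because the maximisation is carried out only at `b`, the cost of one backup grows with the number of stored alpha vectors rather than with the size of the belief simplex, which is the source of the speed-up described above.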
The GRT Planning System: Backward Heuristic Construction in Forward State-Space Planning
This paper presents GRT, a domain-independent heuristic planning system for
STRIPS worlds. GRT solves problems in two phases. In the pre-processing phase,
it estimates the distance between each fact and the goals of the problem, in a
backward direction. Then, in the search phase, these estimates are used in
order to further estimate the distance between each intermediate state and the
goals, thereby guiding the search process in a forward direction on a best-first
basis. The paper presents the benefits of adopting opposite directions for
the pre-processing and search phases, discusses some difficulties
that arise in the pre-processing phase and introduces techniques to cope with
them. Moreover, it presents several methods of improving the efficiency of the
heuristic, by enriching the representation and by reducing the size of the
problem. Finally, a method of overcoming locally optimal states, based on domain
axioms, is proposed. According to it, difficult problems are decomposed into
easier sub-problems that have to be solved sequentially. The performance
results from various domains, including those of the recent planning
competitions, show that GRT is among the fastest planners.
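
The two-phase idea (backward fact-distance estimation, then forward state evaluation) can be sketched as follows. This is a simplified illustration and not GRT's actual algorithm: the `Action` type, the inversion rule in `invert`, the delete-free propagation, and the plain summation in `state_estimate` are all assumptions made for this example. In particular, it assumes a fully specified goal state and ignores interactions between facts.

```python
import math
from typing import Dict, FrozenSet, Iterable, NamedTuple

class Action(NamedTuple):
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]

def invert(a: Action) -> Action:
    # Assumed inversion rule: the inverted action undoes a's effects.
    return Action(pre=a.add | (a.pre - a.delete), add=a.delete, delete=a.add)

def backward_fact_distances(goal_facts: Iterable[str],
                            actions: Iterable[Action]) -> Dict[str, float]:
    """Pre-processing phase: estimate each fact's distance from the goals."""
    inverted = [invert(a) for a in actions]
    dist: Dict[str, float] = {f: 0 for f in goal_facts}
    changed = True
    while changed:  # iterate to a fixed point, ignoring delete effects
        changed = False
        for a in inverted:
            if all(p in dist for p in a.pre):
                cost = 1 + sum(dist[p] for p in a.pre)
                for f in a.add:
                    if cost < dist.get(f, math.inf):
                        dist[f] = cost
                        changed = True
    return dist

def state_estimate(state: Iterable[str], dist: Dict[str, float]) -> float:
    """Search phase: score a state by summing its facts' estimates."""
    return sum(dist.get(f, math.inf) for f in state)

# Hypothetical three-location domain: move A -> B -> C, goal is at_C.
move_ab = Action(frozenset({"at_A"}), frozenset({"at_B"}), frozenset({"at_A"}))
move_bc = Action(frozenset({"at_B"}), frozenset({"at_C"}), frozenset({"at_B"}))
distances = backward_fact_distances({"at_C"}, [move_ab, move_bc])
```

A forward best-first search would then call `state_estimate` on each successor state and expand the lowest-scoring one first; the fact distances are computed once and reused throughout the search.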