Current domain-independent classical planners require symbolic models of the
problem domain and instance as input, resulting in a knowledge-acquisition
bottleneck. Meanwhile, although deep learning has achieved significant success
in many fields, the knowledge it acquires is encoded in a subsymbolic
representation that is incompatible with symbolic systems such as planners. We
propose LatPlan, an
unsupervised architecture combining deep learning and classical planning. Given
only an unlabeled set of image pairs showing a subset of transitions allowed in
the environment (training inputs), and a pair of images representing the
initial and goal states (planning inputs), LatPlan finds a plan to the goal
state in a symbolic latent space and returns a visualized plan execution. The
contribution of this paper is twofold: (1) State Autoencoder, which finds a
propositional state representation of the environment using a Variational
Autoencoder. It generates a discrete latent vector from the images, based on
which a PDDL model can be constructed and then solved by an off-the-shelf
planner (both steps are sketched below). (2) Action Autoencoder /
Discriminator, a neural architecture that jointly finds the action symbols and
the implicit action models (preconditions/effects), and provides a successor
function for implicit graph search (also sketched below). We evaluate LatPlan
using image-based versions of three planning
domains: the 8-puzzle, Towers of Hanoi, and LightsOut.

Comment: This is an extended manuscript of the paper accepted at AAAI-18. The
content of the AAAI-18 paper itself is significantly extended from what has
been published on arXiv or in previous workshops. Over half of the paper,
describing (2), is new. Additionally, this manuscript contains the
supplemental materials of the AAAI-18 submission.
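
To make the State Autoencoder concrete, the following is a minimal sketch of a
discrete-latent autoencoder in PyTorch. It uses the Binary Concrete
(Gumbel-Softmax) relaxation, a standard way to keep a 0/1 latent vector
differentiable during training; the layer sizes, the 36-bit latent width, and
the temperature values are illustrative assumptions, not the paper's exact
architecture.

    import torch
    import torch.nn as nn

    class StateAutoencoder(nn.Module):
        def __init__(self, image_dim=42 * 42, latent_bits=36, hidden=400):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(image_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, latent_bits))            # one logit per latent bit
            self.decoder = nn.Sequential(
                nn.Linear(latent_bits, hidden), nn.ReLU(),
                nn.Linear(hidden, image_dim), nn.Sigmoid())

        def encode(self, x, temperature=1.0, hard=False):
            logits = self.encoder(x)
            if hard:                  # planning time: deterministic 0/1 propositions
                return (logits > 0).float()
            # Binary Concrete sampling: add logistic noise, squash with temperature.
            u = torch.rand_like(logits).clamp(1e-7, 1 - 1e-7)
            noise = torch.log(u) - torch.log1p(-u)
            return torch.sigmoid((logits + noise) / temperature)

        def forward(self, x, temperature=1.0):
            z = self.encode(x, temperature)
            return self.decoder(z), z

    sae = StateAutoencoder()
    x = torch.rand(1, 42 * 42)             # stand-in for a flattened input image
    recon, _ = sae(x, temperature=5.0)     # training: anneal the temperature toward 0
    z = sae.encode(x, hard=True)           # planning: the propositional state vector

Training minimizes the reconstruction loss plus the usual variational
regularizer; at planning time, the hard 0/1 encoding of an image is used
directly as a propositional state.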
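
Given such bit vectors, the claim that "a PDDL model can be constructed" can be
illustrated for the problem file: each latent bit becomes a proposition, and
the encodings of the two planning inputs become the :init and :goal sections.
The predicate naming (z0, z1, ...) and the domain name are hypothetical; the
action schemas of the domain file come from component (2).

    def to_pddl_problem(init_bits, goal_bits, name="latplan-instance"):
        """Emit a propositional PDDL problem from two latent bit vectors."""
        # Closed-world :init lists only the true propositions; the negated
        # goal literals require :negative-preconditions in the domain file.
        init = " ".join(f"(z{i})" for i, b in enumerate(init_bits) if b)
        goal = " ".join(f"(z{i})" if b else f"(not (z{i}))"
                        for i, b in enumerate(goal_bits))
        return (f"(define (problem {name}) (:domain latent)\n"
                f"  (:init {init})\n"
                f"  (:goal (and {goal})))")

    print(to_pddl_problem([1, 0, 1], [0, 0, 1]))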
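
Finally, a sketch of how the Action Autoencoder / Discriminator pair can serve
as a successor function for implicit graph search: decode every learned action
label against the current latent state, and keep only the transitions that the
discriminator accepts. apply_action, is_valid, and the number of action labels
are stand-ins for the trained networks, assumed here for illustration.

    import torch

    NUM_ACTIONS = 128      # illustrative: the AAE discovers a fixed set of labels

    def successors(z, apply_action, is_valid):
        """Candidate successors of a binary latent state z under every action."""
        out = []
        for a in range(NUM_ACTIONS):
            z_next = apply_action(z, a).round()  # AAE decoding: (state, action) -> state
            if is_valid(z, z_next):              # discriminator prunes invalid moves
                out.append((a, z_next))
        return out

    # Toy stand-ins so the sketch runs; the real ones are the trained models.
    z0 = torch.zeros(36)
    toy_apply = lambda z, a: (z + (a % 2)).clamp(0, 1)
    toy_valid = lambda z, z_next: not torch.equal(z, z_next)
    print(len(successors(z0, toy_apply, toy_valid)))   # 64 with the toy networks

A standard search algorithm such as A* can then expand latent states with this
function until the encoding of the goal image is reached, after which decoding
each state along the plan yields the visualized plan execution.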