We present a new paradigm for planning by learning, where the planner is given a model of the world and a small set of states of interest, but no indication of optimal actions in these states. The additional information can help focus the planner on regions of the state space that are of interest and lead to improved performance. We demonstrate this idea by introducing novel model-checking reduction operations for First Order Decision Diagrams (FODD), a representation that has been used to implement decision-theoretic planning with Relational Markov Decision Processes (RMDP). Intuitively, these reductions modify the construction of the value function by removing any complex specifications that are irrelevant to the set of training examples, thereby focusing on the region of interest. We show that such training examples can be constructed on the fly from a description of the planning problem thus we can bootstrap to get a self-taught planning system. Additionally, we provide a new heuristic to embed universal and conjunctive goals within the framework of RMDP planners, expanding the scope and applicability of such systems. We show that these ideas lead to significant improvements in performance in terms of both speed and coverage of the planner, yielding state of the art planning performance on problems from the International Planning Competition
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.