233 research outputs found

    Trial-based Heuristic Tree Search for MDPs with Factored Action Spaces

    MDPs with factored action spaces, i.e. where actions are described as assignments to a set of action variables, allow reasoning over action variables instead of action states, yet most algorithms only consider a grounded action representation. This includes algorithms that are instantiations of the trial-based heuristic tree search (THTS) framework, such as AO* or UCT. To be able to reason over factored action spaces, we propose a generalisation of THTS where nodes that branch over all applicable actions are replaced with subtrees that consist of nodes that represent the decision for a single action variable. We show that many THTS algorithms retain their theoretical properties under the generalised framework, and show how to derive an approximate heuristic for partial action assignments from any state-action heuristic. This makes it possible to guide a UCT variant that creates exponentially fewer nodes than the same algorithm that considers ground actions. An empirical evaluation on the benchmark set of the probabilistic track of the latest International Planning Competition validates the benefits of the approach.

    Stochastic Planning with Lifted Symbolic Trajectory Optimization

    This paper investigates online stochastic planning for problems with large factored state and action spaces. One promising approach in recent work estimates the quality of applicable actions in the current state through aggregate simulation from the states they reach. This leads to a significant speedup compared to search over concrete states and actions, and suffices to guide decision making in cases where the performance of a random policy is informative of the quality of a state. The paper makes two significant improvements to this approach. The first, taking inspiration from lifted belief propagation, exploits the structure of the problem to derive a more compact computation graph for aggregate simulation. The second improvement replaces the random policy embedded in the computation graph with symbolic variables that are optimized simultaneously with the search for high-quality actions. This expands the scope of the approach to problems that require deep search and where information is lost quickly with random steps. An empirical evaluation shows that these ideas significantly improve performance, yielding state-of-the-art results on hard planning problems.

    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file.

    Solving large stochastic planning problems using multiple dynamic abstractions

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 165-172). One of the goals of AI is to produce a computer system that can plan and act intelligently in the real world. It is difficult to do so, in part because real-world domains are very large. Existing research generally deals with the large domain size using a static representation and exploiting a single type of domain structure. This leads either to an inability to complete planning on larger domains or to poor solution quality because pertinent information is discarded. This thesis creates a framework that encapsulates existing and new abstraction and approximation methods into modules and combines arbitrary modules into a hierarchy that allows for dynamic representation changes. The combination of different abstraction methods allows many qualitatively different types of structure in the domain to be exploited simultaneously. The ability to change the representation dynamically allows the framework to take advantage of how different domain subparts are relevant in different ways at different times. Since the current plan tracks the current representation, choosing to simplify (or omit) distant or improbable areas of the domain sacrifices little in the way of solution quality while making the planning problem considerably easier. The module hierarchy approach leads to greater abstraction that is tailored to the domain and therefore need not give up hope of creating reasonable solutions. While there are no optimality guarantees, experimental results show that suitable module choices gain computational tractability at little cost to behavioral optimality and allow the module hierarchy to solve larger and more interesting domains than previously possible. By Kurt Alan Steinkraus. Ph.D.