
    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.

    Extending the use of plateau-escaping macro-actions in planning

    Many fully automated planning systems use a single, domain-independent heuristic to guide search and no other problem-specific guidance. While these systems exhibit excellent performance, they are often outperformed by systems which are either given extra human-encoded search information, or spend time learning additional search control information offline. The benefit of systems which do not require human intervention is that they are much closer to the ideal of autonomy. This document discusses a system which learns additional control knowledge, in the form of macro-actions, during planning, without the additional time required for an online learning step. The results of various techniques for managing the collection of macro-actions generated are also discussed. Finally, an explanation of the extension of the techniques to other planning systems is presented.

    Survey of dynamic scheduling in manufacturing systems

    User interface issues in supporting human-computer integrated scheduling

    Explored here are the user interface problems encountered with the Operations Missions Planner (OMP) project at the Jet Propulsion Laboratory (JPL). OMP uses a unique iterative approach to planning that places additional requirements on the user interface, particularly to support system development and maintenance. These requirements are necessary to support the concepts of heuristically controlled search, in-progress assessment, and iterative refinement of the schedule. The techniques used to address the OMP interface needs are given.

    Decision-theoretic control of EUVE telescope scheduling

    This paper describes a decision-theoretic scheduler (DTS) designed to employ state-of-the-art probabilistic inference technology to speed the search for efficient solutions to constraint-satisfaction problems. Our approach involves assessing the performance of heuristic control strategies that are normally hard-coded into scheduling systems and using probabilistic inference to aggregate this information in light of the features of a given problem. The Bayesian Problem-Solver (BPS) introduced a similar approach to solving single-agent and adversarial graph search patterns, yielding orders-of-magnitude improvement over traditional techniques. Initial efforts suggest that similar improvements will be realizable when applied to typical constraint-satisfaction scheduling problems.

    Learning Generalized Reactive Policies using Deep Neural Networks

    We present a new approach to learning for planning, where knowledge acquired while solving a given set of planning problems is used to plan faster in related, but new problem instances. We show that a deep neural network can be used to learn and represent a generalized reactive policy (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances. In contrast to prior efforts in this direction, our approach significantly reduces the dependence of learning on handcrafted domain knowledge or feature selection. Instead, the GRP is trained from scratch using a set of successful execution traces. We show that our approach can also be used to automatically learn a heuristic function that can be used in directed search algorithms. We evaluate our approach using an extensive suite of experiments on two challenging planning problem domains and show that our approach facilitates learning complex decision-making policies and powerful heuristic functions with minimal human input. Videos of our results are available at goo.gl/Hpy4e3.