
    Monte-Carlo tree search with heuristic knowledge: A novel way in solving capturing and life and death problems in Go

    Monte-Carlo (MC) tree search is a relatively new research field. Its effectiveness in searching large state spaces, such as the Go game tree, is well recognized in the computer Go community. This dissertation systematically investigates Go domain-specific heuristics and techniques, as well as domain-independent heuristics and techniques, in the context of MC tree search. Search extensions based on these heuristics and techniques can significantly improve the effectiveness and efficiency of MC tree search. Two major areas of investigation are addressed: I. the identification and use of effective heuristic knowledge to guide the MC simulations; II. the extension of the MC tree search algorithm with heuristics. Go, the most challenging board game for machines, serves as the test bed. The effectiveness of the MC tree search extensions is demonstrated through the performance of Go tactic problem solvers that use these techniques. The main contributions of this dissertation are: 1. a heuristics-based Monte-Carlo tactic tree search framework that extends the standard Monte-Carlo tree search; 2. a systematic investigation of (Go) knowledge-based heuristics for improving the Monte-Carlo tactic tree search; 3. a demonstration that pattern learning is effective in improving the Monte-Carlo tactic tree search; 4. domain-independent tree search enhancements shown to be effective in improving Monte-Carlo tactic tree search performance; 5. a strong Go tactic solver, based on the proposed algorithms, that outperforms traditional game tree search algorithms. The techniques developed in this dissertation can benefit other game domains and application fields.
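
    To make the idea concrete, here is a minimal sketch of UCT-style Monte-Carlo tree search with a heuristic prior biasing both selection and playouts. The prior(state, move) function and the game-state interface (legal_moves, play, is_terminal, result) are hypothetical placeholders, not the dissertation's actual implementation; two-player result negation is omitted for brevity.

        import math
        import random

        class Node:
            def __init__(self, state, parent=None):
                self.state = state            # opaque game state (hypothetical interface)
                self.parent = parent
                self.children = {}            # move -> Node
                self.visits = 0
                self.value = 0.0              # sum of playout results

        def uct_score(parent, child, move, prior, c=1.4, w=1.0):
            # standard UCT plus a heuristic prior that decays with visits, so
            # knowledge guides early search and real statistics take over later
            exploit = child.value / (child.visits + 1e-9)
            explore = c * math.sqrt(math.log(parent.visits + 1) / (child.visits + 1e-9))
            return exploit + explore + w * prior(parent.state, move) / (child.visits + 1)

        def playout(state, prior):
            # heuristic-biased playout: sample moves proportionally to their score
            while not state.is_terminal():
                moves = state.legal_moves()
                weights = [prior(state, m) + 1e-6 for m in moves]
                state = state.play(random.choices(moves, weights=weights)[0])
            return state.result()             # e.g. 1 = win, 0 = loss

        def mcts(root_state, prior, n_iter=1000):
            # assumes root_state is non-terminal
            root = Node(root_state)
            for _ in range(n_iter):
                node = root
                while node.children:          # selection
                    move, node = max(node.children.items(),
                                     key=lambda mc: uct_score(node, mc[1], mc[0], prior))
                if not node.state.is_terminal():   # expansion
                    for m in node.state.legal_moves():
                        node.children[m] = Node(node.state.play(m), parent=node)
                    node = random.choice(list(node.children.values()))
                result = playout(node.state, prior)    # simulation
                while node is not None:       # backpropagation
                    node.visits += 1
                    node.value += result
                    node = node.parent
            return max(root.children.items(), key=lambda mc: mc[1].visits)[0]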

    Inverse Statistical Physics of Protein Sequences: A Key Issues Review

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus about protein function and structure. Here we give an overview of some biologically important questions and show how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
    Comment: 18 pages, 7 figures
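
    One widely used instance of such inference is naive mean-field direct-coupling analysis (DCA), sketched below under the assumption of an alignment already encoded as integers. This is a simplified illustration (same-site correlation blocks are treated only approximately), not the review's specific method.

        import numpy as np

        def mean_field_dca(msa, q=21, lam=0.5):
            # msa: (M, L) integer array, entries in 0..q-1 (20 amino acids + gap)
            M, L = msa.shape
            # one-hot encode, dropping state q-1 as the reference (gauge choice)
            X = np.equal.outer(msa, np.arange(q - 1)).reshape(M, L * (q - 1)).astype(float)
            # frequencies with a pseudocount so the correlation matrix is invertible
            f1 = (1 - lam) * X.mean(axis=0) + lam / q
            f2 = (1 - lam) * (X.T @ X) / M + lam / (q * q)
            C = f2 - np.outer(f1, f1)         # connected correlations
            return -np.linalg.inv(C)          # mean-field couplings J = -C^{-1}

        def contact_scores(J, L, q=21):
            # Frobenius norm of each residue-pair coupling block as a contact score
            s = q - 1
            blocks = J.reshape(L, s, L, s)
            F = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
            np.fill_diagonal(F, 0.0)
            return F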

    Action Guidance with MCTS for Deep Reinforcement Learning

    Deep reinforcement learning has achieved great success in recent years; however, one main challenge is sample inefficiency. In this paper, we focus on how to use action guidance by means of a non-expert demonstrator to improve sample efficiency in a domain with sparse, delayed, and possibly deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. We propose a new framework in which even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.
    Comment: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19). arXiv admin note: substantial text overlap with arXiv:1904.05759, arXiv:1812.0004
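
    A common way to realise this kind of action guidance is to add an auxiliary imitation loss, computed only on demonstrator-labelled steps, to an actor-critic objective. The sketch below is one plausible single-process realisation in PyTorch, not the paper's asynchronous distributed framework; all tensor names and the weighting scheme are illustrative.

        import torch
        import torch.nn.functional as F

        def guided_policy_loss(logits, values, actions, returns,
                               demo_actions, demo_mask, aux_weight=0.1):
            # logits: (B, A) policy logits; values, returns: (B,);
            # demo_actions: actions suggested by a weak planner (e.g. shallow MCTS);
            # demo_mask: (B,) float mask marking steps where guidance exists
            log_probs = F.log_softmax(logits, dim=-1)
            chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
            advantage = returns - values.detach()
            policy_loss = -(chosen * advantage).mean()
            value_loss = F.mse_loss(values, returns)
            # supervised term pulling the policy toward the demonstrator,
            # averaged over demonstrator-labelled steps only
            imitation = F.cross_entropy(logits, demo_actions, reduction='none')
            aux_loss = (imitation * demo_mask).sum() / demo_mask.sum().clamp(min=1)
            return policy_loss + 0.5 * value_loss + aux_weight * aux_loss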

    A Manifesto for the Equifinality Thesis

    This essay discusses some of the issues involved in the identification and prediction of hydrological models given some calibration data. The reasons for the incompleteness of traditional calibration methods are discussed. The argument is made that the potential for multiple acceptable models as representations of hydrological and other environmental systems (the equifinality thesis) should be given more serious consideration than hitherto. The essay proposes some techniques for an extended GLUE methodology to make it more rigorous and outlines some of the research issues still to be resolved.
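
    For orientation, a minimal GLUE (Generalized Likelihood Uncertainty Estimation) loop looks roughly as follows: sample parameter sets, score each simulation with an informal likelihood measure (here Nash-Sutcliffe efficiency), keep the "behavioural" sets above a threshold, and form likelihood-weighted prediction bounds. The model and sample_params callables are hypothetical stand-ins, not the essay's specific procedure.

        import numpy as np

        def glue(model, sample_params, observed, n_samples=10000,
                 threshold=0.3, quantiles=(0.05, 0.95)):
            # model(params) -> simulated series aligned with `observed`;
            # sample_params() -> one random parameter set from the prior ranges
            sims, weights = [], []
            var_obs = np.var(observed)
            for _ in range(n_samples):
                sim = model(sample_params())
                nse = 1.0 - np.mean((sim - observed) ** 2) / var_obs  # Nash-Sutcliffe
                if nse > threshold:           # keep only "behavioural" sets
                    sims.append(sim)
                    weights.append(nse)
            if not sims:
                raise RuntimeError("no behavioural parameter sets found")
            sims = np.asarray(sims)
            w = np.asarray(weights) / np.sum(weights)
            # likelihood-weighted prediction quantiles at each time step
            bounds = np.empty((len(quantiles), sims.shape[1]))
            for t in range(sims.shape[1]):
                order = np.argsort(sims[:, t])
                cdf = np.cumsum(w[order])
                for k, qt in enumerate(quantiles):
                    idx = min(np.searchsorted(cdf, qt), len(order) - 1)
                    bounds[k, t] = sims[order[idx], t]
            return sims, w, bounds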

    Certified Reinforcement Learning with Logic Guidance

    This paper proposes the first model-free Reinforcement Learning (RL) framework to synthesise policies for unknown, continuous-state Markov Decision Processes (MDPs) such that a given linear temporal property is satisfied. We convert the given property into a Limit-Deterministic Büchi Automaton (LDBA), namely a finite-state machine expressing the property. Exploiting the structure of the LDBA, we shape a synchronous reward function on the fly, so that an RL algorithm can synthesise a policy resulting in traces that probabilistically satisfy the linear temporal property. When the state space of the MDP is finite, this probability (certificate) is also calculated in parallel with policy learning: as such, the RL algorithm produces a policy that is certified with respect to the property. Under the assumption of a finite state space, theoretical guarantees are provided on the convergence of the RL algorithm to an optimal policy maximising the above probability. We also show that our method produces "best available" control policies when the logical property cannot be satisfied. In the general case of a continuous state space, we propose a neural network architecture for RL and empirically show that the algorithm finds satisfying policies, if such policies exist. The performance of the proposed framework is evaluated via a set of numerical examples and benchmarks, where we observe an improvement of one order of magnitude in the number of iterations required for policy synthesis, compared to existing approaches whenever available.
    Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
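
    The core construction can be sketched as Q-learning on the product of the MDP and the automaton, with reward paid whenever the LDBA visits an accepting state. The sketch below is a simplified illustration (it ignores LDBA epsilon-transitions and the paper's exact reward shaping and certification); the env and ldba interfaces are hypothetical, and every state is assumed to have at least one action.

        import random
        from collections import defaultdict

        def ldba_qlearning(env, ldba, episodes=5000, horizon=200,
                           alpha=0.1, gamma=0.99, eps=0.1):
            # env: reset() -> s; step(s, a) -> s'; actions(s) -> non-empty list;
            # label(s) -> atomic propositions holding in s.
            # ldba: initial state q0; delta(q, label) -> q'; accepting(q) -> bool.
            Q = defaultdict(float)
            for _ in range(episodes):
                s, q = env.reset(), ldba.q0
                for _ in range(horizon):
                    acts = env.actions(s)
                    a = (random.choice(acts) if random.random() < eps
                         else max(acts, key=lambda a_: Q[(s, q, a_)]))
                    s2 = env.step(s, a)
                    q2 = ldba.delta(q, env.label(s2))       # synchronise the automaton
                    r = 1.0 if ldba.accepting(q2) else 0.0  # reward shaped on the fly
                    best = max(Q[(s2, q2, a_)] for a_ in env.actions(s2))
                    Q[(s, q, a)] += alpha * (r + gamma * best - Q[(s, q, a)])
                    s, q = s2, q2
            return Q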