83 research outputs found

    Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic

    We consider the problem of using a heuristic policy to improve the value approximation by the Upper Confidence Bounds applied to Trees (UCT) algorithm in non-adversarial settings such as planning with large state-space Markov Decision Processes. Current improvements to UCT focus on changing either the action selection formula at the internal nodes or the rollout policy at the leaf nodes of the search tree. In this work, we propose to add an auxiliary arm to each of the internal nodes, and always use the heuristic policy to roll out simulations at the auxiliary arms. The method aims to converge quickly to optimal values at states where the heuristic policy is optimal, while retaining an approximation similar to that of the original UCT in other states. We show that bootstrapping with the proposed method in the new algorithm, UCT-Aux, performs better than the original UCT algorithm and its variants in two benchmark experiment settings. We also examine conditions under which UCT-Aux works well. Comment: 16 pages, accepted for presentation at ECML'1
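
The auxiliary-arm idea can be pictured with a small sketch. The abstract does not say how UCT-Aux scores or backs up the auxiliary arm, so treating it as one more arm under the standard UCB1 rule is an assumption here, and the names `ucb1_select`, `select_arm` and `AUX` are hypothetical.

```python
import math

AUX = "aux"  # hypothetical label for the auxiliary (heuristic-rollout) arm


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Return the index of the arm with the highest UCB1 score.

    counts[i] is the number of visits of arm i, means[i] its mean return.
    """
    # Visit every untried arm once before using the formula.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [
        means[i] + c * math.sqrt(math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def select_arm(actions, counts, means):
    """Pick among the ordinary actions plus one auxiliary arm (assumed last).

    If AUX is returned, the caller would roll out with the heuristic policy.
    """
    arms = list(actions) + [AUX]
    return arms[ucb1_select(counts, means)]


if __name__ == "__main__":
    # Equal visit counts, so the arm with the best mean return wins.
    print(select_arm(["left", "right"], [10, 10, 10], [0.2, 0.3, 0.8]))
```

With equal visit counts the exploration bonus is identical for all arms, so the choice reduces to the highest mean return.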

    Practical Open-Loop Optimistic Planning

    We consider the problem of online planning in a Markov Decision Process given access only to a generative model, restricted to open-loop policies (i.e. sequences of actions), and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
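
To illustrate what "tighter upper-confidence bounds" can mean, the sketch below compares a Hoeffding-style bound with a Kullback-Leibler bound for rewards in [0, 1]. The abstract does not give the bound used, so the Bernoulli KL index computed by bisection is an assumption suggested by the algorithm's name, and every function name and the `threshold` parameter are hypothetical.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for safety."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_upper_bound(mean, count, threshold, iters=50):
    """Largest q in [mean, 1] such that count * kl(mean, q) <= threshold."""
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if count * bernoulli_kl(mean, mid) <= threshold:
            lo = mid  # mid is still plausible, so move the lower end up
        else:
            hi = mid
    return lo


def hoeffding_upper_bound(mean, count, threshold):
    """Hoeffding-style bound: mean + sqrt(threshold / (2 * count)), capped at 1."""
    return min(1.0, mean + math.sqrt(threshold / (2 * count)))


if __name__ == "__main__":
    mean, count, threshold = 0.1, 50, math.log(100)
    print(kl_upper_bound(mean, count, threshold))
    print(hoeffding_upper_bound(mean, count, threshold))
```

By Pinsker's inequality, kl(p, q) >= 2(p - q)^2, so the KL bound never exceeds the Hoeffding-style one for the same threshold.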

    Noisy Optimization: Convergence with a Fixed Number of Resamplings

    It is known that evolution strategies in continuous domains might not converge in the presence of noise. It is also known that, under mild assumptions, and using an increasing number of resamplings, one can mitigate the effect of additive noise and recover convergence. We show new sufficient conditions for the convergence of an evolutionary algorithm with a constant number of resamplings; in particular, we get fast rates (log-linear convergence) provided that the variance decreases around the optimum slightly faster than in the so-called multiplicative noise model. Keywords: Noisy optimization, evolutionary algorithm, theory. Comment: EvoStar (2014

    A network-based dynamical ranking system for competitive sports

    From the viewpoint of networks, a ranking system for players or teams in sports is equivalent to a centrality measure for sports networks, whereby a directed link represents the result of a single game. Previously proposed network-based ranking systems are derived from static networks, i.e., aggregation of the results of games over time. However, the score of a player (or team) fluctuates over time. Defeating a renowned player at peak performance is intuitively more rewarding than defeating the same player in other periods. To account for this factor, we propose a dynamic variant of such a network-based ranking system and apply it to professional men's tennis data. We derive a set of linear online update equations for the score of each player. The proposed ranking system predicts the outcomes of future games with a higher accuracy than its static counterparts. Comment: 6 figure

    Warm-Start AlphaZero Self-Play Search Enhancements

    Recently, AlphaZero has achieved landmark results in deep reinforcement learning by providing a single self-play architecture that learned three different games at a superhuman level. AlphaZero is a large and complicated system with many parameters, and success requires much compute power and fine-tuning. Reproducing results in other games is a challenge, and many researchers are looking for ways to improve results while reducing computational demands. AlphaZero's design is purely based on self-play and makes no use of labeled expert data or domain-specific enhancements; it is designed to learn from scratch. We propose a novel approach to deal with this cold-start problem by employing simple search enhancements at the beginning phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE) and dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games, with RAVE-based variants playing especially strongly.

    A Model of Oxidative Stress Management: Moderation of Carbohydrate Metabolizing Enzymes in SOD1-Null Drosophila melanogaster

    The response to oxidative stress involves numerous genes, and mutations in these genes often manifest in pleiotropic ways that presumably reflect perturbations in ROS-mediated physiology. The Drosophila melanogaster SOD1-null allele (cSODn108) is proposed to result in oxidative stress by preventing superoxide breakdown. In SOD1-null flies, oxidative stress management is thought to be reliant on the glutathione-dependent antioxidants that utilize NADPH to cycle between reduced and oxidized forms. Previous studies suggest that SOD1-null Drosophila rely on lipid catabolism for energy rather than carbohydrate metabolism. We tested these connections by comparing the activity of carbohydrate metabolizing enzymes, lipid and triglyceride concentration, and steady state NADPH:NADP+ in SOD1-null and control transgenic rescue flies. We find a negative shift in the activity of carbohydrate metabolizing enzymes in SOD1-nulls, and the NADP+-reducing enzymes were found to have significantly lower activity than the other enzymes assayed. Little evidence for the catabolism of lipids as a preferential energy source was found, as the concentrations of lipids and triglycerides were not significantly lower in SOD1-nulls compared with controls. Using a starvation assay to impact lipids and triglycerides, we found that lipids were indeed depleted in both genotypes when under starvation stress, suggesting that oxidative damage was not preventing the catabolism of lipids in SOD1-null flies. Remarkably, SOD1-nulls were also found to be relatively resistant to starvation. Age profiles of enzyme activity, triglyceride and lipid concentration indicate that the trends observed are consistent over the average lifespan of the SOD1-nulls. Based on our results, we propose a model of physiological response in which organisms under oxidative stress limit the production of ROS through the down-regulation of carbohydrate metabolism in order to moderate the products exiting the electron transport chain.

    A Parallel Monte-Carlo Tree Search Algorithm

    Monte-Carlo tree search is a powerful paradigm for the game of Go. We present a parallel Master-Slave algorithm for Monte-Carlo tree search. We tested the algorithm on a network of computers using various configurations: from 12,500 to 100,000 playouts, from 1 to 64 slaves, and from 1 to 16 computers. On our architecture we obtain a speedup of 14 for 16 slaves. With a single slave and five seconds per move our algorithm scores 40.5% against GNUGO; with sixteen slaves and five seconds per move it scores 70.5%. We also give the potential speedups of our algorithm for various playout times.
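
As a worked reading of the reported figure, a speedup of 14 on 16 slaves corresponds to a parallel efficiency of 14/16. The helper below is only an illustrative calculation, and the name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup achieved: speedup / workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


if __name__ == "__main__":
    # Figures from the abstract: speedup of 14 with 16 slaves.
    print(parallel_efficiency(14, 16))  # 0.875
```

An efficiency of 0.875 means the 16 slaves deliver 87.5% of ideal linear scaling.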

    On Semeai Detection in Monte-Carlo Go
