83 research outputs found

Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic

Author: G. Chaslot
L. Kocsis
M. Kearns
P. Auer
R. Bellman
R. Coulom
S. Gelly
Publication date: 01/01/2012

We consider the problem of using a heuristic policy to improve the value approximation by the Upper Confidence Bound applied in Trees (UCT) algorithm in non-adversarial settings such as planning with large state-space Markov Decision Processes. Current improvements to UCT focus on either changing the action selection formula at the internal nodes or the rollout policy at the leaf nodes of the search tree. In this work, we propose to add an auxiliary arm to each of the internal nodes, and always use the heuristic policy to roll out simulations at the auxiliary arms. The method aims to get fast convergence to optimal values at states where the heuristic policy is optimal, while retaining an approximation similar to that of the original UCT in other states. We show that bootstrapping with the proposed method in the new algorithm, UCT-Aux, performs better than the original UCT algorithm and its variants in two benchmark experiment settings. We also examine conditions under which UCT-Aux works well. Comment: 16 pages, accepted for presentation at ECML'1

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

ScholarBank@NUS

Practical Open-Loop Optimistic Planning

Author: D Silver
D Silver
D Silver
J-F Hren
L Buşoniu
O Cappé
R Bellman
R Coulom
Publication date: 09/04/2019

We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies (i.e., sequences of actions) and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KLOLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Noisy Optimization: Convergence with a Fixed Number of Resamplings

Author: A Auger
D Arnold
DV Arnold
H Chen
H-G Beyer
I Rechenberg
O Teytaud
R Coulom
V Fabian
Publication date: 09/04/2014

It is known that evolution strategies in continuous domains might not converge in the presence of noise. It is also known that, under mild assumptions, and using an increasing number of resamplings, one can mitigate the effect of additive noise and recover convergence. We show new sufficient conditions for the convergence of an evolutionary algorithm with a constant number of resamplings; in particular, we get fast rates (log-linear convergence) provided that the variance decreases around the optimum slightly faster than in the so-called multiplicative noise model. Keywords: Noisy optimization, evolutionary algorithm, theory. Comment: EvoStar (2014

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

A network-based dynamical ranking system for competitive sports

Author: BJ Coleman
F Radicchi
HE Daniels
J Martinich
JW Moon
L Fahrmeir
L Knorr-Held
ME Glickman
MG Kendall
MJ Dixon
NE Borm
P Grindrod
P Holme
PJJ Herings
R Coulom
R Fagin
R Herbrich
RA Bradley
RK Pan
RT Stefani
S Saavedra
T Callaghan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/12/2012

From the viewpoint of networks, a ranking system for players or teams in sports is equivalent to a centrality measure for sports networks, whereby a directed link represents the result of a single game. Previously proposed network-based ranking systems are derived from static networks, i.e., aggregation of the results of games over time. However, the score of a player (or team) fluctuates over time. Defeating a renowned player at peak performance is intuitively more rewarding than defeating the same player in other periods. To account for this factor, we propose a dynamic variant of such a network-based ranking system and apply it to professional men's tennis data. We derive a set of linear online update equations for the score of each player. The proposed ranking system predicts the outcome of future games with higher accuracy than its static counterparts. Comment: 6 figure

arXiv.org e-Print Archive

Crossref

Warm-Start AlphaZero Self-Play Search Enhancements

Author: C Browne
CD Rosin
D Silver
D Silver
D Silver
EA Heinz
G Tesauro
H Wang
J Schmidhuber
J Tao
LV Allis
M Buro
MA Wiering
ML Zhang
N Justesen
N Srivastava
O Vinyals
R Coulom
R Coulom
RD Gaina
S Gelly
S Iwata
S Reisch
SY Chong
TP Runarsson
V Mnih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/04/2020

Recently, AlphaZero has achieved landmark results in deep reinforcement learning, by providing a single self-play architecture that learned three different games at superhuman level. AlphaZero is a large and complicated system with many parameters, and success requires much compute power and fine-tuning. Reproducing results in other games is a challenge, and many researchers are looking for ways to improve results while reducing computational demands. AlphaZero's design is purely based on self-play and makes no use of labeled expert data or domain-specific enhancements; it is designed to learn from scratch. We propose a novel approach to deal with this cold-start problem by employing simple search enhancements at the beginning phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE) and dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games, with RAVE-based variants playing especially strongly.

arXiv.org e-Print Archive

Crossref

Leiden University Scholarly Publications

Temporal-difference search in computer Go

Author: A. Elo
B. Widrow
D. Silver
D. Silver
D. Stern
David Silver
E. Werf van der
F. Dahl
G. Chaslot
G. Tesauro
G. Tesauro
H. Finnsson
H. Mayer
J. Baxter
J. Fürnkranz
J. Schaeffer
J. Schaeffer
J. Schaeffer
J. Tsitsiklis
J. Veness
L. Kocsis
M. Buro
M. Enzenberger
M. Littman
M. Müller
M. Winands
Martin Müller
N. Schraudolph
N. Sturtevant
P. Auer
P. Dayan
R. Balla
R. Coulom
R. Coulom
R. Coulom
R. Lorentz
R. Sutton
R. Sutton
R. Sutton
R. Sutton
Richard S. Sutton
S. Gelly
S. Gelly
S. Haykin
S. Hochreiter
S. Huang
S. Singh
S. Singh
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A Model of Oxidative Stress Management: Moderation of Carbohydrate Metabolizing Enzymes in SOD1-Null Drosophila melanogaster

Author: A Chaudhuri
A Duttaroy
A Paul
B Rogina
BJ Turner
BR Strub
BW Geer
BW Geer
C Curtis
CJ Vermeulen
DJ Clancy
DR Cavener
DR Rosen
EM Wise Jr
F Missirlis
FL Muller
GC Brown
H Bauer
H Coulom
H Juhnke
H Kabil
H Ruan
HX Deng
J Park
JP Phillips
K Kirby
KH Slekar
Kristine E. Bernard
L Chen
L Harshman
LB Becker
M Ralser
M Ristow
M Zhan
MT Marron
N Piazza
NS Dhalla
Pedro Lagerblad Oliveira
PP Pandolfi
PY Wang
R Singh
R Singh
RC Grandison
RJ Mailloux
RS Sohal
RS Sohal
S Rion
S Wicks
S Zou
SD Campbell
SK Legan
SM Kanzok
T Tsuzuki
Thomas J. S. Merritt
TJ Merritt
TJ Merritt
TJ Merritt
TL Parkes
TL Parkes
Tony L. Parkes
W Ying
WF Eanes
WH Tong
WW Ja
Publication venue: Public Library of Science

The response to oxidative stress involves numerous genes, and mutations in these genes often manifest in pleiotropic ways that presumably reflect perturbations in ROS-mediated physiology. The Drosophila melanogaster SOD1-null allele (cSODn108) is proposed to result in oxidative stress by preventing superoxide breakdown. In SOD1-null flies, oxidative stress management is thought to be reliant on the glutathione-dependent antioxidants that utilize NADPH to cycle between reduced and oxidized forms. Previous studies suggest that SOD1-null Drosophila rely on lipid catabolism for energy rather than carbohydrate metabolism. We tested these connections by comparing the activity of carbohydrate metabolizing enzymes, lipid and triglyceride concentration, and steady state NADPH:NADP+ in SOD1-null and control transgenic rescue flies. We find a negative shift in the activity of carbohydrate metabolizing enzymes in SOD1-nulls, and the NADP+-reducing enzymes were found to have significantly lower activity than the other enzymes assayed. Little evidence for the catabolism of lipids as a preferential energy source was found, as the concentrations of lipids and triglycerides were not significantly lower in SOD1-nulls compared with controls. Using a starvation assay to impact lipids and triglycerides, we found that lipids were indeed depleted in both genotypes when under starvation stress, suggesting that oxidative damage was not preventing the catabolism of lipids in SOD1-null flies. Remarkably, SOD1-nulls were also found to be relatively resistant to starvation. Age profiles of enzyme activity, triglyceride and lipid concentration indicate that the trends observed are consistent over the average lifespan of the SOD1-nulls. Based on our results, we propose a model of physiological response in which organisms under oxidative stress limit the production of ROS through the down-regulation of carbohydrate metabolism in order to moderate the products exiting the electron transport chain.

Crossref

Directory of Open Access Journals

PubMed Central

Diquat causes caspase-independent cell death in SH-SY5Y cells by production of ROS independently of mitochondria

Crossref

A Parallel Monte-Carlo Tree Search Algorithm

Author: L. Kocsis
M. Campbell
R. Coulom
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Monte-Carlo tree search is a powerful paradigm for the game of Go. We present a parallel Master-Slave algorithm for Monte-Carlo tree search. We tested the algorithm on a network of computers using various configurations: from 12,500 to 100,000 playouts, from 1 to 64 slaves, and from 1 to 16 computers. On our architecture we obtain a speedup of 14 for 16 slaves. With a single slave and five seconds per move our algorithm scores 40.5% against GNUGO; with sixteen slaves and five seconds per move it scores 70.5%. We also give the potential speedups of our algorithm for various playout times.

CiteSeerX

Crossref

On Semeai Detection in Monte-Carlo Go

Author: A Rimmel
BW Silverman
K Fukunaga
R Coulom
R Coulom
R Coulom
S Pellegrino
S-C Huang
S-C Huang
Y Cheng
Publication venue: 'Springer Science and Business Media LLC'

Crossref