    High-Accuracy Value-Function Approximation with Neural Networks Applied to the Acrobot

    Get PDF
    Peer-reviewed conference with proceedings, international audience. Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper, we present experimental results obtained by using a feedforward neural network instead. The learning algorithm used was model-based continuous TD(λ). It generated an efficient controller, producing a high-accuracy state-value function. A striking feature of this value function is a very sharp 4-dimensional ridge that is extremely hard to evaluate with linear parametric approximators. From a broader point of view, this experimental success demonstrates some of the qualities of feedforward neural networks in comparison with linear approximators in reinforcement learning.
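
    The continuous, model-based TD(λ) used in the paper is specific to the Acrobot setting; as a rough illustration of the general idea, the sketch below applies discrete-time TD(λ) with per-parameter eligibility traces to a small one-hidden-layer value network. The network size, step sizes and trace decay are illustrative assumptions, not the paper's settings.

        # Minimal sketch: TD(lambda) value learning with a small feedforward network.
        # Simplified discrete-time illustration, not the paper's model-based
        # continuous TD(lambda); all hyperparameters here are assumptions.
        import numpy as np

        class MLPValue:
            """One-hidden-layer network V(s) with per-parameter eligibility traces."""
            def __init__(self, n_in, n_hidden=30, seed=0):
                rng = np.random.default_rng(seed)
                self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
                self.b1 = np.zeros(n_hidden)
                self.w2 = rng.normal(0.0, 0.1, n_hidden)
                self.b2 = 0.0
                self.traces = [np.zeros_like(p) for p in (self.W1, self.b1, self.w2)]
                self.trace_b2 = 0.0

            def value_and_grads(self, s):
                h = np.tanh(self.W1 @ s + self.b1)       # hidden activations
                v = self.w2 @ h + self.b2                # scalar value estimate
                dh = (1.0 - h ** 2) * self.w2            # back-propagated signal
                grads = (np.outer(dh, s), dh, h)         # dV/dW1, dV/db1, dV/dw2
                return v, grads

            def td_lambda_step(self, s, reward, s_next, done,
                               alpha=1e-3, gamma=0.99, lam=0.9):
                v, grads = self.value_and_grads(s)
                v_next = 0.0 if done else self.value_and_grads(s_next)[0]
                delta = reward + gamma * v_next - v      # TD error
                # Decay and accumulate eligibility traces, then apply the update.
                for tr, g in zip(self.traces, grads):
                    tr *= gamma * lam
                    tr += g
                self.trace_b2 = gamma * lam * self.trace_b2 + 1.0
                self.W1 += alpha * delta * self.traces[0]
                self.b1 += alpha * delta * self.traces[1]
                self.w2 += alpha * delta * self.traces[2]
                self.b2 += alpha * delta * self.trace_b2
                return delta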

    Computing Elo Ratings of Move Patterns in the Game of Go

    Get PDF
    Move patterns are an essential method to incorporate domain knowledge into Go-playing programs. This paper presents a new Bayesian technique for supervised learning of such patterns from game records, based on a generalization of Elo ratings. Each sample move in the training data is considered as a victory of a team of pattern features. Elo ratings of individual pattern features are computed from these victories, and can be used in previously unseen positions to compute a probability distribution over legal moves. In this approach, several pattern features may be combined, without an exponential cost in the number of features. Despite a very small number of training games (652), this algorithm outperforms most previous pattern-learning algorithms, both in terms of mean log-evidence (−2.69) and prediction rate (34.9%). A 19×19 Monte-Carlo program improved with these patterns reached the level of the strongest classical programs.
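
    The abstract describes a generalized Bradley-Terry model in which each observed move is a win for the "team" of pattern features it activates. The sketch below is one plausible rendering of that idea: each move's probability is proportional to the product of its feature strengths (gammas), fitted here by plain gradient ascent on the log-likelihood rather than the minorization-maximization scheme the paper uses. The data layout and hyperparameters are assumptions.

        # Minimal sketch: generalized Bradley-Terry over teams of pattern features.
        import numpy as np

        def move_probabilities(feature_sets, log_gamma):
            """P(move) is proportional to the product of its features' gammas."""
            scores = np.array([float(sum(log_gamma[f] for f in fs)) for fs in feature_sets])
            scores -= scores.max()                 # numerical stability
            p = np.exp(scores)
            return p / p.sum()

        def fit(positions, n_features, steps=200, lr=0.1):
            """positions: list of (list_of_feature_sets, index_of_chosen_move)."""
            log_gamma = np.zeros(n_features)
            for _ in range(steps):
                grad = np.zeros(n_features)
                for feature_sets, chosen in positions:
                    p = move_probabilities(feature_sets, log_gamma)
                    # Gradient of the log-likelihood of the chosen move.
                    for j, fs in enumerate(feature_sets):
                        w = (1.0 if j == chosen else 0.0) - p[j]
                        for f in fs:
                            grad[f] += w
                log_gamma += lr * grad / max(len(positions), 1)
            return log_gamma

    On this log scale, a fitted log_gamma can be read as an Elo-like rating via rating ≈ (400 / ln 10) · log_gamma, since a gamma of 10^(rating/400) reproduces the usual Elo win-probability formula.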

    Handling Expensive Optimization with Large Noise

    Get PDF
    International audience. This paper exhibits lower and upper bounds on runtimes for expensive noisy optimization problems. Runtimes are expressed in terms of the number of fitness evaluations. The fitnesses considered are monotonic transformations of the sphere function. The analysis focuses on the common case of fitness functions quadratic in the distance to the optimum in the neighborhood of this optimum; it is nonetheless also valid for any monotonic polynomial of degree p > 2. Upper bounds are derived via a bandit-based estimation-of-distribution algorithm that relies on Bernstein races, called R-EDA. It is known that the algorithm is consistent even in non-differentiable cases. Here we show that: (i) if the variance of the noise decreases to 0 around the optimum, it can perform optimally for quadratic transformations of the norm to the optimum; (ii) otherwise, it provides a slower convergence rate than the one exhibited empirically by an algorithm called Quadratic Logistic Regression (QLR), based on surrogate models, although QLR requires a probabilistic prior on the fitness class.
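
    As a concrete illustration of the "Bernstein race" primitive mentioned above, the sketch below compares two candidate points under noisy, bounded fitness evaluations and stops once their empirical Bernstein confidence intervals separate. It shows only the comparison step such a bandit-based algorithm could build on; it is not the paper's R-EDA, and the range bound, confidence level and budget are assumptions.

        # Minimal sketch: a Bernstein race between two candidate points under
        # noisy fitness evaluations (lower expected fitness wins).
        import math
        import numpy as np

        def bernstein_radius(samples, delta, value_range):
            """Empirical Bernstein confidence radius for the mean of bounded noise."""
            n = len(samples)
            var = np.var(samples, ddof=1) if n > 1 else value_range ** 2
            log_term = math.log(3.0 / delta)
            return math.sqrt(2.0 * var * log_term / n) + 3.0 * value_range * log_term / n

        def race(noisy_fitness, x_a, x_b, delta=0.05, value_range=1.0, max_evals=10_000):
            """Sample both points until their confidence intervals separate."""
            a, b = [], []
            while len(a) + len(b) < max_evals:
                a.append(noisy_fitness(x_a))
                b.append(noisy_fitness(x_b))
                if len(a) < 2:
                    continue
                ra = bernstein_radius(a, delta, value_range)
                rb = bernstein_radius(b, delta, value_range)
                if np.mean(a) + ra < np.mean(b) - rb:
                    return x_a              # x_a better with high probability
                if np.mean(b) + rb < np.mean(a) - ra:
                    return x_b
            return x_a if np.mean(a) <= np.mean(b) else x_b   # budget exhausted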

    Equi-Gradient Temporal Difference Learning

    Get PDF
    Equi-Gradient Temporal Difference Learning

    High-accuracy value-function approximation with neural networks

    No full text
    Abstract. Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper, we present experimental results obtained by using a feedforward neural network instead. The learning algorithm used was model-based continuous TD(λ). It generated an efficient controller, producing a high-accuracy state-value function. A striking feature of this value function is a very sharp 4-dimensional ridge that is extremely hard to evaluate with linear parametric approximators. From a broader point of view, this experimental success demonstrates some of the qualities of feedforward neural networks in comparison with linear approximators in reinforcement learning.

    Le jeu de go et la révolution de Monte Carlo

    No full text
    National audience. A technological revolution has allowed computers to take a step forward: the so-called "Monte Carlo" methods. What are the principles of these algorithms, and how do they apply to Go?

    Monte-Carlo Tree Search in Crazy Stone

    No full text
    International audience. Monte-Carlo tree search has recently revolutionized Go programming. Even on the large 19×19 board, the strongest Monte-Carlo programs are now stronger than the strongest classical programs. This talk is a summary of the principle of Monte-Carlo tree search, and a description of Crazy Stone, one of these new artificial players.
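
    For readers unfamiliar with the principle summarized here, the sketch below shows the four canonical steps of Monte-Carlo tree search (selection with UCB1, expansion, random playout, backpropagation) against an assumed minimal game interface; Crazy Stone itself uses considerably more refined selection and pattern knowledge than plain UCT.

        # Minimal sketch: Monte-Carlo tree search with UCT selection.
        # The `game` object is an assumed interface, not a real library.
        import math
        import random

        class Node:
            def __init__(self, state, parent=None, move=None):
                self.state, self.parent, self.move = state, parent, move
                self.children = []
                self.untried = None     # moves not yet expanded
                self.visits = 0
                self.wins = 0.0

        def uct_search(root_state, game, n_simulations=1000, c=1.4):
            """Assumes game provides: legal_moves(s), play(s, m), is_terminal(s),
            to_move(s), and result(s, player) in [0, 1] from player's viewpoint."""
            root = Node(root_state)
            root.untried = list(game.legal_moves(root_state))
            for _ in range(n_simulations):
                node = root
                # 1. Selection: descend while fully expanded, maximizing UCB1.
                while not node.untried and node.children:
                    node = max(node.children,
                               key=lambda ch: ch.wins / ch.visits
                               + c * math.sqrt(math.log(node.visits) / ch.visits))
                # 2. Expansion: add one child for an untried move.
                if node.untried:
                    move = node.untried.pop()
                    child_state = game.play(node.state, move)
                    child = Node(child_state, parent=node, move=move)
                    child.untried = list(game.legal_moves(child_state))
                    node.children.append(child)
                    node = child
                # 3. Simulation: random playout to the end of the game.
                state = node.state
                while not game.is_terminal(state):
                    state = game.play(state, random.choice(game.legal_moves(state)))
                # 4. Backpropagation: each node stores wins from the viewpoint
                #    of the player who made the move leading to it.
                while node is not None:
                    node.visits += 1
                    if node.parent is not None:
                        mover = game.to_move(node.parent.state)
                        node.wins += game.result(state, mover)
                    node = node.parent
            return max(root.children, key=lambda ch: ch.visits).move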