    High-Accuracy Value-Function Approximation with Neural Networks Applied to the Acrobot

    Get PDF
    Peer-reviewed conference with proceedings, international audience. Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper, we present experimental results obtained by using a feedforward neural network instead. The learning algorithm used was model-based continuous TD(λ). It generated an efficient controller, producing a high-accuracy state-value function. A striking feature of this value function is a very sharp 4-dimensional ridge that is extremely hard to evaluate with linear parametric approximators. From a broader point of view, this experimental success demonstrates some of the qualities of feedforward neural networks in comparison with linear approximators in reinforcement learning.
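
    The continuous, model-based TD(λ) used in the paper is specific to the Acrobot setting; as a rough illustration of the general idea, the sketch below applies discrete-time TD(λ) with per-parameter eligibility traces to a small one-hidden-layer value network. The network size, step sizes and trace decay are illustrative assumptions, not the paper's settings.

        # Minimal sketch: TD(lambda) value learning with a small feedforward network.
        # Simplified discrete-time illustration, not the paper's model-based
        # continuous TD(lambda); all hyperparameters here are assumptions.
        import numpy as np

        class MLPValue:
            """One-hidden-layer network V(s) with per-parameter eligibility traces."""
            def __init__(self, n_in, n_hidden=30, seed=0):
                rng = np.random.default_rng(seed)
                self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
                self.b1 = np.zeros(n_hidden)
                self.w2 = rng.normal(0.0, 0.1, n_hidden)
                self.b2 = 0.0
                self.traces = [np.zeros_like(p) for p in (self.W1, self.b1, self.w2)]
                self.trace_b2 = 0.0

            def value_and_grads(self, s):
                h = np.tanh(self.W1 @ s + self.b1)       # hidden activations
                v = self.w2 @ h + self.b2                # scalar value estimate
                dh = (1.0 - h ** 2) * self.w2            # back-propagated signal
                grads = (np.outer(dh, s), dh, h)         # dV/dW1, dV/db1, dV/dw2
                return v, grads

            def td_lambda_step(self, s, reward, s_next, done,
                               alpha=1e-3, gamma=0.99, lam=0.9):
                v, grads = self.value_and_grads(s)
                v_next = 0.0 if done else self.value_and_grads(s_next)[0]
                delta = reward + gamma * v_next - v      # TD error
                # Decay and accumulate eligibility traces, then apply the update.
                for tr, g in zip(self.traces, grads):
                    tr *= gamma * lam
                    tr += g
                self.trace_b2 = gamma * lam * self.trace_b2 + 1.0
                self.W1 += alpha * delta * self.traces[0]
                self.b1 += alpha * delta * self.traces[1]
                self.w2 += alpha * delta * self.traces[2]
                self.b2 += alpha * delta * self.trace_b2
                return delta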

    Computing Elo Ratings of Move Patterns in the Game of Go

    Get PDF
    Move patterns are an essential method to incorporate domain knowledge into Go-playing programs. This paper presents a new Bayesian technique for supervised learning of such patterns from game records, based on a generalization of Elo ratings. Each sample move in the training data is considered as a victory of a team of pattern features. Elo ratings of individual pattern features are computed from these victories, and can be used in previously unseen positions to compute a probability distribution over legal moves. In this approach, several pattern features may be combined, without an exponential cost in the number of features. Despite a very small number of training games (652), this algorithm outperforms most previous pattern-learning algorithms, both in terms of mean log-evidence (−2.69) and prediction rate (34.9%). A 19×19 Monte-Carlo program improved with these patterns reached the level of the strongest classical programs.
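
    The abstract describes a generalized Bradley-Terry model in which each observed move is a win for the "team" of pattern features it activates. The sketch below is one plausible rendering of that idea: each move's probability is proportional to the product of its feature strengths (gammas), fitted here by plain gradient ascent on the log-likelihood rather than the minorization-maximization scheme the paper uses. The data layout and hyperparameters are assumptions.

        # Minimal sketch: generalized Bradley-Terry over teams of pattern features.
        import numpy as np

        def move_probabilities(feature_sets, log_gamma):
            """P(move) is proportional to the product of its features' gammas."""
            scores = np.array([float(sum(log_gamma[f] for f in fs)) for fs in feature_sets])
            scores -= scores.max()                 # numerical stability
            p = np.exp(scores)
            return p / p.sum()

        def fit(positions, n_features, steps=200, lr=0.1):
            """positions: list of (list_of_feature_sets, index_of_chosen_move)."""
            log_gamma = np.zeros(n_features)
            for _ in range(steps):
                grad = np.zeros(n_features)
                for feature_sets, chosen in positions:
                    p = move_probabilities(feature_sets, log_gamma)
                    # Gradient of the log-likelihood of the chosen move.
                    for j, fs in enumerate(feature_sets):
                        w = (1.0 if j == chosen else 0.0) - p[j]
                        for f in fs:
                            grad[f] += w
                log_gamma += lr * grad / max(len(positions), 1)
            return log_gamma

    On this log scale, a fitted log_gamma can be read as an Elo-like rating via rating ≈ (400 / ln 10) · log_gamma, since a gamma of 10^(rating/400) reproduces the usual Elo win-probability formula.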

    Handling Expensive Optimization with Large Noise

    Get PDF
    International audience. This paper exhibits lower and upper bounds on runtimes for expensive noisy optimization problems. Runtimes are expressed in terms of the number of fitness evaluations. The fitnesses considered are monotonic transformations of the sphere function. The analysis focuses on the common case of fitness functions quadratic in the distance to the optimum in the neighborhood of this optimum; it is nonetheless also valid for any monotonic polynomial of degree p > 2. Upper bounds are derived via a bandit-based estimation-of-distribution algorithm that relies on Bernstein races, called R-EDA. It is known that the algorithm is consistent even in non-differentiable cases. Here we show that: (i) if the variance of the noise decreases to 0 around the optimum, it can perform optimally for quadratic transformations of the norm to the optimum; (ii) otherwise, it provides a slower convergence rate than the one exhibited empirically by an algorithm called Quadratic Logistic Regression (QLR), based on surrogate models, although QLR requires a probabilistic prior on the fitness class.
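
    As a concrete illustration of the "Bernstein race" primitive mentioned above, the sketch below compares two candidate points under noisy, bounded fitness evaluations and stops once their empirical Bernstein confidence intervals separate. It shows only the comparison step such a bandit-based algorithm could build on; it is not the paper's R-EDA, and the range bound, confidence level and budget are assumptions.

        # Minimal sketch: a Bernstein race between two candidate points under
        # noisy fitness evaluations (lower expected fitness wins).
        import math
        import numpy as np

        def bernstein_radius(samples, delta, value_range):
            """Empirical Bernstein confidence radius for the mean of bounded noise."""
            n = len(samples)
            var = np.var(samples, ddof=1) if n > 1 else value_range ** 2
            log_term = math.log(3.0 / delta)
            return math.sqrt(2.0 * var * log_term / n) + 3.0 * value_range * log_term / n

        def race(noisy_fitness, x_a, x_b, delta=0.05, value_range=1.0, max_evals=10_000):
            """Sample both points until their confidence intervals separate."""
            a, b = [], []
            while len(a) + len(b) < max_evals:
                a.append(noisy_fitness(x_a))
                b.append(noisy_fitness(x_b))
                if len(a) < 2:
                    continue
                ra = bernstein_radius(a, delta, value_range)
                rb = bernstein_radius(b, delta, value_range)
                if np.mean(a) + ra < np.mean(b) - rb:
                    return x_a              # x_a better with high probability
                if np.mean(b) + rb < np.mean(a) - ra:
                    return x_b
            return x_a if np.mean(a) <= np.mean(b) else x_b   # budget exhausted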

    Equi-Gradient Temporal Difference Learning

    Get PDF
    Equi-Gradient Temporal Difference Learning

    High-accuracy value-function approximation with neural networks

    No full text
    Abstract. Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper, we present experimental results obtained by using a feedforward neural network instead. The learning algorithm used was model-based continuous TD(λ). It generated an efficient controller, producing a high-accuracy state-value function. A striking feature of this value function is a very sharp 4-dimensional ridge that is extremely hard to evaluate with linear parametric approximators. From a broader point of view, this experimental success demonstrates some of the qualities of feedforward neural networks in comparison with linear approximators in reinforcement learning.

    Le jeu de go et la révolution de Monte Carlo

    No full text
    National audience. A technological revolution has allowed computers to take a step forward: the so-called "Monte Carlo" methods. What are the principles of these algorithms, and how do they apply to Go?

    Monte-Carlo Tree Search in Crazy Stone

    No full text
    International audience. Monte-Carlo tree search has recently revolutionized Go programming. Even on the large 19×19 board, the strongest Monte-Carlo programs are now stronger than the strongest classical programs. This talk is a summary of the principle of Monte-Carlo tree search, and a description of Crazy Stone, one of these new artificial players.
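
    For readers unfamiliar with the principle summarized here, the sketch below shows the four canonical steps of Monte-Carlo tree search (selection with UCB1, expansion, random playout, backpropagation) against an assumed minimal game interface; Crazy Stone itself uses considerably more refined selection and pattern knowledge than plain UCT.

        # Minimal sketch: Monte-Carlo tree search with UCT selection.
        # The `game` object is an assumed interface, not a real library.
        import math
        import random

        class Node:
            def __init__(self, state, parent=None, move=None):
                self.state, self.parent, self.move = state, parent, move
                self.children = []
                self.untried = None     # moves not yet expanded
                self.visits = 0
                self.wins = 0.0

        def uct_search(root_state, game, n_simulations=1000, c=1.4):
            """Assumes game provides: legal_moves(s), play(s, m), is_terminal(s),
            to_move(s), and result(s, player) in [0, 1] from player's viewpoint."""
            root = Node(root_state)
            root.untried = list(game.legal_moves(root_state))
            for _ in range(n_simulations):
                node = root
                # 1. Selection: descend while fully expanded, maximizing UCB1.
                while not node.untried and node.children:
                    node = max(node.children,
                               key=lambda ch: ch.wins / ch.visits
                               + c * math.sqrt(math.log(node.visits) / ch.visits))
                # 2. Expansion: add one child for an untried move.
                if node.untried:
                    move = node.untried.pop()
                    child_state = game.play(node.state, move)
                    child = Node(child_state, parent=node, move=move)
                    child.untried = list(game.legal_moves(child_state))
                    node.children.append(child)
                    node = child
                # 3. Simulation: random playout to the end of the game.
                state = node.state
                while not game.is_terminal(state):
                    state = game.play(state, random.choice(game.legal_moves(state)))
                # 4. Backpropagation: each node stores wins from the viewpoint
                #    of the player who made the move leading to it.
                while node is not None:
                    node.visits += 1
                    if node.parent is not None:
                        mover = game.to_move(node.parent.state)
                        node.wins += game.result(state, mover)
                    node = node.parent
            return max(root.children, key=lambda ch: ch.visits).move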