
    Model-free reinforcement learning for stochastic parity games

    This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena (a stochastic game graph with unknown but fixed probability distributions) to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on experimental evaluations of both reductions.
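
    As a concrete illustration of the model-free idea, the sketch below runs a minimax-Q-style update on a hypothetical two-state zero-sum stochastic game. The transition function, rewards, and all parameter values are our own illustrative assumptions (not the paper's construction), and the minimax is taken over pure strategies only for simplicity; the full algorithm solves a small linear program to get mixed-strategy values.

```python
import random

# Hypothetical two-state zero-sum stochastic game (illustrative only):
# the learner never reads these probabilities, it only samples transitions.
def step(state, a_min, a_max, rng):
    p = 0.8 if a_max != a_min else 0.3   # Max tries to mismatch Min here
    nxt = 1 if rng.random() < p else 0
    reward = 1.0 if nxt == 1 else 0.0    # proxy for "objective satisfied"
    return nxt, reward

def minimax_q(episodes=5000, gamma=0.9, alpha=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, am, ax): 0.0 for s in (0, 1) for am in (0, 1) for ax in (0, 1)}

    def value(s):
        # Pure-strategy minimax: Min minimises Max's best response.
        return min(max(Q[(s, am, ax)] for ax in (0, 1)) for am in (0, 1))

    state = 0
    for _ in range(episodes):
        a_min, a_max = rng.randrange(2), rng.randrange(2)  # uniform exploration
        nxt, r = step(state, a_min, a_max, rng)
        key = (state, a_min, a_max)
        Q[key] += alpha * (r + gamma * value(nxt) - Q[key])
        state = nxt
    return value(0), value(1)
```

    Because rewards lie in [0, 1], the learned values stay within [0, 1/(1 − γ)]; the learner only ever calls `step`, never the transition probabilities themselves.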

    Diffusive limit approximation of pure jump optimal ergodic control problems

    Motivated by the design of fast reinforcement learning algorithms, we study the diffusive limit of a class of pure jump ergodic stochastic control problems. We show that, whenever the intensity of jumps is large enough, the approximation error is governed by the Hölder continuity of the Hessian matrix of the solution to the limit ergodic partial differential equation. This extends to this context the results of [1] obtained for finite horizon problems. We also explain how to construct a first-order error correction term under appropriate smoothness assumptions. Finally, we quantify the error induced by using the Markov control policy constructed from the numerical finite difference scheme associated with the limit diffusive problem; this result seems to be new in the literature and of interest in its own right. This approach reduces the numerical resolution cost very significantly.
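
    The diffusive limit itself is easy to see numerically: rescaling a compensated Poisson process of intensity n by 1/√n yields marginals that approach those of Brownian motion as n grows. The toy check below is our own construction (not the paper's scheme); it estimates the variance of the rescaled endpoint, which should be close to T.

```python
import random

def scaled_jump_endpoint(n, T, rng):
    """Endpoint X_T = (N_T - n*T) / sqrt(n) of the compensated, rescaled
    jump process, where N is Poisson with intensity n (simulated via
    exponential inter-arrival times). As n -> infinity, X_T -> N(0, T)."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(n)
        if t > T:
            break
        count += 1
    return (count - n * T) / n ** 0.5

def empirical_variance(n, T=1.0, samples=4000, seed=1):
    rng = random.Random(seed)
    xs = [scaled_jump_endpoint(n, T, rng) for _ in range(samples)]
    mean = sum(xs) / samples
    return sum((x - mean) ** 2 for x in xs) / samples
```

    For n = 200 and T = 1 the empirical variance is already close to 1, consistent with the Brownian limit.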

    Relevant Representations for the Inference of Rational Stochastic Tree Languages

    Recently, an algorithm, DEES, was proposed for learning rational stochastic tree languages. Given an independently and identically distributed sample of trees drawn according to a rational stochastic language, DEES outputs a linear representation of a rational series which converges to the target. DEES can then be used to identify rational stochastic tree languages in the limit with probability one. However, when DEES deals with finite samples, it often outputs a rational tree series which does not define a stochastic language. Moreover, the linear representation cannot be used directly as a generative model. In this paper, we show that any representation of a rational stochastic tree language can be transformed into a reduced normalised representation that can be used to generate trees from the underlying distribution. We also study some consistency properties of rational stochastic tree languages and discuss their implications for inference. We finally consider the applicability of DEES to trees built over an unranked alphabet.
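
    The role of a normalised representation as a generative model can be illustrated with a much simpler object: a probabilistic regular tree grammar whose rule weights sum to one per nonterminal, so sampling it draws trees from a well-defined distribution. The grammar and names below are our illustrative stand-ins, not DEES output.

```python
import random

# Tiny probabilistic regular tree grammar (stand-in for a normalised
# representation): each nonterminal's rule weights sum to 1, and the
# branching is subcritical, so sampling terminates with probability 1.
GRAMMAR = {
    "S": [(0.6, ("leaf",)), (0.4, ("node", "S", "S"))],
}

def sample_tree(nonterminal, rng):
    """Draw one tree from the distribution defined by GRAMMAR."""
    r, acc = rng.random(), 0.0
    for weight, production in GRAMMAR[nonterminal]:
        acc += weight
        if r <= acc:
            break
    label, *children = production
    return (label,) + tuple(sample_tree(c, rng) for c in children)

def tree_size(tree):
    return 1 + sum(tree_size(child) for child in tree[1:])
```

    With branching probability 0.4 the mean offspring number is 0.8 < 1, and the expected tree size is 1/(1 − 2·0.4) = 5 nodes.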

    Slow stochastic Hebbian learning of classes of stimuli in a recurrent neural network

    We study unsupervised Hebbian learning in a recurrent network in which synapses have a finite number of stable states. Stimuli received by the network are drawn at random at each presentation from a set of classes. Each class is defined as a cluster in stimulus space, centred on the class prototype. The presentation protocol is chosen to mimic the protocols of visual memory experiments in which a set of stimuli is presented repeatedly in a random way. The statistics of the input stream may be stationary, or changing. Each stimulus induces, in a stochastic way, transitions between stable synaptic states. Learning dynamics is studied analytically in the slow learning limit, in which a given stimulus has to be presented many times before it is memorized, i.e. before synaptic modifications enable a pattern of activity correlated with the stimulus to become an attractor of the recurrent network. We show that in this limit the synaptic matrix becomes more correlated with the class prototypes than with any of the instances of the class. We also show that the number of classes that can be learned increases sharply when the coding level decreases, and determine the speeds of learning and forgetting of classes in the case of changes in the statistics of the input stream.
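
    A minimal simulation of this setting (with our own parameter choices, not the paper's) shows the prototype effect directly: with two-state synapses and small transition probabilities, the synaptic matrix comes to reflect the class prototype rather than any single noisy instance.

```python
import random

def hebbian_binary(n=40, f=0.25, q=0.05, noise=0.1, presentations=600, seed=2):
    """Slow stochastic Hebbian learning with two-state synapses.
    Stimuli are noisy copies of one class prototype; each presentation
    potentiates or depresses eligible synapses with small probability q."""
    rng = random.Random(seed)
    proto = [1 if rng.random() < f else 0 for _ in range(n)]
    J = [[0] * n for _ in range(n)]
    for _ in range(presentations):
        x = [b ^ (rng.random() < noise) for b in proto]  # noisy instance
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                if x[i] and x[j]:
                    if rng.random() < q:
                        J[i][j] = 1      # stochastic potentiation
                elif x[i] != x[j]:
                    if rng.random() < q:
                        J[i][j] = 0      # stochastic depression
    return proto, J

def pair_means(proto, J):
    """Mean synaptic efficacy among prototype-active pairs vs all other pairs."""
    on, off = [], []
    n = len(proto)
    for i in range(n):
        for j in range(n):
            if i != j:
                (on if proto[i] and proto[j] else off).append(J[i][j])
    return sum(on) / len(on), sum(off) / len(off)
```

    At equilibrium the potentiation/depression balance pins prototype-active pairs near 1 and all others near 0, even though every individual stimulus deviates from the prototype.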

    Intrinsic Noise in Game Dynamical Learning

    Demographic noise has profound effects on evolutionary and population dynamics, as well as on chemical reaction systems and models of epidemiology. Such noise is intrinsic and due to the discreteness of the dynamics in finite populations. We here show that similar noise-sustained trajectories arise in game dynamical learning, where the stochasticity has a different origin: agents sample a finite number of moves of their opponents in between adaptation events. The limit of infinite batches results in deterministic modified replicator equations, whereas finite sampling leads to a stochastic dynamics. The characteristics of these fluctuations can be computed analytically using methods from statistical physics, and such noise can affect the attractors significantly, leading to noise-sustained cycling or removing periodic orbits of the standard replicator dynamics. © 2009 The American Physical Society
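
    The origin of the noise is easy to reproduce: estimating payoffs from a finite batch of observed opponent moves gives a stochastic update whose dispersion shrinks like 1/√N, vanishing only in the infinite-batch (deterministic replicator) limit. The sketch below, using rock-paper-scissors and our own parameter choices, measures that dispersion.

```python
import random

# Rock-paper-scissors payoff for the row player: A[i][j]
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def batch_payoffs(opponent_mix, batch, rng):
    """Estimate each action's payoff from `batch` sampled opponent moves
    (the finite-sampling noise discussed in the abstract)."""
    counts = [0, 0, 0]
    for _ in range(batch):
        r, acc = rng.random(), 0.0
        for j, p in enumerate(opponent_mix):
            acc += p
            if r <= acc:
                counts[j] += 1
                break
        else:
            counts[2] += 1  # guard against floating-point rounding
    return [sum(A[i][j] * counts[j] for j in range(3)) / batch for i in range(3)]

def update_dispersion(batch, trials=300, seed=3):
    """Standard deviation of action 0's estimated payoff across trials."""
    rng = random.Random(seed)
    mix = [1 / 3, 1 / 3, 1 / 3]
    vals = [batch_payoffs(mix, batch, rng)[0] for _ in range(trials)]
    m = sum(vals) / trials
    return (sum((v - m) ** 2 for v in vals) / trials) ** 0.5
```

    Against a uniform opponent the per-sample payoff variance is 2/3, so the dispersion should fall roughly as √(2/3N): large batches recover the deterministic dynamics, small batches inject the intrinsic noise.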

    Stochastic Adaptation in Finite Games Played by Heterogeneous Populations

    In this paper, I analyze stochastic adaptation in finite n-player games played by heterogeneous populations of myopic best repliers, better repliers and imitators. In each period, one individual from each of n populations, one for each player role, is drawn to play and chooses a pure strategy according to her personal learning rule after observing a sample from a finite history. With a small probability individuals also make a mistake and play a pure strategy at random. I prove that, for a sufficiently low ratio between the sample and history size, only pure-strategy profiles in certain minimal closed sets under better replies will be played with positive probability in the limit, as the probability of mistakes tends to zero. If, in addition, the strategy profiles in one such set have strictly higher payoffs than all other strategy profiles and the sample size is sufficiently large, then the strategies in this set will be played with probability one in the limit. Applied to 2x2 Coordination Games, the Pareto dominant equilibrium is selected for a sufficiently large sample size, but in all symmetric and many asymmetric games, the risk dominant equilibrium is selected for a sufficiently small sample size.
    Keywords: Bounded rationality; Evolutionary game theory; Imitation; Better replies; Markov chain; Stochastic stability; Pareto dominance; Risk dominance
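
    A minimal version of this dynamic (Young-style adaptive play with mistakes; all payoffs and parameters are our illustrative choices) can be simulated directly. In the game below the equilibrium (A, A) is both Pareto- and risk-dominant, so the perturbed process should spend most of its time there.

```python
import random

# Symmetric 2x2 coordination game; (A, A) is both Pareto- and risk-dominant,
# so perturbed adaptive play should spend most of its time coordinated on A.
PAYOFF = {("A", "A"): 2.0, ("B", "B"): 1.0, ("A", "B"): 0.0, ("B", "A"): 0.0}

def adaptive_play(periods=20000, history=20, sample=5, eps=0.02, seed=4):
    """Fraction of periods the two roles coordinate on (A, A)."""
    rng = random.Random(seed)
    hist = [("A", "B")] * history        # start from miscoordination
    at_AA = 0
    for _ in range(periods):
        draws = rng.sample(hist, sample)
        actions = []
        for role in (0, 1):
            opp = [pair[1 - role] for pair in draws]
            # expected payoff of each pure strategy against the sample
            u_a = sum(PAYOFF[("A", o)] for o in opp) / sample
            u_b = sum(PAYOFF[("B", o)] for o in opp) / sample
            choice = "A" if u_a >= u_b else "B"
            if rng.random() < eps:       # mistake: play at random
                choice = rng.choice(["A", "B"])
            actions.append(choice)
        pair = (actions[0], actions[1])
        hist = hist[1:] + [pair]
        at_AA += pair == ("A", "A")
    return at_AA / periods
```

    Escaping the (A, A) basin requires several near-simultaneous mistakes in one sample, while escaping (B, B) requires far fewer, which is the mechanism behind the stochastic-stability selection result.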

    Noisy fitness evaluation in genetic algorithms and the dynamics of learning

    A theoretical model is presented which describes selection in a genetic algorithm (GA) under a stochastic fitness measure and correctly accounts for finite population effects. Although this model describes a number of selection schemes, we only consider Boltzmann selection in detail here as results for this form of selection are particularly transparent when fitness is corrupted by additive Gaussian noise. Finite population effects are shown to be of fundamental importance in this case, as the noise has no effect in the infinite population limit. In the limit of weak selection we show how the effects of any Gaussian noise can be removed by increasing the population size appropriately. The theory is tested on two closely related problems: the one-max problem corrupted by Gaussian noise and generalization in a perceptron with binary weights. The averaged dynamics can be accurately modelled for both problems using a formalism which describes the dynamics of the GA using methods from statistical mechanics. The second problem is a simple example of a learning problem and by considering this problem we show how the accurate characterization of noise in the fitness evaluation may be relevant in machine learning. The training error (negative fitness) is the number of misclassified training examples in a batch and can be considered as a noisy version of the generalization error if an independent batch is used for each evaluation. The noise is due to the finite batch size and in the limit of large problem size and weak selection we show how the effect of this noise can be removed by increasing the population size. This allows the optimal batch size to be determined, which minimizes computation time as well as the total number of training examples required.
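
    The basic setting is straightforward to simulate: one-max with additive Gaussian noise on each fitness evaluation, and Boltzmann (exponential) selection weighting each individual by exp(β · noisy fitness). The sketch below uses our own parameter choices and shows selection still driving the true fitness upward despite the noise.

```python
import math
import random

def noisy_onemax_ga(n=40, pop=100, gens=60, beta=0.3, sigma=2.0,
                    mut=0.01, seed=5):
    """One-max GA with Boltzmann selection under additive Gaussian
    fitness noise; returns the final mean *true* fitness."""
    rng = random.Random(seed)
    population = [[rng.randrange(2) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        # noisy evaluation: true fitness (number of ones) plus Gaussian noise
        noisy = [sum(x) + rng.gauss(0.0, sigma) for x in population]
        top = max(noisy)
        weights = [math.exp(beta * (f - top)) for f in noisy]  # stabilised
        total = sum(weights)
        children = []
        for _ in range(pop):
            # roulette-wheel draw proportional to exp(beta * noisy fitness)
            r, acc = rng.random() * total, 0.0
            chosen = population[-1]
            for x, w in zip(population, weights):
                acc += w
                if r <= acc:
                    chosen = x
                    break
            children.append([b ^ (rng.random() < mut) for b in chosen])
        population = children
    return sum(sum(x) for x in population) / pop
```

    Re-running with larger `pop` at fixed `sigma` is the experiment suggested by the theory: finite-population fluctuations, not the noise per se, are what selection has to overcome.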

    Fixation and escape times in stochastic game learning

    Evolutionary dynamics in finite populations is known to fixate eventually in the absence of mutation. We here show that a similar phenomenon can be found in stochastic game dynamical batch learning, and investigate fixation in learning processes in a simple 2x2 game, for two-player games with cyclic interaction, and in the context of the best-shot network game. The analogues of finite populations in evolution are here finite batches of observations between strategy updates. We study when and how such fixation can occur, and present results on the average time-to-fixation from numerical simulations. Simple cases are also amenable to analytical approaches and we provide estimates of the behaviour of so-called escape times as a function of the batch size. The differences and similarities with escape and fixation in evolutionary dynamics are discussed.
    Comment: 19 pages, 9 figures
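
    The analogy with fixation in finite populations can be seen in a stripped-down model (ours, not the paper's): replace a learner's mixed strategy each round by the empirical frequency of one action in a fresh batch of N plays. This is a Wright-Fisher-type resampling chain; it fixates on a pure strategy with probability one, and its mean time-to-fixation grows with the batch size.

```python
import random

def fixation_time(batch, seed):
    """Rounds until a batch-resampling learner fixates on a pure strategy.
    Each round the mixed strategy p is replaced by the empirical frequency
    of action A in a fresh batch of plays; p = 0 and p = 1 are absorbing."""
    rng = random.Random(seed)
    p, t = 0.5, 0
    while 0.0 < p < 1.0:
        k = sum(rng.random() < p for _ in range(batch))  # A-count in the batch
        p = k / batch
        t += 1
    return t

def mean_fixation_time(batch, runs=200):
    return sum(fixation_time(batch, seed) for seed in range(runs)) / runs
```

    For the Wright-Fisher chain started at p = 0.5 the expected fixation time is about 1.39·N rounds, so quadrupling the batch size should roughly quadruple the mean time-to-fixation.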