
    Universal Learning of Repeated Matrix Games

    We study and compare the learning dynamics of two universal learning algorithms, one based on Bayesian learning and the other on prediction with expert advice. Both approaches have strong asymptotic performance guarantees. When confronted with the task of finding good long-term strategies in repeated 2x2 matrix games, they behave quite differently.
    Comment: 16 LaTeX pages, 8 eps figures
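    As an illustrative sketch of one of the two approaches, prediction with expert advice can be instantiated as the exponential-weights (Hedge) rule, with one expert per pure row strategy of a 2x2 game. The payoff matrix, the fixed opponent, and the learning rate below are illustrative assumptions, not taken from the paper:

```python
import math
import random

# Sketch of prediction with expert advice (exponential weights / Hedge)
# choosing rows of a 2x2 matrix game. Game and opponent are illustrative.
PAYOFF = [[1.0, 0.0],   # row player's payoffs in a simple 2x2
          [0.0, 1.0]]   # coordination game (an assumption)

def hedge(rounds=1000, eta=0.1, seed=0):
    rng = random.Random(seed)
    weights = [1.0, 1.0]          # one expert per pure row strategy
    total = 0.0
    for _ in range(rounds):
        z = sum(weights)
        probs = [w / z for w in weights]
        row = 0 if rng.random() < probs[0] else 1
        col = 0                   # fixed opponent: always the first column
        total += PAYOFF[row][col]
        # update every expert by the payoff it would have received
        for a in range(2):
            weights[a] *= math.exp(eta * PAYOFF[a][col])
    return total / rounds

avg_payoff = hedge()
```

    Against this stationary opponent the weight on the best row grows exponentially, so the average payoff approaches the best fixed strategy's payoff, the kind of asymptotic guarantee the abstract refers to.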

    Testing the TASP: An Experimental Investigation of Learning in Games with Unstable Equilibria

    We report experiments designed to test between Nash equilibria that are stable and unstable under learning. The “TASP” (Time Average of the Shapley Polygon) gives a precise prediction about what happens when there is divergence from equilibrium under fictitious-play-like learning processes. We use two 4x4 games, each with a unique mixed Nash equilibrium; one is stable and one is unstable under learning. Both games are versions of Rock-Paper-Scissors with the addition of a fourth strategy, Dumb. Nash equilibrium places a weight of 1/2 on Dumb in both games, but the TASP places no weight on Dumb when the equilibrium is unstable. We also vary the level of monetary payoffs, with higher payoffs predicted to increase instability. We find that the high-payoff unstable treatment differs from the others: the frequency of Dumb is lower and play is further from Nash than in the other treatments. That is, we find support for the comparative statics prediction of learning theory, although the frequency of Dumb is substantially greater than zero in the unstable treatments.
    Keywords: games, experiments, TASP, learning, unstable, mixed equilibrium, fictitious play.
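    The fictitious-play dynamic the abstract refers to can be sketched on plain zero-sum Rock-Paper-Scissors (not the paper's 4x4 games, whose exact payoffs are not reproduced here): each player best-responds to the opponent's empirical action frequencies, play cycles through the actions, and the time-averaged frequencies approach the mixed equilibrium (1/3, 1/3, 1/3):

```python
# Illustrative fictitious play on standard Rock-Paper-Scissors (zero-sum).
# Row player's payoffs for R, P, S versus R, P, S:
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def fictitious_play(rounds=3000):
    counts1 = [1, 0, 0]   # empirical counts of each player's actions,
    counts2 = [1, 0, 0]   # seeded with one Rock each to break ties
    for _ in range(rounds):
        # each player best-responds to the opponent's empirical mix
        a1 = max(range(3),
                 key=lambda a: sum(RPS[a][b] * counts2[b] for b in range(3)))
        a2 = max(range(3),
                 key=lambda b: sum(-RPS[a][b] * counts1[a] for a in range(3)))
        counts1[a1] += 1
        counts2[a2] += 1
    total = rounds + 1
    return [c / total for c in counts1]

freqs = fictitious_play()
print(freqs)
```

    Actual play never settles at the mixed equilibrium; it cycles with ever-longer runs of each action, which is the sense in which time averages (as in the TASP) can predict behavior even when the equilibrium point itself is not reached.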

    Action-Conditional Video Prediction using Deep Networks in Atari Games

    Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future (image-)frames depend on control variables or actions as well as on previous frames. While not composed of natural scenes, frames in Atari games are high-dimensional, can involve tens of objects with one or more objects being controlled directly by the actions and many other objects being influenced indirectly, can involve entry and departure of objects, and can involve deep partial observability. We propose and evaluate two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks. Experimental results show that the proposed architectures are able to generate visually realistic frames that are also useful for control over approximately 100-step action-conditional futures in some games. To the best of our knowledge, this paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs.
    Comment: Published at NIPS 2015 (Advances in Neural Information Processing Systems 28)
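    The core of an action-conditional transformation layer can be sketched as a factored multiplicative interaction between an encoded frame and an action embedding, followed by decoding. All dimensions, weights, and function names below are illustrative assumptions; the paper's architectures use learned CNN/RNN encoders and decoders at much larger scale:

```python
import random

def matvec(W, x):
    # plain matrix-vector product over nested lists
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def action_conditional_step(h, action_onehot, W_h, W_a, W_dec):
    # factored multiplicative interaction: the encoded frame h and the
    # action embedding are projected into a shared factor space,
    # multiplied elementwise, then decoded into the next encoding
    fh = matvec(W_h, h)
    fa = matvec(W_a, action_onehot)
    factors = [x * y for x, y in zip(fh, fa)]
    return matvec(W_dec, factors)

rng = random.Random(0)
dim, n_actions, n_factors = 4, 2, 8
W_h = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_factors)]
W_a = [[rng.uniform(-1, 1) for _ in range(n_actions)] for _ in range(n_factors)]
W_dec = [[rng.uniform(-1, 1) for _ in range(n_factors)] for _ in range(dim)]

h = [0.5, -0.2, 0.1, 0.9]
next_left = action_conditional_step(h, [1, 0], W_h, W_a, W_dec)
next_right = action_conditional_step(h, [0, 1], W_h, W_a, W_dec)
```

    The point of the multiplicative form is that different actions select different transformations of the same encoded state, so the two predicted encodings above differ even though the input frame is identical.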