26,131 research outputs found

    AI for Classic Video Games using Reinforcement Learning

    Get PDF
    Deep reinforcement learning is a technique to teach machines tasks based on trial and error experiences in the way humans learn. In this paper, some preliminary research is done to understand how reinforcement learning and deep learning techniques can be combined to train an agent to play Archon, a classic video game. We compare two methods to estimate a Q function, the function used to compute the best action to take at each point in the game. In the first approach, we used a Q table to store the states and weights of the corresponding actions. In our experiments, this method converged very slowly. Our second approach was similar to that of [1]: We used a convolutional neural network (CNN) to determine a Q function. This deep neural network model successfully learnt to control the Archon player using keyboard event that it generated. We observed that the second approaches Q function converged faster than the first. For the latter method, the neural net was trained only using prediodic screenshots taken while it was playing. Experiments were conducted on a machine that did not have a GPU, so our training was slower as compared to [1]

    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Get PDF
    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Holdem, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.Comment: updated version, incorporating conference feedbac

    Assessing the Potential of Classical Q-learning in General Game Playing

    Get PDF
    After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interests in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and the theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) provides a good testbed for reinforcement learning to research AGI. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Banerjee &\& Stone, IJCAI 2007) in GGP. In this paper we implement Q-learning in GGP for three small-board games (Tic-Tac-Toe, Connect Four, Hex)\footnote{source code: https://github.com/wh1992v/ggp-rl}, to allow comparison to Banerjee et al.. We find that Q-learning converges to a high win rate in GGP. For the ϵ\epsilon-greedy strategy, we propose a first enhancement, the dynamic ϵ\epsilon algorithm. In addition, inspired by (Gelly &\& Silver, ICML 2007) we combine online search (Monte Carlo Search) to enhance offline learning, and propose QM-learning for GGP. Both enhancements improve the performance of classical Q-learning. In this work, GGP allows us to show, if augmented by appropriate enhancements, that classical table-based Q-learning can perform well in small games.Comment: arXiv admin note: substantial text overlap with arXiv:1802.0594

    Playing Atari with Deep Reinforcement Learning

    Full text link
    We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.Comment: NIPS Deep Learning Workshop 201
    corecore