
    CGAMES'2009

    Learning to Search in Reinforcement Learning

    In this thesis, we investigate the use of search-based algorithms with deep neural networks to tackle a wide range of problems, from board games to video games and beyond. Drawing inspiration from AlphaGo, the first computer program to achieve superhuman performance in the game of Go, we developed a new algorithm, AlphaZero. AlphaZero is a general reinforcement learning algorithm that combines deep neural networks with Monte Carlo tree search for planning and learning. Starting completely from scratch, without any human knowledge beyond the basic rules of the game, AlphaZero achieved superhuman performance in Go, chess and shogi. Building on the success of AlphaZero, we then investigated ways to extend our methods to problems in which the rules are not known or cannot be hand-coded. This line of work led to the development of MuZero, a model-based reinforcement learning agent that builds a deterministic internal model of the world and uses it to construct plans in its imagination. We applied MuZero to Go, chess, shogi and the classic Atari suite of video games, achieving superhuman performance. MuZero is the first RL algorithm to master, using the same principles, both canonical challenges for high-performance planning and visually complex problems. Finally, we describe Stochastic MuZero, a general agent that extends the applicability of MuZero to highly stochastic environments. We show that it achieves superhuman performance in stochastic domains such as backgammon and the classic game 2048, while matching the performance of MuZero in deterministic ones such as Go.
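    The abstract only names the combination of neural networks with Monte Carlo tree search. As a rough illustration, the sketch below shows the PUCT selection rule used in the AlphaGo Zero / AlphaZero line of work, in which the policy network's prior P(s, a) steers exploration toward promising moves. The Node class, its field names and the default c_puct = 1.5 are hypothetical choices for this example; network evaluation, expansion and value backup are omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node; names are illustrative, not taken from the thesis."""
    prior: float                 # P(s, a): policy-network prior for this action
    visit_count: int = 0         # N(s, a): how often search went through this action
    value_sum: float = 0.0       # W(s, a): sum of values backed up through it
    children: dict = field(default_factory=dict)  # action -> Node

    def q(self) -> float:
        """Mean action value Q(s, a); unvisited actions default to 0 (an assumption)."""
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent_visits: int, child: Node, c_puct: float) -> float:
    """Q(s, a) + U(s, a), with U = c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))."""
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visit_count)
    return child.q() + u


def select_action(node: Node, c_puct: float = 1.5):
    """Return the action whose child maximises the PUCT score."""
    parent_visits = sum(child.visit_count for child in node.children.values())
    return max(node.children,
               key=lambda a: puct_score(parent_visits, node.children[a], c_puct))


root = Node(prior=1.0, children={
    "a": Node(prior=0.9, visit_count=1, value_sum=0.4),  # high prior, rarely visited
    "b": Node(prior=0.1, visit_count=5, value_sum=2.5),  # higher mean value so far
})
print(select_action(root))              # a: the exploration bonus dominates
print(select_action(root, c_puct=0.0))  # b: pure exploitation of Q
```

    Setting c_puct to zero reduces the rule to greedy selection on Q, which shows how the prior term trades exploitation against exploration.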

    Learning search decisions

    Rapid adaptation of video game AI

    A Principled Method for Exploiting Opening Books

    International audience. In the past, we used a great deal of computational power and human expertise to build a very large dataset of good 9x9 Go games, in order to construct an opening book. We substantially improved the algorithm used for generating these games. Unfortunately, the results were not very robust, because (i) opening books are definitely not transitive, which makes non-regression testing extremely difficult; (ii) different time settings lead to opposite conclusions, since a good opening for a game with 10 s per move on a single core is very different from a good opening for a game with 30 s per move on a 32-core machine; and (iii) some very bad moves occasionally occur. In this paper, we formalize the optimization of an opening book as a matrix game, compute the Nash equilibrium, and conclude that a naturally randomized opening book provides optimal performance (in the sense of Nash equilibria). Surprisingly, from a finite set of opening books, we can choose a probability distribution over these books such that the resulting randomized solution performs significantly better than each of the deterministic opening books.
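    The abstract casts opening-book selection as a matrix game and uses a Nash-equilibrium mixture of books. The following is a minimal sketch under stated assumptions, not the paper's solver: it approximates the equilibrium of a zero-sum game by letting both players run the Hedge (multiplicative-weights) rule and averaging their strategies over time. The win-rate matrix is invented, with a deliberately non-transitive (rock-paper-scissors) structure to mirror point (i), and the function names are hypothetical.

```python
import math


def _hedge_weights(losses, eta):
    """Hedge rule: probability proportional to exp(-eta * cumulative loss)."""
    base = min(losses)  # shifting by the minimum avoids underflow and cancels out
    w = [math.exp(-eta * (loss - base)) for loss in losses]
    total = sum(w)
    return [wi / total for wi in w]


def approx_nash(A, T=10_000):
    """Approximate a mixed Nash equilibrium of the zero-sum game with payoffs A.

    A[i][j] in [0, 1] is the (hypothetical) win rate of our opening book i
    against the opponent's book j; the row player maximises, the column player
    minimises. Returns the time-averaged (row, column) strategies.
    """
    n, m = len(A), len(A[0])
    eta_row = math.sqrt(8 * math.log(n) / T)  # standard Hedge learning rates
    eta_col = math.sqrt(8 * math.log(m) / T)
    loss_row, loss_col = [0.0] * n, [0.0] * m  # cumulative losses per book
    avg_row, avg_col = [0.0] * n, [0.0] * m
    for _ in range(T):
        x = _hedge_weights(loss_row, eta_row)
        y = _hedge_weights(loss_col, eta_col)
        for i in range(n):
            avg_row[i] += x[i] / T
            # Row loss: 1 minus book i's expected win rate against y.
            loss_row[i] += 1.0 - sum(A[i][j] * y[j] for j in range(m))
        for j in range(m):
            avg_col[j] += y[j] / T
            # Column loss: the row player's expected win rate against book j.
            loss_col[j] += sum(x[i] * A[i][j] for i in range(n))
    return avg_row, avg_col


def worst_case_win_rate(x, A):
    """Win rate of mixed strategy x against the opponent's best response."""
    return min(sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))


# Hypothetical non-transitive win rates: book 0 beats 1, 1 beats 2, 2 beats 0.
A = [[0.5, 0.7, 0.3],
     [0.3, 0.5, 0.7],
     [0.7, 0.3, 0.5]]
mix, _ = approx_nash(A)
best_pure = max(worst_case_win_rate([float(k == i) for k in range(3)], A) for i in range(3))
print(best_pure)                    # 0.3: every fixed book can be exploited
print(worst_case_win_rate(mix, A))  # about 0.5 for the randomized book
```

    In this toy matrix each deterministic book can be held to a 30% win rate by a suitable counter-book, while the mixture guarantees roughly 50%, which is the kind of gain from randomization the abstract reports. Hedge was chosen here only because it needs no LP solver; an exact linear-programming solution would serve equally well.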

    The Exchange: A Novel

    The Exchange is a novel about Xavier "Savvy" Kowalski, one of the most promising American chess prodigies and a rumored up-and-comer for international fame as a potential challenger for the world chess crown. After he loses the junior world chess championship in Venice, Italy, he retires to Las Vegas, Nevada, where he hopes to start his life over. Savvy's father and the chess world at large conspire against him, and he finds himself returning to competitive chess after three years away. He assembles a new team to train him for a return to the world championship, and he also falls in love with a young prodigy he met during his retirement. Together they travel the United States and Europe as Savvy attempts to win back his reputation as America's premier chess player while encountering various rivals, including his own father. The story culminates with Savvy's final championship game, and with his dad

    A novel computer Scrabble engine based on probability that performs at championship level

    The thesis begins with an introduction to the game of Scrabble, then surveys state-of-the-art computer Scrabble programs and presents some characteristics of the Scrabble engine we developed, Heuri. Brief notions of game theory are given, along with the history of some games in artificial intelligence; the fundamental algorithms for game playing are presented, together with state-of-the-art engines and the algorithms they use. Basic elements of Scrabble, such as the rules and the letter distribution, are described. The history and state of the art of computer Scrabble are also discussed; for instance, we review the methods for generating valid moves based on the DAWG (Directed Acyclic Word Graph) data structure and its variant, the GADDAG. These methods are used by the state-of-the-art Scrabble engines Quackle and Maven. Next, the contributions of this thesis are presented. We built a Spanish Scrabble lexicon, which is used by the Heuri engines, and from this construction we produced a detailed study and classification of Spanish irregular verbs. We designed and implemented a novel anagram-based Scrabble move generator, which proved faster than the GADDAG-based generator used in the Quackle engine. This method resembles the way Scrabble players look for a move: they search for anagrams and for a spot on the board to play them. We then address move evaluation; the quality of play depends on deciding which move to play given a certain board and a rack of tiles. Heuri initially made this decision by trying several heuristics, which led to the construction of several engines. We explain the heuristics used in these engines, all of which are based on probabilities. These initial heuristic evaluation functions (up to six) do not look ahead; they are static evaluators.
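    The move generator is described only as anagram-based, mirroring how players search for anagrams. One common way to realize that idea, and only an assumption about what Heuri actually does, is to index the lexicon by alphagram (a word's letters in sorted order) and look up every sub-multiset of the rack. The function names and the toy lexicon are hypothetical; board squares, hooks and blank tiles are ignored.

```python
from collections import defaultdict
from itertools import combinations


def build_alphagram_index(lexicon):
    """Map each alphagram (the word's letters in sorted order) to its words."""
    index = defaultdict(list)
    for word in lexicon:
        index["".join(sorted(word))].append(word)
    return index


def rack_words(rack, index, min_len=2):
    """Return every lexicon word that can be spelled with a subset of the rack.

    combinations() of an already-sorted rack yields sorted tuples, so each one
    is directly an alphagram key; set() drops repeats caused by duplicate tiles.
    """
    tiles = sorted(rack)
    found = set()
    for size in range(min_len, len(tiles) + 1):
        for combo in set(combinations(tiles, size)):
            found.update(index.get("".join(combo), ()))
    return found


index = build_alphagram_index(["RATE", "TEAR", "TARE", "EAT", "TEA", "RAT", "ART", "STAR", "AT"])
print(sorted(rack_words("AERT", index)))
# ['ART', 'AT', 'EAT', 'RAT', 'RATE', 'TARE', 'TEA', 'TEAR']
```

    A full generator would still have to find a legal spot on the board for each candidate word, which is the second half of the human search the abstract describes.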
Testing showed increasing playing strength across these engines, which allowed Heuri to beat top-level expert human players in Spanish without using sampling or simulation techniques. These heuristics mainly consider the possibility of playing a bingo on the current board, whereas Quackle uses precomputed values (superleaves) that do not take the board into account. To improve Heuri's play even further, we then present additional engines that use look-ahead. The HeuriSamp engine performs a 2-ply search, which yields a defensive value. The HeuriSim engine uses a 3-ply adversarial search tree: it considers Player 1's best first moves (according to the heuristic of Heuri's sixth engine), then some replies to these moves (Player 2's moves), and then some replies to those replies (Player 1's moves). Finally, to improve these engines, we use opponent modeling, a technique that predicts some of the opponent's tiles from the opponent's last play. We present results from thousands of Heuri-vs-Heuri games, which yield important information: general Scrabble statistics, such as a 16-point handicap for the second player, and Spanish word statistics, such as a list of the most frequently played bingos (words that use all 7 tiles of a player's rack). In addition, we present results of matches between Heuri and top-level human players in Spanish, and results from large numbers of games between the different Heuri engines and the Quackle engine in Spanish, French and English. All these match results demonstrate the championship-level performance of the Heuri engines in all three languages, especially of the most recent engine, which includes simulation and opponent-modeling techniques.
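    The Heuri heuristics are described as probability-based and centred on the chance of achieving a bingo. As one hedged illustration of that kind of quantity, not the thesis's actual formula, the sketch below computes the multivariate hypergeometric probability that refilling the rack from the unseen tiles completes one specific 7-letter word. The function name and the tile counts in the example are hypothetical; a real evaluator would also need a legal place on the board and would combine many candidate bingos.

```python
from collections import Counter
from math import comb


def bingo_draw_probability(target, leave, unseen):
    """P(refilling the rack from `unseen` turns `leave` into exactly `target`'s tiles).

    Multivariate hypergeometric model: we draw len(target) - len(leave) tiles
    without replacement from the unseen pool (a string or a letter->count dict).
    """
    leave_counts, target_counts = Counter(leave), Counter(target)
    if leave_counts - target_counts:
        return 0.0  # the leave keeps a tile the target word does not use
    need = target_counts - leave_counts  # tiles that still have to be drawn
    draws = sum(need.values())
    pool = Counter(unseen)
    total = sum(pool.values())
    if draws > total:
        return 0.0
    favourable = 1
    for letter, k in need.items():
        favourable *= comb(pool[letter], k)  # comb(n, k) is 0 when k > n
    return favourable / comb(total, draws)


# Hypothetical pool: keeping RETAI and drawing two tiles; RETAINS needs N and S.
print(bingo_draw_probability("RETAINS", "RETAI", {"N": 2, "S": 3, "E": 5}))  # 6/45, about 0.133
```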
From here, the conclusions of the thesis are drawn and future work is outlined.

    Agent-Based Models and Human Subject Experiments

    This paper considers the relationship between agent-based modeling and economic decision-making experiments with human subjects. Both approaches exploit controlled "laboratory" conditions as a means of isolating the sources of aggregate phenomena. Research findings from laboratory studies of human subject behavior have inspired studies using artificial agents in "computational laboratories", and vice versa. In certain cases, both methods have been used to examine the same phenomenon. The focus of this paper is on the empirical validity of agent-based modeling approaches in terms of explaining data from human subject experiments. We also point out synergies between the two methodologies that have been exploited, as well as promising new possibilities.
    Keywords: agent-based models, human subject experiments, zero-intelligence agents, learning, evolutionary algorithms
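    Zero-intelligence agents appear among the keywords. As a rough sketch of the idea, the code below is a simplified, random-matching version of budget-constrained zero-intelligence (ZI-C) traders in the spirit of Gode and Sunder, not their exact continuous double auction: buyers bid at random below their value, sellers ask at random above their cost, and the market's allocative efficiency is measured. The function name, the price cap and the example values are hypothetical.

```python
import random


def zi_c_market(values, costs, rounds=5000, price_cap=200.0, seed=0):
    """Simplified ZI-C market with random buyer-seller matching.

    Each buyer and seller holds one unit. In every round a random buyer bids
    uniformly in [0, value] and a random seller asks uniformly in [cost, price_cap]
    (price_cap is assumed to be at least every cost). If bid >= ask they trade at
    the midpoint and both leave. Returns (trades, allocative efficiency).
    """
    rng = random.Random(seed)
    buyers, sellers = list(values), list(costs)
    trades = []  # (buyer value, seller cost, price)
    for _ in range(rounds):
        if not buyers or not sellers:
            break
        b, s = rng.randrange(len(buyers)), rng.randrange(len(sellers))
        bid = rng.uniform(0.0, buyers[b])         # never bids above value
        ask = rng.uniform(sellers[s], price_cap)  # never asks below cost
        if bid >= ask:
            trades.append((buyers.pop(b), sellers.pop(s), (bid + ask) / 2))
    realised = sum(v - c for v, c, _ in trades)
    # Maximum surplus: pair highest values with lowest costs while profitable.
    best = sum(max(v - c, 0.0) for v, c in zip(sorted(values, reverse=True), sorted(costs)))
    return trades, (realised / best if best > 0 else 1.0)  # 1.0 if no gains exist


trades, efficiency = zi_c_market([120, 110, 100, 90, 80], [60, 70, 85, 95, 105], seed=1)
print(len(trades), round(efficiency, 2))
```

    Because the budget constraint alone rules out loss-making trades, any efficiency the market reaches comes from market structure rather than from agent intelligence, which is why such agents are a natural baseline when comparing simulations with human subject data.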