53 research outputs found

    A novel computer Scrabble engine based on probability that performs at championship level

    The thesis begins with an introduction to the game of Scrabble, surveys state-of-the-art computer Scrabble programs, and presents some characteristics of Heuri, the Scrabble engine developed in this work. It gives brief notions of game theory and a history of selected games in artificial intelligence, and presents the fundamental game-playing algorithms together with state-of-the-art engines and the algorithms they use. Basic elements of Scrabble, such as the rules and the letter distribution, are described. The history and state of the art of computer Scrabble are then discussed, including the move-generation methods based on the DAWG (Directed Acyclic Word Graph) data structure and its variant, the GADDAG; these methods are used by the state-of-the-art Scrabble engines Quackle and Maven. The contributions of the thesis follow. A Spanish lexicon for playing Scrabble has been built and is used by the Heuri engines; its construction led to a detailed study and classification of Spanish irregular verbs. A novel anagram-based Scrabble move generator has been designed and implemented, and it has been shown to be faster than the GADDAG-based generator used in the Quackle engine. The method resembles the way Scrabble players look for a move: searching for anagrams and for a spot on the board to play them. Next, the thesis addresses move evaluation: the quality of play depends on deciding which move to make given a certain board and a rack of tiles. In Heuri, this decision was initially made by trying several heuristics, which resulted in several engines. The heuristics used in these engines, all based on probabilities, are explained. None of these initial heuristic evaluation functions (up to six) looks ahead; they are static evaluators. 
After testing, they have shown steadily increasing playing performance, which allows Heuri to beat top-level expert human players in Spanish without sampling or simulation techniques. These heuristics mainly consider the possibility of achieving a bingo on the current board, whereas Quackle uses pre-calculated values (superleaves) that do not depend on the board. To improve Heuri's quality of play further, additional engines that employ look-ahead are presented. The HeuriSamp engine evaluates a 2-ply search, which yields a defense value. The HeuriSim engine uses a 3-ply adversarial search tree: it considers the best first moves for Player 1 (according to the heuristic of the sixth Heuri engine), then some replies to these moves (Player 2 moves), and then some replies to those replies (Player 1 moves). Finally, to improve these engines, opponent modeling is used; this technique predicts some of the opponent's tiles from the opponent's last play. We present results obtained by playing thousands of Heuri vs. Heuri games, which yield general statistics about Scrabble, such as a 16-point handicap for the second player, and word statistics in Spanish, such as a list of the most frequently played bingos (words that use all 7 tiles of a player's rack). In addition, we present results of matches played by Heuri against top-level human players in Spanish, and results from large numbers of games between the different Heuri engines and the Quackle engine in Spanish, French and English. All these match results demonstrate the championship-level performance of the Heuri engines in the three languages, especially that of the last engine developed, which includes simulation and opponent modeling techniques. 
From here, conclusions of the thesis are drawn and work for the future is envisaged.
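
The anagram idea behind the move generator can be illustrated with a minimal sketch. This is not Heuri's generator, whose details the abstract does not give: it only shows the common trick of indexing a lexicon by sorted letters and looking up every subset of the rack. The function names and the toy English lexicon are assumptions made for illustration, and board placement is ignored entirely.

```python
from collections import defaultdict
from itertools import combinations


def build_anagram_index(words):
    """Map each sorted-letter key to the words that are anagrams of it."""
    index = defaultdict(list)
    for word in words:
        index["".join(sorted(word))].append(word)
    return index


def words_from_rack(rack, index, min_len=2):
    """Return every indexed word that can be formed from a subset of the rack."""
    found = set()
    letters = sorted(rack)  # sorted input makes every combination a sorted key
    for size in range(min_len, len(letters) + 1):
        for combo in combinations(letters, size):
            # .get avoids inserting empty entries into the defaultdict
            found.update(index.get("".join(combo), ()))
    return sorted(found)


# Hypothetical toy lexicon, not the thesis's Spanish lexicon.
LEXICON = ["care", "race", "acre", "car", "arc", "ear", "sun"]
INDEX = build_anagram_index(LEXICON)
print(words_from_rack("racex", INDEX))
```

A real generator would also have to handle blank tiles and letters already on the board, which this sketch leaves out.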

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) addresses the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history, lying at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability: it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents ($N \gg 2$), which I call many-agent reinforcement learning (MARL) problems. (I use the word "MARL" to denote multi-agent reinforcement learning with a particular focus on the many-agent case; otherwise, it is denoted "Multi-Agent RL" by default.) In this thesis, I contribute to tackling MARL problems from four aspects. First, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills a research gap: most existing work either fails to cover the advances made since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Second, I develop a tractable policy evaluation algorithm, $\alpha^\alpha$-Rank, for many-agent systems. The critical advantage of $\alpha^\alpha$-Rank is that it can compute the solution concept of $\alpha$-Rank tractably in multi-player general-sum games with no need to store the entire pay-off matrix. 
This is in contrast to classic solution concepts such as Nash equilibrium, whose computation is known to be PPAD-hard even in two-player cases. $\alpha^\alpha$-Rank allows us, for the first time, to practically conduct large-scale multi-agent evaluations. Third, I introduce a scalable policy learning algorithm, mean-field MARL, for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourth, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling behavioural diversity in meta-games and on developing algorithms that are guaranteed to enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impact in the real physical world, beyond video games alone.
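
The mean-field idea mentioned in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's algorithm: each agent summarises its neighbours by the average of their one-hot actions, so a value estimate depends on one "mean action" instead of the full joint action. The names `mean_action`, `mean_field_q` and the pairwise table `q_table` are hypothetical.

```python
def mean_action(neighbor_actions, n_actions):
    """Average the one-hot encodings of the neighbours' discrete actions."""
    if not neighbor_actions:
        raise ValueError("need at least one neighbour")
    counts = [0] * n_actions
    for a in neighbor_actions:
        counts[a] += 1
    return [c / len(neighbor_actions) for c in counts]


def mean_field_q(q_table, state, action, neighbor_actions, n_actions):
    """Average pairwise Q-values over neighbours via the mean action.

    q_table[(state, action, other_action)] is a hypothetical pairwise value;
    weighting it by the mean action equals averaging over all neighbours.
    """
    a_bar = mean_action(neighbor_actions, n_actions)
    return sum(w * q_table[(state, action, b)] for b, w in enumerate(a_bar))
```

The cost of evaluating one agent's value then grows with the number of actions rather than with the number of joint actions, which is the scalability argument the abstract alludes to.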

    Knowledge Migration Strategies for Optimization of Multi-Population Cultural Algorithm

    Evolutionary Algorithms (EAs) are meta-heuristic algorithms used to optimize complex problems. The Cultural Algorithm (CA) is an EA that incorporates knowledge into the optimization. A CA with multiple population spaces, each combining cultural and genetic evolution to obtain better solutions, is known as a Multi-Population Cultural Algorithm (MPCA). An MPCA makes it possible to introduce a diversity of knowledge in a dynamic and heterogeneous environment. In an MPCA, each population represents a solution space. An individual belonging to a given population can migrate to another population in order to introduce new knowledge that influences the other individuals in that population. In this thesis, we provide different migration strategies, inspired by game-theoretic models, to improve the quality of solutions. Migration among the different populations in an MPCA can address the problem of knowledge sharing among population spaces. We introduce five migration strategies drawn from the field of economics. The principal idea behind incorporating these strategies is to improve the rate of convergence, increase diversity, explore the search space better, avoid premature convergence, and escape from local optima. The strategies are taken from economics because this lets individuals and populations use their knowledge to decide whether to cooperate with or defect against other individuals and populations. We have tested the proposed algorithms on the CEC 2015 expensive benchmark problems, a set of 15 functions spanning varied function categories. The results show that the proposed algorithms lead to better solutions on problems of a complex nature and higher dimensionality. 
For 10-dimensional problems, the proposed strategies give better results on 7 out of 15 functions, and for 30-dimensional problems on 12 out of 15, when compared with the existing algorithms.
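
Migration between populations can be illustrated with a minimal sketch. The abstract does not describe the five economics-inspired strategies, so this is not one of them: it shows a generic ring migration in which each population sends copies of its best individuals to the next population, replacing that population's worst. The function name, the ring topology and the assumption that lower fitness is better are all illustrative choices.

```python
def migrate_ring(populations, fitness, k=1):
    """Copy the k best individuals of each population into the next one.

    Populations are arranged in a ring; incoming individuals replace the
    k worst of the receiving population. Lower fitness is better
    (minimization). Assumes k <= len(pop) for every population.
    """
    # Pick emigrants from the original populations before changing any of them.
    emigrants = [sorted(pop, key=fitness)[:k] for pop in populations]
    result = []
    for i, pop in enumerate(populations):
        incoming = emigrants[i - 1]  # previous population in the ring
        survivors = sorted(pop, key=fitness)[: len(pop) - k]
        result.append(survivors + incoming)
    return result


def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)
```

A strategy in which populations choose whether to cooperate or defect would presumably decide per population whether to send emigrants at all; that decision rule is left out here because the abstract does not specify it.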