
    Using Cultural Coevolution for Learning in General Game Playing

    Traditionally, the construction of game-playing agents relies on pre-programmed heuristics and architectures tailored for a specific game. General Game Playing (GGP) provides a challenging alternative to this approach, with the aim being to construct players that are able to play any game, given just the rules. This thesis describes the construction of a General Game Player that is able to learn and build knowledge about the game in a multi-agent setup using cultural coevolution and reinforcement learning. We also describe how this knowledge can be used to complement UCT search, a Monte-Carlo tree search that has already been used successfully in GGP. Experiments are conducted to test the effectiveness of the knowledge by playing several games between our player and a player using random moves, as well as a player using standard UCT search. The results show a marked improvement in performance when the knowledge is used.
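    For context, the UCT search this thesis builds on selects moves with the standard UCB1 rule. The following Python is a minimal sketch of that selection step only; the node structure and the exploration constant `c` are assumptions, not the thesis's code.

```python
import math

class Node:
    """A node in a UCT search tree (illustrative only)."""
    def __init__(self, move=None, parent=None):
        self.move = move
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def uct_select(node, c=1.4):
    """Pick the child maximizing average reward plus the UCB1 exploration bonus."""
    def ucb1(child):
        if child.visits == 0:
            return float("inf")  # always try unvisited children first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=ucb1)
```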

    Agents Explore the Environment Beyond Good Actions to Improve Their Model for Better Decisions

    Improving the decision-making capabilities of agents is a key challenge on the road to artificial intelligence. To improve the planning skills needed to make good decisions, MuZero's agent combines prediction by a network model with planning by a tree search that uses those predictions. MuZero's learning process can fail when predictions are poor but planning still depends on them. We use this as an impetus to get the agent to explore parts of the decision tree in the environment that it otherwise would not explore. The agent achieves this first by normal planning to come up with an improved policy; second, by randomly deviating from this policy at the beginning of each training episode; and third, by switching back to the improved policy at a random time step to experience the rewards from the environment associated with the improved policy, which is the basis for learning the correct value expectation. The simple board game Tic-Tac-Toe is used to illustrate how this approach can improve the agent's decision-making ability. The source code, written entirely in Java, is available at https://github.com/enpasos/muzero. Comment: Submitted to NeurIPS 202
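    The three-step schedule the abstract describes can be made concrete with a short sketch. The Python below is a hedged illustration only; the helper names (`env`, `plan_action`, `explore_action`) are assumptions and do not come from the linked Java repository.

```python
import random

def play_training_episode(env, plan_action, explore_action, max_steps=100):
    """Sketch of the exploration schedule described in the abstract:
    deviate from the planned (improved) policy at the start of the episode,
    then switch back to it at a random time step so the tail of the episode
    reflects the improved policy's actual returns."""
    switch_step = random.randint(0, max_steps)  # when to stop deviating
    trajectory = []
    state = env.reset()
    for t in range(max_steps):
        if t < switch_step:
            action = explore_action(state)   # random deviation from the improved policy
        else:
            action = plan_action(state)      # back to the tree-search-improved policy
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory
```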

    Mobile application platform heterogeneity: Android vs Windows phone vs iOS vs Firefox OS

    Modern smartphones have a rich spectrum of increasingly sophisticated features, opening opportunities for software-led innovation. Of the large number of platforms available for developing new software, in this paper we look closely at three platforms identified as market leaders for the smartphone market by Gartner Group in 2013, and one platform, Firefox OS, representing a new paradigm for operating systems based on web technologies. We compare the platforms in several categories, such as software architecture, application development, platform capabilities and constraints, and, finally, developer support. Using the implementation of a mobile version of the tic-tac-toe game on all four platforms, we investigate the strengths, weaknesses and challenges of mobile application development on these platforms. Big differences are highlighted when inspecting community environments, hardware abilities and platform maturity. These inevitably impact upon developer choices when deciding on mobile platform development strategies.

    Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play

    Recent advances in Competitive Self-Play (CSP) have achieved, or even surpassed, human-level performance in complex game environments such as Dota 2 and StarCraft II using Distributed Multi-Agent Reinforcement Learning (MARL). One core component of these methods relies on creating a pool of learning agents -- consisting of the Main Agent, past versions of this agent, and Exploiter Agents -- where Exploiter Agents learn counter-strategies to the Main Agents. A key drawback of these approaches is the large computational cost and physical time required to train the system, making them impractical to deploy in highly iterative real-life settings such as video game production. In this paper, we propose the Minimax Exploiter, a game-theoretic approach to exploiting Main Agents that leverages knowledge of its opponents, leading to significant increases in data efficiency. We validate our approach in a diversity of settings, including simple turn-based games, the Arcade Learning Environment, and For Honor, a modern video game. The Minimax Exploiter consistently outperforms strong baselines, demonstrating improved stability and data efficiency, and leading to a robust CSP-MARL method that is both flexible and easy to deploy.
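    Based only on the pool structure named in the abstract (a Main Agent, its past versions, and Exploiter Agents that learn counter-strategies to it), here is a hedged Python sketch of how such a pool might be organized; the class, method names, and sampling rule are assumptions, not the paper's implementation.

```python
import random
from dataclasses import dataclass, field

@dataclass
class AgentPool:
    """Illustrative agent pool for competitive self-play (assumed structure)."""
    main_agent: object
    past_main_agents: list = field(default_factory=list)
    exploiters: list = field(default_factory=list)

    def snapshot_main(self):
        """Freeze a copy of the current main agent into the history."""
        self.past_main_agents.append(self.main_agent)

    def opponent_for_main(self):
        """The main agent trains against a mix of its past selves and exploiters."""
        candidates = self.past_main_agents + self.exploiters
        return random.choice(candidates) if candidates else self.main_agent

    def opponent_for_exploiter(self):
        """Exploiters train against the current main agent, learning counter-strategies."""
        return self.main_agent
```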

    Relational Representations in Reinforcement Learning: Review and Open Problems

    This paper is about representation in RL. We discuss some of the concepts in representation and generalization in reinforcement learning and argue for higher-order representations, instead of the commonly used propositional representations. The paper contains a small review of current reinforcement learning systems using higher-order representations, followed by a brief discussion. The paper ends with research directions and open problems.

    Aprendizaje profundo aplicado a juegos de tablero por turnos (Deep learning applied to turn-based board games)

    Bachelor's thesis, Double Degree in Computer Science and Mathematics, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, 2020-2021. Due to the astonishing growth rate in computational power, artificial intelligence is achieving milestones that were considered inconceivable just a few decades ago. One of them is AlphaZero, an algorithm capable of reaching superhuman performance in chess, shogi and Go with just a few hours of self-play and given no domain knowledge except the game rules. In this work, we review the fundamentals, explain how the algorithm works, and develop our own version of it, capable of being executed on a personal computer. Despite the lack of available computational resources, we have managed to master less complex games such as Tic-Tac-Toe and Connect 4. To verify learning, we test our implementation against other strategies and analyze the results obtained.
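    As background for how AlphaZero-style agents choose moves during search, here is a minimal sketch of PUCT child selection in Python. This is a generic illustration of the technique, not the thesis's own code; the attribute names (Q, P, N) and the constant `c_puct` are assumptions.

```python
import math

def puct_select(children, c_puct=1.5):
    """Select a child with AlphaZero-style PUCT: value estimate Q plus an
    exploration term weighted by the network prior P and the visit counts N."""
    total_visits = sum(child.N for child in children)
    def puct(child):
        exploration = c_puct * child.P * math.sqrt(total_visits) / (1 + child.N)
        return child.Q + exploration
    return max(children, key=puct)
```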

    In search of no-loss strategies for the game of Tic-tac-toe using a customized genetic algorithm

    The game of Tic-tac-toe is one of the most commonly known games. The game does not allow either player to win all the time, and a significant proportion of games played result in a draw. Thus, the best a player can hope for is not to lose the game. This study is aimed at evolving a number of no-loss strategies using genetic algorithms and comparing them with existing methodologies. To efficiently evolve no-loss strategies, we have developed innovative ways of representing and evaluating a solution, initializing the GA population, and developing GA operators, including an elite-preserving scheme. Interestingly, our GA implementation is able to find more than 72,000 no-loss strategies for playing the game. Moreover, an analysis of these solutions has given us insights into how to play the game so as not to lose it. Based on this experience, we have developed specialized efficient strategies with a high win-to-draw ratio. The study and its results are interesting and encourage the application of these techniques to other board games for finding efficient strategies.
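    A hedged sketch of a genetic algorithm loop with an elite-preserving scheme, in the spirit of the abstract; the operator interfaces, selection rule, and parameter values are assumptions rather than the paper's actual configuration.

```python
import random

def evolve(population, fitness, mutate, crossover,
           generations=200, elite_fraction=0.05):
    """Generic GA loop: the best individuals are carried over unchanged
    (elite preservation) while the rest of the population is rebuilt
    from crossover and mutation of fitter parents."""
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        n_elite = max(1, int(elite_fraction * len(scored)))
        elites = scored[:n_elite]                 # best strategies survive unchanged
        offspring = []
        while len(offspring) < len(population) - n_elite:
            p1, p2 = random.sample(scored[: len(scored) // 2], 2)  # mate among the fitter half
            offspring.append(mutate(crossover(p1, p2)))
        population = elites + offspring
    return max(population, key=fitness)
```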

    Automatic Generation of Alternative Starting Positions for Simple Traditional Board Games

    Simple board games, like Tic-Tac-Toe and CONNECT-4, play an important role not only in the development of mathematical and logical skills, but also in emotional and social development. In this paper, we address the problem of generating targeted starting positions for such games. This can facilitate new approaches for bringing novice players to mastery, and also leads to the discovery of interesting game variants. We present an approach that generates starting states of varying hardness levels for player 1 in a two-player board game, given the rules of the board game, the desired number of steps required for player 1 to win, and the expertise levels of the two players. Our approach leverages symbolic methods and iterative simulation to efficiently search the extremely large state space. We present experimental results that include the discovery of states of varying hardness levels for several simple grid-based board games. The presence of such states for standard game variants like 4 × 4 Tic-Tac-Toe opens up new games to be played that have never been played, as the default start state is heavily biased. Comment: A conference version of the paper will appear in AAAI 201
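    One way to read "iterative simulation" here is to estimate how hard a candidate starting position is by playing it out many times between players of fixed expertise. The sketch below is an assumption-laden illustration, not the paper's method; `simulate_game` and the returned hardness statistics are hypothetical.

```python
def estimate_hardness(start_state, player1, player2, simulate_game, n_games=500):
    """Estimate the hardness of a candidate starting position by simulation:
    play many games from it and record player 1's win rate and how long wins take."""
    wins, total_steps = 0, 0
    for _ in range(n_games):
        winner, steps = simulate_game(start_state, player1, player2)
        if winner == 1:
            wins += 1
            total_steps += steps
    win_rate = wins / n_games
    avg_steps = total_steps / wins if wins else float("inf")
    # A lower win rate and longer wins read as a harder start for player 1.
    return {"win_rate": win_rate, "avg_steps_to_win": avg_steps}
```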