Using Cultural Coevolution for Learning in General Game Playing
Traditionally, the construction of game-playing agents relies on pre-programmed heuristics and architectures tailored to a specific game. General Game Playing (GGP) provides a challenging alternative to this approach, with the aim being to construct players that are able to play any game, given just the rules. This thesis describes the construction of a General Game Player that is able to learn and build knowledge about the game in a multi-agent setup using cultural coevolution and reinforcement learning. We also describe how this knowledge can be used to complement UCT search, a Monte-Carlo tree search that has already been used successfully in GGP. Experiments are conducted to test the effectiveness of the knowledge by playing several games between our player and a player using random moves, as well as a player using standard UCT search. The results show a marked improvement in performance when using the knowledge.
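The UCT search mentioned above selects moves by scoring each child node with the UCB1 formula. A minimal sketch, assuming per-node statistics of total value and visit count (the exploration constant `c=1.4` and the sample values are illustrative, not from the thesis):

```python
import math

def ucb1(total_value, visits, parent_visits, c=1.4):
    """UCB1 score used by UCT to balance exploitation and exploration."""
    if visits == 0:
        return float("inf")  # unvisited children are tried first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

# pick the child with the highest UCB1 score
children = [(10.0, 20), (4.0, 5), (0.0, 0)]  # (total_value, visits) pairs
parent_visits = 25
best = max(range(len(children)),
           key=lambda i: ucb1(children[i][0], children[i][1], parent_visits))
```

Learned knowledge of the kind described in the abstract is typically injected as a bias term added to this score.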
Agents Explore the Environment Beyond Good Actions to Improve Their Model for Better Decisions
Improving the decision-making capabilities of agents is a key challenge on
the road to artificial intelligence. To improve the planning skills needed to
make good decisions, MuZero's agent combines prediction by a network model and
planning by a tree search using the predictions. MuZero's learning process can
fail when predictions are poor but planning requires them. We use this as an
impetus to get the agent to explore parts of the decision tree in the
environment that it otherwise would not explore. The agent achieves this in
three steps. First, it plans normally to come up with an improved policy.
Second, it randomly deviates from this policy at the beginning of each training
episode. Third, it switches back to the improved policy at a random time step
to experience the rewards from the environment associated with it, which is the
basis for learning the correct value expectation. The simple board game
Tic-Tac-Toe is used to illustrate how this approach can improve the agent's
decision-making ability. The source code, written entirely in Java, is
available at https://github.com/enpasos/muzero.
Comment: Submitted to NeurIPS 202
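The three-step scheme above (plan normally, deviate randomly at the start of an episode, switch back at a random time step) could be sketched as follows. The `env`, `improved_policy`, and `random_policy` interfaces are hypothetical stand-ins, not from the MuZero codebase:

```python
import random

def run_episode(env, improved_policy, random_policy, max_steps=50, rng=random):
    """Sketch of the described exploration scheme: random deviation early,
    improved policy after a randomly chosen switch point.

    Assumes env exposes reset() and step(action) -> (state, reward, done),
    and that each policy maps a state to an action.
    """
    switch_step = rng.randrange(max_steps)  # when to return to the improved policy
    state, trajectory = env.reset(), []
    for t in range(max_steps):
        # before the switch point, deviate randomly to reach subtrees the
        # improved policy would never visit; afterwards, follow the improved
        # policy so its true environment rewards are observed
        action = random_policy(state) if t < switch_step else improved_policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory
```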
Mobile application platform heterogeneity: Android vs Windows phone vs iOS vs Firefox OS
Modern smartphones have a rich spectrum of increasingly sophisticated features, opening opportunities for software-led innovation. Of the large number of platforms to develop new software on, in this paper we look closely at three platforms identified as market leaders for the smartphone market by Gartner Group in 2013 and one platform, Firefox OS, representing a new paradigm for operating systems based on web technologies. We compare the platforms in several different categories, such as software architecture, application development, platform capabilities and constraints, and, finally, developer support. Using the implementation of a mobile version of the tic-tac-toe game on all four platforms, we investigate the strengths, weaknesses and challenges of mobile application development on these platforms. Large differences emerge when inspecting community environments, hardware capabilities and platform maturity. These inevitably impact developer choices when deciding on mobile platform development strategies.
Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play
Recent advances in Competitive Self-Play (CSP) have achieved, or even
surpassed, human level performance in complex game environments such as Dota 2
and StarCraft II using Distributed Multi-Agent Reinforcement Learning (MARL).
One core component of these methods relies on creating a pool of learning
agents -- consisting of the Main Agent, past versions of this agent, and
Exploiter Agents -- where Exploiter Agents learn counter-strategies to the Main
Agents. A key drawback of these approaches is the large computational cost and
physical time that is required to train the system, making them impractical to
deploy in highly iterative real-life settings such as video game productions.
In this paper, we propose the Minimax Exploiter, a game theoretic approach to
exploiting Main Agents that leverages knowledge of its opponents, leading to
significant increases in data efficiency. We validate our approach in a
diversity of settings, including simple turn based games, the arcade learning
environment, and For Honor, a modern video game. The Minimax Exploiter
consistently outperforms strong baselines, demonstrating improved stability and
data efficiency, leading to a robust CSP-MARL method that is both flexible and
easy to deploy.
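The game-theoretic intuition behind an exploiter, computing a best response against a known, fixed opponent policy rather than against a worst-case adversary, can be sketched as a simple search. The `game` and `opponent_policy` interfaces below are hypothetical and not from the paper:

```python
def best_response_value(state, opponent_policy, game):
    """Value of playing a best response against a *known* opponent policy.

    Hedged sketch of the exploiter idea: where classical minimax would
    minimize over the opponent's moves, the exploiter instead predicts the
    Main Agent's move from its known policy and maximizes against that.
    Assumes game exposes is_terminal, utility, to_move, actions, result.
    """
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == "exploiter":
        # exploiter picks the move with the highest achievable value
        return max(best_response_value(game.result(state, a), opponent_policy, game)
                   for a in game.actions(state))
    # the Main Agent's move is predicted, not minimized over
    a = opponent_policy(state)
    return best_response_value(game.result(state, a), opponent_policy, game)
```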
Relational Representations in Reinforcement Learning: Review and Open Problems
This paper is about representation in RL. We discuss some of the concepts in representation and generalization in reinforcement learning and argue for higher-order representations, instead of the commonly used propositional representations. The paper contains a small review of current reinforcement learning systems using higher-order representations, followed by a brief discussion. The paper ends with research directions and open problems.
Deep Learning Applied to Turn-Based Board Games (Aprendizaje profundo aplicado a juegos de tablero por turnos)
Final Degree Project (Trabajo fin de Grado), Double Degree in Computer Science and Mathematics, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2020-2021.
Due to the astonishing growth rate in computational power, artificial intelligence is achieving milestones that were considered inconceivable just a few decades ago. One of them is AlphaZero, an algorithm capable of reaching superhuman performance in chess, shogi and Go, with just a few hours of self-play and given no domain knowledge except the game rules.
In this paper, we review the fundamentals, explain how the algorithm works, and develop our own version of it, capable of being executed on a personal computer. Despite the limited computational resources available, we have managed to master less complex games such as Tic-Tac-Toe and Connect 4. To verify learning, we test our implementation against other strategies and analyze the results obtained.
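AlphaZero-style search differs from plain UCT in that a learned network prior biases exploration toward moves the policy network already favors, via the PUCT rule. A one-function sketch (the constant `c` is illustrative):

```python
import math

def puct(q, prior, parent_visits, visits, c=1.5):
    """PUCT score used in AlphaZero-style tree search.

    q: mean value of the child so far; prior: network policy probability
    for the move; the second term decays as the child accumulates visits.
    """
    return q + c * prior * math.sqrt(parent_visits) / (1 + visits)
```

During search, the child maximizing this score is descended; the network's value output replaces the random rollout of classic MCTS.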
In search of no-loss strategies for the game of Tic-tac-toe using a customized genetic algorithm
The game of Tic-tac-toe is one of the most commonly known games. This game does not allow one to win all the time, and a significant proportion of games played results in a draw. Thus, the best a player can hope for is not to lose the game. This study is aimed at evolving a number of no-loss strategies using genetic algorithms and comparing them with existing methodologies. To efficiently evolve no-loss strategies, we have developed innovative ways of representing and evaluating a solution, initializing the GA population, and developing GA operators, including an elite-preserving scheme. Interestingly, our GA implementation is able to find more than 72 thousand no-loss strategies for playing the game. Moreover, an analysis of these solutions has given us insights into how to play the game so as not to lose it. Based on this experience, we have developed specialized efficient strategies with a high win-to-draw ratio. The study and its results are interesting and should encourage the application of these techniques to other board games for finding efficient strategies.
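A generic GA skeleton with elite preservation, in the spirit of the customized algorithm described above. The integer genome and the fitness function are placeholders; in the actual study a strategy genome would map board situations to moves, and fitness would reflect avoiding losses:

```python
import random

def evolve(fitness, genome_len, pop_size=30, generations=50, elite=2, rng=random):
    """Minimal GA sketch: truncation selection, one-point crossover,
    point mutation, and an elite-preserving scheme (best genomes copied
    unchanged into the next generation)."""
    pop = [[rng.randint(0, 8) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:elite]                           # elite preservation
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)  # truncation selection
            cut = rng.randrange(1, genome_len)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:                       # point mutation
                child[rng.randrange(genome_len)] = rng.randint(0, 8)
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)
```

Because the elite genomes survive unchanged, the best fitness found can never decrease between generations.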
Automatic Generation of Alternative Starting Positions for Simple Traditional Board Games
Simple board games, like Tic-Tac-Toe and CONNECT-4, play an important role
not only in the development of mathematical and logical skills, but also in the
emotional and social development. In this paper, we address the problem of
generating targeted starting positions for such games. This can facilitate new
approaches for bringing novice players to mastery, and also leads to discovery
of interesting game variants. We present an approach that generates starting
states of varying hardness levels for player 1 in a two-player board game,
given the rules of the board game, the desired number of steps required for
player 1 to win, and the expertise levels of the two players. Our approach
leverages symbolic methods and iterative simulation to efficiently search the
extremely large state space. We present experimental results that include
discovery of states of varying hardness levels for several simple grid-based
board games. The presence of such states for standard game variants like Tic-Tac-Toe opens up new games to be played that have never been played, as the default start state is heavily biased.
Comment: A conference version of the paper will appear in AAAI 201