
    Ms Pac-Man versus Ghost Team CEC 2011 competition

    Games provide an ideal test bed for computational intelligence, and significant progress has been made in recent years, most notably in games such as Go, where the level of play is now competitive with expert human play on smaller boards. Recently, a significantly more complex class of games has received increasing attention: real-time video games. These games pose many new challenges, including strict time constraints, simultaneous moves, and open-endedness. Unlike in traditional board games, computational play is generally unable to compete with human players. One driving force in improving the overall performance of artificial intelligence players is game competitions, where practitioners may evaluate and compare their methods against those submitted by others, and possibly against human players as well. In this paper we introduce a new competition based on the popular arcade video game Ms Pac-Man: Ms Pac-Man versus Ghost Team. The competition, to be held for the first time at the Congress on Evolutionary Computation 2011, allows participants to develop controllers for either the Ms Pac-Man agent or the Ghost Team. Unlike previous Ms Pac-Man competitions, which relied on screen capture, controllers now interface directly with the game engine. In this paper we introduce the competition, including a review of previous work as well as a discussion of several aspects of setting up the game competition itself. © 2011 IEEE
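    A minimal sketch of what such a direct game-engine controller interface might look like follows; the class, method, and game_state fields (pill_positions, pacman_pos, distance, direction_toward) are hypothetical illustrations, not the actual CEC 2011 competition API.

    from enum import Enum

    class Move(Enum):
        UP = 0
        DOWN = 1
        LEFT = 2
        RIGHT = 3
        NEUTRAL = 4

    class PacManController:
        def get_move(self, game_state, time_due_ms):
            """Return a Move before the per-tick deadline expires
            (the strict time constraint noted in the abstract)."""
            # Trivial baseline: head toward the nearest remaining pill.
            nearest = min(game_state.pill_positions,
                          key=lambda p: game_state.distance(game_state.pacman_pos, p))
            return game_state.direction_toward(game_state.pacman_pos, nearest)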

    Multiagent Learning Through Indirect Encoding

    Designing a system of multiple, heterogeneous agents that cooperate to achieve a common goal is a difficult task, but it is also a common real-world problem. Multiagent learning addresses this problem by training the team to cooperate through a learning algorithm. However, most traditional approaches treat multiagent learning as a combination of multiple single-agent learning problems. This perspective leads to many inefficiencies in learning, such as the problem of reinvention, whereby fundamental skills and policies that all agents should possess must be rediscovered independently for each team member. For example, in soccer, all the players know how to pass and kick the ball, but a traditional algorithm has no way to share such vital information because it has no way to relate the policies of agents to each other. In this dissertation a new approach to multiagent learning that seeks to address these issues is presented. This approach, called multiagent HyperNEAT, represents teams as a pattern of policies rather than as individual agents. The main idea is that an agent’s location within a canonical team layout (such as a soccer team at the start of a game) tends to dictate its role within that team; this relationship between location and role is called the policy geometry. For example, as soccer positions move from goal to center they become more offensive and less defensive, a concept that is compactly represented as a pattern. The first major contribution of this dissertation is a new method for evolving neural network controllers called HyperNEAT, which forms the foundation of the second contribution and primary focus of this work, multiagent HyperNEAT. Multiagent learning in this dissertation is investigated in predator-prey, room-clearing, and patrol domains, providing a real-world context for the approach. Interestingly, because the teams in multiagent HyperNEAT are represented as patterns, they can scale up to an infinite number of multiagent policies that can be sampled from the policy geometry as needed. Thus the third contribution is a method for teams trained with multiagent HyperNEAT to dynamically scale their size without further learning. Fourth, the capabilities to both learn and scale in multiagent HyperNEAT are compared to the traditional multiagent SARSA(λ) approach in a comprehensive study. The fifth contribution is a method for efficiently learning and encoding multiple policies for each agent on a team to facilitate learning in multi-task domains. Finally, because there is significant interest in practical applications of multiagent learning, multiagent HyperNEAT is tested in a real-world military patrolling application with actual Khepera III robots. The ultimate goal is to provide a new perspective on multiagent learning and to demonstrate the practical benefits of training heterogeneous, scalable multiagent teams through generative encoding.
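    To make the policy-geometry idea concrete, the toy sketch below samples a team of policies from a single shared pattern. The weight_pattern function is a stand-in for the evolved CPPN that HyperNEAT actually uses; the point it illustrates is that team size scales simply by sampling the same pattern at more layout positions, with no further learning.

    import numpy as np

    def weight_pattern(x_agent, i, j):
        # Toy stand-in for a CPPN: the weight between neurons i and j
        # varies smoothly with the agent's position x_agent in the team
        # layout, so nearby agents receive similar but distinct policies.
        return np.sin(3.0 * x_agent + i) * np.cos(2.0 * x_agent - j)

    def sample_team(positions, n_in=4, n_out=2):
        team = []
        for x in positions:  # x: the agent's coordinate in the canonical layout
            W = np.array([[weight_pattern(x, i, j) for j in range(n_in)]
                          for i in range(n_out)])
            team.append(W)
        return team

    # Five agents sampled along the layout; scaling the team to any size
    # just means sampling the pattern at more positions.
    policies = sample_team(np.linspace(0.0, 1.0, 5))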

    The Influence of Collective Working Memory Strategies on Agent Teams

    Past self-organizing models of collectively moving "particles" (simulated bird flocks, fish schools, etc.) have typically been based on purely reflexive agents that have no significant memory of past movements or environmental obstacles. These agent collectives usually operate in abstract environments, but as these domains take on greater realism, the collective requires behaviors that use not only presently observed stimuli but also remembered information. It is hypothesized that the addition of a limited working memory of the environment, distributed among the collective's individuals, can improve efficiency in performing tasks. This is first approached in a more traditional particle system in an abstract environment. Then it is explored for a single agent, and finally a team of agents, operating in a simulated 3-dimensional environment of greater realism. In the abstract environment, a limited distributed working memory produced a significant improvement in travel between locations, in some cases improving performance over time, while in others surprisingly achieving an immediate benefit from the influence of memory. When strategies for accumulating and manipulating memory were subsequently explored for a more realistic single agent in the 3-dimensional environment, keeping either a local or a cumulative working memory improved the agent's performance on different tasks, both when navigating nearby obstacles and, in the case of cumulative memory, when covering previously traversed terrain. When investigating a team of these agents engaged in a pursuit scenario, it was determined that a communicating and coordinating team still benefited from a working memory of the environment distributed among the agents, even with limited memory capacity. This demonstrates that a limited distributed working memory in a multi-agent system improves performance on tasks in domains of increasing complexity. This is true even though individual agents know only a fraction of the collective's entire memory, using this partial memory and interactions with others in the team to perform tasks. These results may prove useful in improving existing methodologies for control of collective movements in robotic teams, computer graphics, particle swarm optimization, and computer games, and in interpreting future experimental research on group movements in biological populations.
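    The sketch below illustrates the kind of limited, distributed working memory hypothesized above; the bounded-queue representation and capacity are assumptions for illustration, not the dissertation's actual model.

    from collections import deque

    class MemoryAgent:
        """Toy agent with a bounded working memory of observed obstacles."""
        def __init__(self, capacity=8):
            self.memory = deque(maxlen=capacity)  # oldest entries are evicted

        def observe(self, obstacle_pos):
            if obstacle_pos not in self.memory:
                self.memory.append(obstacle_pos)

        def share(self, other):
            # Exchange memories on encounter; each agent still holds only
            # a fraction of the collective's entire memory.
            for item in list(other.memory):
                self.observe(item)

    a, b = MemoryAgent(), MemoryAgent()
    a.observe((3, 4)); b.observe((7, 1))
    a.share(b)  # a now also remembers (7, 1), up to its capacity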

    Evolutionary Design of Game Vehicles and Their Controllers

    Procedural content generation (PCG) is a growing field of interest in the domain of computational intelligence as it relates to games. There are ever more examples and applications of PCG being studied in academic contexts. Player expectations of the amount of content in games increase as computers and video game consoles become capable of handling more content, and automation of content creation becomes more desirable. While many means of procedural content generation based on some form of search algorithm have been tried and tested, we examine evolutionary algorithms as a means to generate content in a setting where they have not frequently been used before: the generation of vehicles, specifically spaceships, within two-dimensional game simulations. These simulations are based on a simple Newtonian physics system with different physical rules, representing games such as Lunar Lander or Asteroids. We evolve linear vectors of real numbers that act as vehicle genotypes by encoding the placement of components relative to a vehicle point mass, so that a vehicle's form is defined by where each component sits. We use simple 1-ply lookahead controllers, simple rule-based controllers, and MCTS-based controllers to test and therefore indirectly guide the evolution of vehicle designs. We demonstrate that evolutionary algorithms can generate effective vehicle designs, suitable for use by the same controller as used for testing, for simple tasks without much issue. We also show that some factors of a problem environment, such as velocity loss factors and the topology of the game world, affect the demands on vehicle design evolution more than others. It is also evident that using different controllers to test vehicles causes different designs to emerge, based on the strengths of said controllers.
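    As a rough illustration, the sketch below evolves real-valued genotypes with a simple truncation-selection loop. The fitness function is a placeholder: in the work above it would decode the genotype into component placements, run the resulting vehicle under one of the test controllers in the physics simulation, and score task performance. Operators and parameters are assumptions, not the exact algorithm used.

    import random

    def evaluate(genotype):
        # Placeholder fitness; stands in for simulating the decoded vehicle.
        return -sum((g - 0.5) ** 2 for g in genotype)

    def evolve(pop_size=20, genes=10, gens=50, sigma=0.1):
        pop = [[random.random() for _ in range(genes)] for _ in range(pop_size)]
        for _ in range(gens):
            pop.sort(key=evaluate, reverse=True)
            parents = pop[: pop_size // 2]            # truncation selection
            children = [[g + random.gauss(0.0, sigma) for g in random.choice(parents)]
                        for _ in range(pop_size - len(parents))]
            pop = parents + children                  # elitist replacement
        return max(pop, key=evaluate)

    best = evolve()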

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) solves the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history that lies at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made on developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability; it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents (N ≫ 2), which I name many-agent reinforcement learning (MARL; I use the word "MARL" to denote multi-agent reinforcement learning with a particular focus on the case of many agents; otherwise, it is denoted as "Multi-Agent RL" by default) problems. In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills the research gap that most of the existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy evaluation algorithm -- α^α-Rank -- for many-agent systems. The critical advantage of α^α-Rank is that it can compute the solution concept of α-Rank tractably in multi-player general-sum games with no need to store the entire pay-off matrix. This is in contrast to classic solution concepts such as Nash equilibrium, which is known to be PPAD-hard even in two-player cases. α^α-Rank allows us, for the first time, to practically conduct large-scale multi-agent evaluations. Thirdly, I introduce a scalable policy learning algorithm -- mean-field MARL -- for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling the behavioural diversity in meta-games, and on developing algorithms that guarantee to enlarge diversity during training. The proposed metric based on determinantal point processes serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impact in the real physical world, beyond video games.
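    To make the dimensionality reduction behind mean-field MARL concrete, below is a heavily simplified tabular sketch in which each agent conditions on its own action and a discretised mean action of its neighbours rather than the full joint action. This is an illustrative toy under stated assumptions; the actual algorithm iterates the mean action to a fixed point and uses a Boltzmann policy.

    import numpy as np

    def mean_field_q_update(Q, s, a_i, a_mean, r, s_next, a_mean_next,
                            alpha=0.1, gamma=0.95):
        # Q is indexed as Q[state, own action, discretised mean action]:
        # conditioning on the neighbours' mean action collapses the
        # exponentially large joint-action space to one extra axis.
        target = r + gamma * np.max(Q[s_next, :, a_mean_next])
        Q[s, a_i, a_mean] += alpha * (target - Q[s, a_i, a_mean])
        return Q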

    Co-evolutionary and Reinforcement Learning Techniques Applied to Computer Go players

    The objective of this thesis is to model processes from nature, such as evolution and co-evolution, and to propose techniques that ensure these learning processes actually occur and are useful for solving complex problems such as the game of Go. Go is an ancient and very complex game with simple rules that remains a challenge for artificial intelligence. This dissertation covers previous approaches to the problem and proposes solving it with competitive and cooperative co-evolutionary learning methods, along with other techniques introduced by the author. To study, implement, and validate these methods, several neural network structures, a freely available framework, and many purpose-built programs were used. The proposed techniques were implemented by the author, who performed many experiments to find configurations that keep co-evolution progressing, and the results are discussed. Co-evolutionary learning processes exhibit pathologies that can impede progress; techniques are introduced to address pathologies such as loss of gradient, cycling dynamics, and forgetting. According to some authors, one way to mitigate these pathologies is to introduce more diversity into the evolving populations. This thesis therefore proposes techniques for introducing diversity, together with diversity measures for neural network structures that allow diversity to be monitored during co-evolution. The evolved genotype diversity is analyzed in terms of its impact on the global fitness of the evolved strategies and on their generalization. Additionally, a memory mechanism is introduced into the neural network structures to reinforce strategies in the genes of the evolved neurons, so that good strategies, once learned, are not forgotten. Work by other authors applying cooperative and competitive co-evolution is also reviewed. The Go board size used in this thesis is 9x9, but the approach scales readily to larger boards. The author believes that the programs and techniques introduced in this dissertation can also be applied to other domains.
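    One simple way to monitor the population diversity discussed above is the mean pairwise distance between flattened network weight vectors; the sketch below is an illustrative measure, not necessarily one of the specific measurements proposed in the thesis.

    import numpy as np

    def genotype_diversity(population):
        """Mean pairwise Euclidean distance between weight vectors."""
        n = len(population)
        total = sum(np.linalg.norm(population[i] - population[j])
                    for i in range(n) for j in range(i + 1, n))
        return 2.0 * total / (n * (n - 1))

    pop = [np.random.randn(64) for _ in range(10)]  # flattened network weights
    print(genotype_diversity(pop))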

    Reconstruction, Analysis and Synthesis of Collective Motion

    As collective motion plays a crucial role in modern-day robotics and engineering, it seems appealing to seek inspiration from nature, which abounds with examples of collective motion (starling flocks, fish schools, etc.). This approach towards understanding and reverse-engineering a particular aspect of nature forms the foundation of this dissertation, and its main contribution is threefold. First, we identify the importance of appropriate algorithms to extract parameters of motion from sampled observations of a trajectory; by assuming an appropriate generative model, we turn this into a regularized inversion problem, with the regularization term imposing smoothness of the reconstructed trajectory. Assuming a linear triple-integrator model and penalizing high values of the jerk path integral, we reconstruct the trajectory through an analytical approach. Alternatively, the evolution of a trajectory can be governed by natural Frenet frame equations. The inadequacy of integrability theory for nonlinear systems poses the main obstacle to an analytic solution, and forces us to adopt a numerical optimization approach. However, by noting that the underlying dynamics defines a left-invariant vector field on a Lie group, we develop a framework based on Pontryagin's maximum principle. This approach to data smoothing yields a semi-analytic solution. Equipped with appropriate algorithms for trajectory reconstruction, we analyze flight data for biological motions, and this marks the second contribution of this dissertation. By analyzing the flight data of big brown bats in two different settings (chasing a free-flying praying mantis, and competing with a conspecific to catch a tethered mealworm), we provide evidence of a context-specific switch in flight strategy. Moreover, our approach provides a way to estimate the behavioral latency associated with these foraging behaviors. We have also analyzed the flight data of European starling flocks, and it can be concluded from our analysis that the flock-averaged coherence (the average cosine of the angle between the velocities of a focal bird and its neighborhood center of mass, averaged over the entire flock) is maximized by considering 5-7 nearest neighbors. The analysis also sheds some light on the underlying feedback mechanism for steering control. The third and final contribution of this dissertation lies in the domain of control law synthesis. Drawing inspiration from the coherent movement of starling flocks, we introduce a strategy (Topological Velocity Alignment) for collective motion, wherein each agent aligns its velocity along the direction of motion of its neighborhood center of mass. A feedback law is proposed for achieving this strategy, and we analyze two special cases (a two-body system, and an N-body system with cyclic interaction) to show the effectiveness of the proposed feedback law. It has been observed through numerical simulation and robotic implementation that this approach to collective motion can give rise to splitting behavior.
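    A toy discrete-time version of Topological Velocity Alignment makes the strategy concrete: each agent steers its heading toward the direction of motion of the centre of mass of its k nearest neighbours (a topological, not metric, neighbourhood). The gain, time step, and discretisation are illustrative assumptions, not the dissertation's actual feedback law.

    import numpy as np

    def tva_step(pos, vel, k=6, gain=0.5, dt=0.05):
        """One Euler step of a toy TVA rule for n agents in the plane."""
        n = len(pos)
        new_vel = vel.copy()
        for i in range(n):
            d = np.linalg.norm(pos - pos[i], axis=1)
            nbrs = np.argsort(d)[1:k + 1]        # k nearest, excluding self
            com_vel = vel[nbrs].mean(axis=0)     # velocity of neighbourhood c.o.m.
            target = com_vel / (np.linalg.norm(com_vel) + 1e-9)
            heading = vel[i] / (np.linalg.norm(vel[i]) + 1e-9)
            new_dir = heading + gain * (target - heading) * dt
            new_vel[i] = np.linalg.norm(vel[i]) * new_dir / (np.linalg.norm(new_dir) + 1e-9)
        return pos + new_vel * dt, new_vel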

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway's Life according to a general MR streaming pattern. We chose Life because it is simple enough to serve as a testbed for MR's applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms' performance on Amazon's Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous Life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
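    To make strip partitioning concrete, here is a toy streaming mapper: each grid row is routed to the reducer that owns its strip, and rows adjacent to a strip boundary are duplicated to the neighbouring strip so each reducer can compute the next generation of its rows locally. The input format, strip count, and grid size are assumptions for illustration, not the paper's exact layout.

    import sys

    N_STRIPS = 4   # assumed number of horizontal strips
    ROWS = 64      # assumed grid height

    def strip_of(r):
        return r * N_STRIPS // ROWS

    def mapper():
        # MR streaming input: one grid row per line, "row_index<TAB>cells".
        for line in sys.stdin:
            r, cells = line.rstrip("\n").split("\t")
            r = int(r)
            s = strip_of(r)
            print(f"{s}\t{r}\t{cells}")              # row to its own strip
            for rn in (r - 1, r + 1):                # duplicate boundary rows
                if 0 <= rn < ROWS and strip_of(rn) != s:
                    print(f"{strip_of(rn)}\t{r}\t{cells}")

    if __name__ == "__main__":
        mapper()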