
    Coevolutionary Approaches to Generating Robust Build-Orders for Real-Time Strategy Games

    We aim to find winning build-orders for Real-Time Strategy games. Real-Time Strategy games pose a variety of challenges, from short-term control to longer-term planning. We focus on a longer-term planning problem: which units to build, and in what order to produce them, so that a player successfully defeats the opponent. Plans which address unit construction scheduling problems in Real-Time Strategy games are called build-orders. A robust build-order defeats many opponents, while a strong build-order defeats opponents quickly. However, no single build-order defeats all other build-orders, and build-orders that defeat many opponents may still lose against a specific opponent. Previous research has only investigated generating build-orders that defeat a specific opponent, rather than finding robust, strong build-orders, and has not applied coevolutionary algorithms to generating build-orders. In contrast, our research makes three main contributions towards finding robust, strong build-orders. First, we apply a coevolutionary algorithm to finding robust build-orders. Compared to exhaustive search, a genetic algorithm finds the strongest build-orders, while a coevolutionary algorithm finds more robust build-orders. Second, we show that case-injection enables coevolution to learn from specific opponents while maintaining robustness: build-orders produced with coevolution and case-injection learn to defeat, or to play like, the injected build-orders. Third, we show that coevolved build-orders benefit from a representation which includes branches and loops; coevolution utilizes multiple branches and loops to create build-orders that are stronger than build-orders without them. We believe this work provides evidence that coevolutionary algorithms may be a viable approach to creating robust, strong build-orders for Real-Time Strategy games.
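
    A minimal sketch of the two-population scheme described above, assuming flat build-orders (no branches or loops), a toy unit set, and a stand-in battle function; none of this is the authors' implementation, but it shows how robustness can be scored against a sample of coevolving opponents.

        import random

        UNITS = ["worker", "soldier", "archer", "barracks"]  # illustrative unit set

        def random_build_order(length=12):
            return [random.choice(UNITS) for _ in range(length)]

        def battle(bo_a, bo_b):
            # Stand-in for a real game simulation: here the build-order that
            # produces more soldiers wins. A real system would run the RTS engine.
            return int(bo_a.count("soldier") > bo_b.count("soldier"))

        def fitness(bo, opponents):
            # Robustness: fraction of sampled opposing build-orders defeated.
            return sum(battle(bo, opp) for opp in opponents) / len(opponents)

        def coevolve(pop_size=20, generations=50, sample_size=5):
            pop_a = [random_build_order() for _ in range(pop_size)]
            pop_b = [random_build_order() for _ in range(pop_size)]
            for _ in range(generations):
                # Each population is scored only against a sample of the other,
                # so the fitness yardstick moves every generation.
                for pop, rivals in ((pop_a, pop_b), (pop_b, pop_a)):
                    opponents = random.sample(rivals, sample_size)
                    pop.sort(key=lambda bo: fitness(bo, opponents), reverse=True)
                    survivors = pop[: pop_size // 2]
                    children = []
                    for parent in survivors:  # refill by point mutation
                        child = parent[:]
                        child[random.randrange(len(child))] = random.choice(UNITS)
                        children.append(child)
                    pop[:] = survivors + children
            return pop_a, pop_b

    Because each population is evaluated only against the other's current members, there is no fixed fitness yardstick, which is the property the abstract contrasts with a plain genetic algorithm.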

    Learning Probabilistic Finite State Automata For Opponent Modelling

    Artificial Intelligence (AI) is the branch of Computer Science that tries to imbue software systems with intelligent behaviour. In the early years of the field, those systems were limited to big computing units on which researchers built expert systems that exhibited some kind of intelligence. But with the advent of different kinds of networks, the most prominent of which is the Internet, the field became interested in Distributed Artificial Intelligence (DAI) as the natural next step. The field thus moved from monolithic software architectures for its AI systems to architectures in which several pieces of software try to solve a problem or pursue interests of their own. Those pieces of software are called Agents, and the architectures that allow the interoperation of multiple agents are called Multi-Agent Systems (MAS). The agent acts as a metaphor for software systems that are embodied in a given environment and that behave or react intelligently to events in that environment. The AI mainstream was initially interested in systems that could be taught to behave according to the inputs perceived. However, this rapidly proved ineffective, because the human expert acted as the knowledge bottleneck for distilling useful and efficient rules; that was in the best case, while in the worst case the task of enumerating the rules was difficult or plainly not affordable. This sparked interest in another subfield, Machine Learning, and its counterpart in a MAS, Distributed Machine Learning: if you cannot code all the scenario combinations, code within the agent the rules that allow it to learn from the environment and the actions performed. With this framework in mind, applications are endless. Agents can be used to trade bonds or other financial derivatives without human intervention, or they can be embedded in robotic hardware to learn unseen map configurations in distant locations such as other planets. Agents are not restricted to interactions with humans or the environment; they can also interact with other agents. For instance, agents can negotiate the quality of service of a channel before establishing a communication, or they can share information about the environment in a cooperative setting such as robot soccer. But there are shortcomings that emerge in a MAS architecture. The one related to this thesis is that partitioning the task at hand into agents usually entails that each agent has less memory or computing power: it is not economically feasible to replicate the big computing unit on each separate agent in our system. Thus we should think of our agents as computationally bounded, that is, as having a limited amount of computing power with which to learn from the environment. This has serious implications for the algorithms commonly used for learning in these settings. The classical approach for learning in a MAS is to use some variation of a Reinforcement Learning (RL) algorithm [BT96, SB98]. The main idea behind those algorithms is that the agent maintains a table with the perceived value of each action/state pair and, through multiple iterations, obtains a set of decision rules that allows it to take the best action in a given environment. This approach has several flaws when the current action depends on a single observation seen in the past (for instance, a warning sign that a robot perceives). Several techniques have been proposed to alleviate those shortcomings. For instance, to avoid the combinatorial explosion of states and actions, an approximating function such as a neural network can be used instead of storing a table with the value of each pair. And for events in the past, we can extend the state definition of the environment, creating dummy states that correspond to the tuple (state_N, state_{N-1}, ..., state_{N-t}).
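
    The two remedies mentioned at the end of the abstract can be made concrete in a small sketch: a tabular Q-learner whose "state" is the tuple of the last t observations. All class and method names below are our own illustration, assuming discrete observations and actions; swapping the table for a neural network would give the function-approximation variant.

        import random
        from collections import defaultdict, deque

        class HistoryQLearner:
            # Tabular Q-learning where the "state" is the tuple of the last t
            # observations: the dummy-state construction described above.
            def __init__(self, actions, t=3, alpha=0.1, gamma=0.95, epsilon=0.1):
                self.q = defaultdict(float)      # (state_tuple, action) -> value
                self.actions = actions
                self.history = deque(maxlen=t)   # sliding window of observations
                self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

            def state(self):
                return tuple(self.history)       # (state_N, ..., state_N-t)

            def act(self, observation):
                self.history.append(observation)
                s = self.state()
                if random.random() < self.epsilon:                      # explore
                    return random.choice(self.actions)
                return max(self.actions, key=lambda a: self.q[(s, a)])  # exploit

            def update(self, s, a, reward, s_next):
                # One-step Q-learning backup over history-tuple states.
                best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
                self.q[(s, a)] += self.alpha * (
                    reward + self.gamma * best_next - self.q[(s, a)])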

    Strategic interaction in the Prisoner's Dilemma: A game-theoretic dimension of conflict research

    This four-part enquiry treats selected theoretical and empirical developments in the Prisoner's Dilemma. The enquiry is oriented within the sphere of game-theoretic conflict research, and addresses methodological and philosophical problems embedded in the model under consideration. In Part One, relevant taxonomic criteria of the von Neumann-Morgenstern theory of games are reviewed, and controversies associated with both the utility function and game-theoretic rationality are introduced. In Part Two, salient contributions by Rapoport and others to the Prisoner's Dilemma are enlisted to illustrate the model's conceptual richness and problematic wealth. Conflicting principles of choice, divergent concepts of rational choice, and attempted resolutions of the dilemma are evaluated in the static mode. In Part Three, empirical interaction among strategies is examined in the iterated mode. A computer-simulated tournament of competing families of strategies is conducted, as both a complement to and a continuation of Axelrod's previous tournaments. Combinatoric sub-tournaments are exhaustively analyzed, and an eliminatory ecological scenario is generated. In Part Four, the performance of the maximization family of strategies is subjected to deeper analysis, which reveals critical strengths and weaknesses latent in its decision-making process. On the whole, an inter-modal continuity obtains, which suggests that the maximization of expected utility, weighted toward probabilistic co-operation, is a relatively effective strategic embodiment of Rapoport's ethic of collective rationality.
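
    For readers unfamiliar with the iterated mode, a round-robin tournament of the kind pioneered by Axelrod can be set up in a few lines. The sketch below is ours, with the standard payoff matrix; the two strategies shown are illustrative stand-ins for the families of strategies analyzed in Part Three.

        import itertools

        # Standard Prisoner's Dilemma payoffs (T > R > P > S).
        PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
                  ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

        def tit_for_tat(own_history, opp_history):
            return opp_history[-1] if opp_history else "C"

        def always_defect(own_history, opp_history):
            return "D"

        def play_match(strat_a, strat_b, rounds=200):
            hist_a, hist_b = [], []
            score_a = score_b = 0
            for _ in range(rounds):
                move_a = strat_a(hist_a, hist_b)
                move_b = strat_b(hist_b, hist_a)
                pay_a, pay_b = PAYOFF[(move_a, move_b)]
                hist_a.append(move_a)
                hist_b.append(move_b)
                score_a += pay_a
                score_b += pay_b
            return score_a, score_b

        def tournament(strategies):
            # Round robin: every pairing of distinct strategies plays one match.
            totals = {s.__name__: 0 for s in strategies}
            for a, b in itertools.combinations(strategies, 2):
                sa, sb = play_match(a, b)
                totals[a.__name__] += sa
                totals[b.__name__] += sb
            return totals

        print(tournament([tit_for_tat, always_defect]))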

    Emerging communication between competitive agents

    We investigate the fundamental question of how agents in competition learn communication protocols in order to share information and coordinate with each other. This work aims to overturn the prevailing view in the machine learning literature that unaligned, self-interested agents cannot learn to communicate effectively. To study emergent communication across the spectrum of cooperative-competitive games, we introduce a carefully constructed sender-receiver game and put special care into evaluation. We find that communication can indeed emerge in partially competitive scenarios, confirming known results in economics, and we discover three things that are tied to improving it. First, selfish communication is proportional to cooperation, and it naturally occurs in situations that are more cooperative than competitive. Second, stability and performance are improved by using LOLA (Foerster et al., 2018), a higher-order "theory-of-mind" learning algorithm, especially in more competitive scenarios. And third, discrete protocols lend themselves better to learning fair, cooperative communication than continuous ones. Chapter 1 provides an introduction to the underlying learning techniques of the agents, Machine Learning and Reinforcement Learning, and provides an overview of approaches to Multi-Agent Reinforcement Learning for different types of games. It then gives a background on language emergence, motivating this study and examining the history of techniques and results across Biology, Evolutionary Game Theory, and Economics. Chapter 2 delves into the work on language emergence between selfish, competitive agents. Chapter 3 draws conclusions from the work and points out the intrigue and challenge of learning communication in a competitive setting, setting the stage for future work.
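
    As a hedged illustration only: one minimal way to parameterize a sender-receiver game along the cooperative-competitive spectrum is to offset the two agents' goals by a bias term, as sketched below. This is not the thesis's actual environment, and both "policies" here are fixed stand-ins for trained agents.

        import random

        def sender_receiver_round(bias, vocab_size=16):
            # bias in [0, 1] sets how far apart the two agents' goals are:
            # 0 is fully cooperative, larger values are more competitive.
            sender_target = random.random()
            receiver_target = (sender_target + bias) % 1.0  # misaligned goal
            # A trained sender would map its target to a discrete message;
            # fixed quantization stands in for the learned policy here.
            message = int(sender_target * vocab_size)
            decoded = (message + 0.5) / vocab_size
            # A trained receiver would weigh the message against its own
            # interest; a simple average stands in for that trade-off.
            action = (decoded + receiver_target) / 2
            return -abs(action - sender_target), -abs(action - receiver_target)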

    Coevolutionary algorithms for the optimization of strategies for red teaming applications

    Red teaming (RT) is a process that assists an organization in finding vulnerabilities in a system, whereby the organization itself takes on the role of an "attacker" to test the system. It is used in various domains including military operations. Traditionally, it is a manual process with some obvious weaknesses: it is expensive, time-consuming, and limited from the perspective of humans "thinking inside the box". Automated RT is an approach that has the potential to overcome these weaknesses. In this approach both the red team (enemy forces) and the blue team (friendly forces) are modelled as intelligent agents in a multi-agent system, and the idea is to run many computer simulations, pitting the plan of the red team against the plan of the blue team. This research project investigated techniques that can support automated red teaming by conducting a systematic study involving a genetic algorithm (GA), a basic coevolutionary algorithm, and three variants of the coevolutionary algorithm. An initial pilot study involving the GA showed some limitations, as GAs only support the optimization of a single population at a time against a fixed strategy. However, in red teaming it is not sufficient to consider just one, or even a few, of the opponent's strategies as, in reality, each team needs to adjust its strategy to account for the different strategies that competing teams may utilize at different points. Coevolutionary algorithms (CEAs) were identified as suitable algorithms capable of optimizing two teams simultaneously for red teaming. The subsequent investigation of CEAs examined their performance in addressing the characteristics of red teaming problems, such as intransitive relationships and multimodality, before employing them to optimize two red teaming scenarios. A number of measures were used to evaluate the performance of CEAs; for multimodality, this study introduced a novel n-peak problem and a new performance measure based on the Circular Earth Mover's Distance. Results from the investigations involving an intransitive number problem, a multimodal problem, and two red teaming scenarios showed that, in terms of the performance measures used, there is no single algorithm that consistently outperforms the others across the four test problems. Applications of CEAs to the red teaming scenarios showed that all four variants produced interesting evolved strategies at the end of the optimization process, as well as providing evidence of the potential of CEAs in future red teaming applications. The developed techniques can potentially be used for red teaming in military operations or in analysis for the protection of critical infrastructure. The benefits include the modelling of more realistic interactions between the teams, the ability to anticipate and counteract potentially new types of attacks, as well as providing a cost-effective solution.
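
    The thesis's n-peak problem is only named in the abstract; as a sketch of the general shape such a multimodal test function can take, the toy landscape below is the upper envelope of n Gaussian peaks, with made-up positions, heights, and widths.

        import math

        def n_peak_fitness(x, peaks):
            # Upper envelope of n Gaussian peaks: a toy multimodal landscape.
            return max(h * math.exp(-((x - c) ** 2) / (2 * w ** 2))
                       for c, h, w in peaks)

        # Three (centre, height, width) peaks on [0, 1]; global optimum at x = 0.8.
        peaks = [(0.2, 0.6, 0.05), (0.5, 0.8, 0.05), (0.8, 1.0, 0.05)]
        print(n_peak_fitness(0.8, peaks))  # 1.0

    A CEA that converges on the peak at x = 0.2 or x = 0.5 has found a local rather than global optimum, which is what multimodality measures are designed to detect.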

    Rules of engagement: competitive coevolutionary dynamics in computational systems

    Given that evolutionary biologists have considered coevolutionary interactions since the dawn of Darwinism, it is perhaps surprising that coevolution was largely overlooked during the formative years of evolutionary computing. It was not until the early 1990s that Hillis' seminal work thrust coevolution into the spotlight. Upon attempting to evolve fixed-length sorting networks, a problem with a long and competitive history, Hillis found that his standard evolutionary algorithm was producing sub-standard networks. In response, he decided to reciprocally evolve a population of test lists against the sorting network population, thus producing a coevolutionary system. The result was impressive; coevolution not only outperformed evolution, but the best network it discovered was only one comparison longer than the best-known solution. For the first time, a coevolutionary algorithm had been successfully applied to problem-solving. Pre-Hillis, the shortcomings of standard evolutionary algorithms had been understood for some time: whilst defining an adequate fitness function can be as challenging as the problem one is hoping to solve, once achieved, the accumulation of fitness-improving mutations can push a population towards local optima that are difficult to escape. Coevolution offers a solution. By allowing the fitness of each evolving individual to vary (through competition) with other reciprocally evolving individuals, coevolution removes the requirement of a fitness yardstick. In conjunction, the reciprocal adaptations of each individual begin to erode local optima as soon as they appear. However, coevolution is no panacea. As a problem-solving tool, coevolutionary algorithms suffer from some debilitating dynamics, each a result of the relative fitness assessment of individuals. In a single- or multi-population competitive system, coevolution may stabilize at a suboptimal equilibrium, or mediocre stable state, analogous to the traditional problem of local optima. Populations may become highly specialized in an unanticipated (and undesirable) manner, potentially resulting in brittle solutions that are fragile to perturbation. The system may cycle, producing dynamics similar to the children's game rock-paper-scissors. Disengagement may occur, whereby one population out-performs another to the extent that individuals cannot be discriminated on the basis of fitness alone, thus removing selection pressure and allowing populations to drift. Finally, coevolution's relative fitness assessment renders traditional visualization techniques (such as the graph of fitness over time) obsolete, thus exacerbating each of the above problems. This thesis attempts to better understand and address the problems of coevolution through the design and analysis of simple coevolutionary models. 'Reduced virulence' - a novel technique specifically designed to tackle disengagement - is developed. Empirical results demonstrate the ability of reduced virulence to combat disengagement in both simple and complex domains, whilst outperforming the only known competitors. Combining reduced virulence with diversity maintenance techniques is also shown to counteract mediocre stability and over-specialization. A critique of the CIAO plot - a visualization technique developed to detect coevolutionary cycling - highlights previously undocumented ambiguities; experimental evidence demonstrates the need for complementary visualizations. Extending the scope of visualization, a first exploration into coevolutionary steering is performed: a technique allowing the user to interact with a coevolutionary system during run-time. Using a simple model incorporating reduced virulence, the coevolutionary steering demonstration highlights the future potential of such tools for both research and education. The role of neutrality in coevolution is discussed in detail. Whilst much emphasis is placed upon neutral networks in the evolutionary computation literature, the nature of coevolutionary neutrality is generally overlooked. Preliminary ideas for modelling coevolutionary neutrality are presented. Finally, whilst this thesis is primarily aimed at a computing audience, strong reference to evolutionary biology is made throughout. Exemplifying potential crossover, the CIAO plot, a tool previously unused in biology, is applied to a simulation of E. coli, with results confirming empirical observations of real bacteria.
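
    A sketch of the core reduced-virulence idea, using the commonly cited form f(x, v) = 2x/v - x^2/v^2 for a normalized score x in [0, 1] and virulence v in (0.5, 1]; the surrounding code and example values are ours, not the thesis's implementation.

        def reduced_virulence(score, virulence=0.75):
            # f(x, v) = 2x/v - x^2/v^2 for normalized score x in [0, 1].
            # Fitness peaks at x = v, so a parasite winning every game (x = 1)
            # is penalized, which keeps the opposing population engaged.
            return 2 * score / virulence - (score / virulence) ** 2

        # v = 1.0 recovers the standard ordering; with v = 0.75 a parasite
        # winning 75% of its games is the fittest.
        for x in (0.5, 0.75, 1.0):
            print(x, round(reduced_virulence(x), 3))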

    Statistical modelling of games

    This thesis mainly focuses on the statistical modelling of a selection of games, namely the minority game, the urn model, and the Hawk-Dove game. Chapters 1 and 2 give a brief introduction and a survey of the field. In Chapter 3, the key characteristics of the minority game are reproduced. In addition, the minority game is extended to include wealth distribution and a leverage effect. By assuming that each player has initial wealth which rises and falls according to profit and loss, with the potential of borrowing and bankruptcy, we find that the modelled wealth distribution may follow a power law and that leverage increases the instability of the system. In Chapter 4, to explore the effects of memory, we construct a model where agents with memories of different lengths compete for finite resources. Using analytical and numerical approaches, our research demonstrates that an instability exists at a critical memory length, and that players with different memory lengths are able to compete with each other and achieve a state of co-existence. The analytical solution is found to be connected to the well-known urn model. Additionally, our findings reveal that the temperature is related to the agent's memory. Due to its general nature, this memory model could potentially be relevant for a variety of other game models. In Chapter 5, our main finding is extended to the Hawk-Dove game by introducing the memory parameter to each agent playing the game. An assumption is made that agents try to maximise their profits by learning from past experiences stored in their finite memories. We show that the analytical results obtained from these two games are in agreement with the results from our simulations. It is concluded that the instability occurs when agents' memory lengths reach the critical value. Finally, Chapter 6 provides some concluding remarks and outlines some potential future work.
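
    To make the minority-game mechanics concrete, the sketch below is a deliberately simplified version assuming one strategy per agent and a single shared memory length; the standard formulation gives each agent several strategies with virtual scoring, and the thesis's Chapter 4 model additionally mixes agents with different memory lengths.

        import random

        def minority_game(n_agents=101, memory=3, rounds=500):
            # Each round every agent picks side 0 or 1; the minority side wins.
            # One lookup table per agent (last `memory` outcomes -> choice), a
            # simplification of the standard multi-strategy formulation.
            n_histories = 2 ** memory
            strategies = [[random.randint(0, 1) for _ in range(n_histories)]
                          for _ in range(n_agents)]
            history = 0                    # last `memory` outcomes, bit-encoded
            wins = [0] * n_agents
            for _ in range(rounds):
                choices = [s[history] for s in strategies]
                minority = int(sum(choices) * 2 < n_agents)  # less-chosen side
                for i, c in enumerate(choices):
                    wins[i] += (c == minority)
                history = ((history << 1) | minority) % n_histories
            return wins

    An odd number of agents avoids ties; varying `memory` per agent is the natural extension toward the co-existence result described above.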
