218 research outputs found
Emergence and Stability of Self-Evolved Cooperative Strategies using Stochastic Machines
To investigate the origin of cooperative behaviors, we developed an
evolutionary model of sequential strategies and tested our model with computer
simulations. The sequential strategies represented by stochastic machines were
evaluated through games of Iterated Prisoner's Dilemma (IPD) with other agents
in the population, allowing co-evolution to occur. We expanded upon past works
by proposing a novel mechanism to mutate stochastic Moore machines that enables
a richer class of machines to be evolved. These machines were then subjected to
various selection mechanisms and the resulting evolved strategies were
analyzed. We found that cooperation can indeed emerge spontaneously in evolving
populations playing iterated PD, specifically in the form of trigger
strategies. In addition, we found that the resulting populations converged to
evolutionarily stable states and were resilient towards mutation. In order to
test the generalizability of our proposed mutation mechanism and simulation
approach, we also evolved the machines to play other games such as Chicken,
Stag Hunt, and Battle, and obtained strategies that perform as well as mixed
strategies in Nash equilibrium.
Comment: 8 pages, 5 figures, submitted to and accepted for IEEE SSCI 2020
(Symposium Series on Computational Intelligence).
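The abstract above evaluates sequential strategies encoded as stochastic Moore machines through repeated IPD games. A minimal sketch of that setup follows; the class and function names, the payoff values, and the particular encoding (a per-state cooperation probability plus a transition table keyed on the opponent's last move) are our own illustrative assumptions, not the paper's implementation.

```python
import random

# Minimal stochastic Moore machine for the Iterated Prisoner's Dilemma.
# Each state carries a probability of cooperating (its stochastic output)
# and a transition table keyed by the opponent's last move.
class StochasticMoore:
    def __init__(self, coop_prob, trans, start=0):
        self.coop_prob = coop_prob    # coop_prob[s] = P(play 'C' in state s)
        self.trans = trans            # trans[s]['C' or 'D'] = next state
        self.state = start

    def move(self, rng):
        return 'C' if rng.random() < self.coop_prob[self.state] else 'D'

    def update(self, opp_move):
        self.state = self.trans[self.state][opp_move]

# Conventional PD payoffs: (my move, opponent's move) -> my payoff.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def play_ipd(a, b, rounds=200, seed=0):
    rng = random.Random(seed)
    score_a = score_b = 0
    for _ in range(rounds):
        ma, mb = a.move(rng), b.move(rng)
        score_a += PAYOFF[(ma, mb)]
        score_b += PAYOFF[(mb, ma)]
        a.update(mb)
        b.update(ma)
    return score_a, score_b

# A trigger strategy (Grim) as a deterministic special case: cooperate
# until the opponent defects once, then defect forever.
grim = StochasticMoore([1.0, 0.0], [{'C': 0, 'D': 1}, {'C': 1, 'D': 1}])
allc = StochasticMoore([1.0], [{'C': 0, 'D': 0}])
print(play_ipd(grim, allc))   # mutual cooperation for 200 rounds: (600, 600)
```

Evolving such machines would then amount to mutating the cooperation probabilities and transition entries and selecting on the accumulated payoffs.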
Learning Probabilistic Finite State Automata For Opponent Modelling
Artificial Intelligence (AI) is the branch of Computer Science that tries to
imbue intelligent behaviour in software systems. In the early years of the
field, those systems were limited to big computing units on which researchers
built expert systems that exhibited some kind of intelligence. But with the
advent of different kinds of networks, the most prominent of which is the
Internet, the field became interested in Distributed Artificial Intelligence
(DAI) as the natural next step.
The field thus moved from monolithic software architectures for its AI systems
to architectures in which several pieces of software tried to solve a problem
together or pursued interests of their own. Those pieces of software were
called Agents, and the architectures that allowed the interoperation of
multiple agents were called Multi-Agent Systems (MAS). The agent acts as a
metaphor for software systems that are embodied in a given environment and
that behave or react intelligently to events in that environment.
The AI mainstream was initially interested in systems that could be taught to
behave according to the inputs perceived. However, this rapidly proved
ineffective, because the human expert acted as the knowledge bottleneck for
distilling useful and efficient rules. That was in the best cases; in the
worst cases, the task of enumerating the rules was difficult or plainly
unaffordable. This sparked interest in another subfield, Machine Learning, and
its counterpart in a MAS, Distributed Machine Learning. If you cannot code
all the scenario combinations, you instead code within the agent the rules
that allow it to learn from the environment and the actions performed.
With this framework in mind, the applications are endless. Agents can be used
to trade bonds or other financial derivatives without human intervention, or
they can be embedded in robotic hardware and learn unseen map configurations
in remote locations such as distant planets. Agents are not restricted to
interactions with humans or the environment; they can also interact with other
agents. For instance, agents can negotiate the quality of service of a channel
before establishing communication, or they can share information about the
environment in a cooperative setting like robot soccer players.
But some shortcomings emerge in a MAS architecture. The one relevant to this
thesis is that partitioning the task at hand among agents usually entails that
each agent has less memory or computing power. It is not economically feasible
to replicate the big computing unit on each separate agent in our system. Thus
we should think of our agents as computationally bounded, that is, as having a
limited amount of computing power with which to learn from the environment.
This has serious implications for the algorithms that are commonly used for
learning in these settings.
The classical approach to learning in a MAS is to use some variation of a
Reinforcement Learning (RL) algorithm [BT96, SB98]. The main idea behind those
algorithms is that the agent maintains a table with the perceived value of
each action/state pair and, through multiple iterations, obtains a set of
decision rules that allows it to take the best action in a given environment.
This approach has several flaws when the current action depends on a single
observation seen in the past (for instance, a warning sign that a robot
perceives). Several techniques have been proposed to alleviate those
shortcomings. For instance, to avoid the combinatorial explosion of states and
actions, an approximating function such as a neural network can be used
instead of storing a table with the value of the pairs. And for events in the
past, we can extend the state definition of the environment, creating dummy
states that correspond to the N-tuple (state_N, state_{N-1}, . . . , state_{N-t}).
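The history-extension idea above can be sketched as tabular learning over a sliding window of recent observations. This is a minimal illustration, assuming a standard tabular Q-learning update; the class name, parameters, and window encoding are hypothetical, not taken from the thesis.

```python
import random
from collections import defaultdict

# Tabular Q-learning where the "state" is the tuple of the last t raw
# observations, i.e. the (state_N, ..., state_{N-t}) extension above.
class HistoryQLearner:
    def __init__(self, actions, t=2, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(float)   # (history, action) -> estimated value
        self.actions = actions
        self.t = t                    # length of the observation window
        self.history = ()
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = random.Random(0)

    def observe(self, obs):
        # Slide the window: keep only the last t observations.
        self.history = (self.history + (obs,))[-self.t:]

    def act(self):
        # Epsilon-greedy choice over the current history-state.
        if self.rng.random() < self.eps:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(self.history, a)])

    def learn(self, old_hist, action, reward, new_hist):
        # Standard one-step Q-learning update on history-states.
        best = max(self.q[(new_hist, a)] for a in self.actions)
        key = (old_hist, action)
        self.q[key] += self.alpha * (reward + self.gamma * best - self.q[key])
```

The table now grows with the number of distinct history tuples, which is exactly the combinatorial cost that motivates the function-approximation alternative mentioned above.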
Forgiver Triumphs in Alternating Prisoner's Dilemma
Cooperative behavior, where one individual incurs a cost to help another, is a widespread phenomenon. Here we study direct reciprocity in the context of the alternating Prisoner's Dilemma. We consider all strategies that can be implemented by one- and two-state automata. We calculate the payoff matrix of all pairwise encounters in the presence of noise. We explore deterministic selection dynamics with and without mutation. Using different error rates and payoff values, we observe convergence to a small number of distinct equilibria. Two of them are uncooperative strict Nash equilibria representing always-defect (ALLD) and Grim. The third equilibrium is mixed and represents a cooperative alliance of several strategies, dominated by a strategy which we call Forgiver. Forgiver cooperates whenever the opponent has cooperated; it defects once when the opponent has defected, but subsequently Forgiver attempts to re-establish cooperation even if the opponent has defected again. Forgiver is not an evolutionarily stable strategy, but the alliance, which it rules, is asymptotically stable. For a wide range of parameter values the most commonly observed outcome is convergence to the mixed equilibrium, dominated by Forgiver. Our results show that although forgiving might incur a short-term loss, it can lead to a long-term gain. Forgiveness facilitates stable cooperation in the presence of exploitation and noise.
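The Forgiver rule described in the abstract fits in a two-state automaton. The encoding below is our own hedged sketch of that description (the names and the table layout are not the paper's): cooperate in state 0, defect in state 1, enter state 1 after an opponent defection, and return to state 0 on the next round regardless of what the opponent plays.

```python
# Two-state automaton for the Forgiver strategy: output C in state 0,
# D in state 1; after defecting once, return to cooperation regardless
# of what the opponent just played.
FORGIVER = {
    'output': {0: 'C', 1: 'D'},
    'next':   {(0, 'C'): 0, (0, 'D'): 1,    # punish a defection once
               (1, 'C'): 0, (1, 'D'): 0},   # then forgive unconditionally
}

def run(strategy, opponent_moves, start=0):
    """Return the strategy's moves against a fixed opponent sequence."""
    state, moves = start, []
    for opp in opponent_moves:
        moves.append(strategy['output'][state])
        state = strategy['next'][(state, opp)]
    return ''.join(moves)

print(run(FORGIVER, 'CCDDC'))   # -> 'CCCDC': one defection, then forgiveness
```

Changing the `(1, 'D')` entry to stay in state 1 would turn this automaton into Grim, the uncooperative strict Nash equilibrium the abstract contrasts it with.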
Comparison of Techniques to Learn Agent Strategies in Adversarial Games
The focus of this project is to develop methodologies for using machine learning techniques in adversarial robot situations. In particular, we are using multiple robots to play a version of the wumpus world game. In this game, one robot represents the agent and a second robot represents the wumpus. Our goal is for the agent robot to make autonomous decisions that allow it to elude the wumpus, grab the gold, and win the game. To achieve this goal, we consult several supervised machine learning algorithms to decide the agent's move. Agent moves are learned from training examples encoding characteristics of the world, the game state, and the predicted wumpus move. In this paper we compare the performance of a decision tree learner, a naive Bayesian classifier, a backpropagation neural network, and a learning-based belief network on actual wumpus world games.
Compositional software verification based on game semantics
One of the major challenges in computer science is to put programming on a firmer mathematical basis, in order to improve the correctness of computer programs. Automatic program verification is acknowledged to be a very hard problem, but current work is reaching the point where at least the foundational aspects of the problem can be addressed and it is becoming a part of industrial software development. This thesis presents a semantic framework for verifying safety properties of open sequential programs. The presentation is focused on an Algol-like programming language that embodies many of the core ingredients of imperative and functional languages and incorporates data abstraction in its syntax. Game semantics is used to obtain a compositional, incremental way of generating accurate models of programs. Model-checking is made possible by giving certain kinds of concrete automata-theoretic representations of the model. A data-abstraction refinement procedure is developed for model-checking safety properties of programs with infinite integer types. The procedure starts by model-checking the most abstract version of the program. If no counterexample, or a genuine one, is found, the procedure terminates. Otherwise, it uses a spurious counterexample to refine the abstraction for the next iteration. Abstraction refinement, assume-guarantee reasoning and the L* algorithm for learning regular languages are combined to yield a procedure for compositional verification. Construction of a global model is avoided using assume-guarantee reasoning and the L* algorithm, by learning assumptions for arbitrary subprograms. An implementation based on the FDR model checker for the CSP process algebra demonstrates the practicality of the methods.
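The abstraction-refinement procedure the abstract describes follows a standard counterexample-guided loop. The skeleton below is a generic sketch of that loop, not the thesis's implementation; `model_check`, `is_genuine`, and `refine` are hypothetical stand-ins for the components a real verifier would supply.

```python
# Generic counterexample-guided abstraction-refinement (CEGAR) loop:
# check the abstract program, terminate on "no counterexample" or on a
# genuine counterexample, otherwise refine the abstraction and repeat.
def verify(program, initial_abstraction, model_check, is_genuine, refine):
    abstraction = initial_abstraction
    while True:
        counterexample = model_check(program, abstraction)
        if counterexample is None:
            return ('safe', abstraction)            # property holds abstractly
        if is_genuine(counterexample, program):
            return ('unsafe', counterexample)       # real violation found
        abstraction = refine(abstraction, counterexample)  # spurious: refine
```

In the data-abstraction setting of the abstract, `refine` would enlarge the abstract integer domains along the spurious trace until the counterexample either disappears or becomes concrete.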