115 research outputs found

Reinforcement Learning Produces Dominant Strategies for the Iterated Prisoner's Dilemma

Author: Campbell Owen
Glynatsi Nikoleta E.
Harper Marc
Jones Martin
Knight Vincent
Koutsovoulos Georgios
Publication venue: Public Library of Science (PLoS)
Publication date: 19/07/2017

We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.
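As an illustration of the tournament setting this abstract refers to, the following is a minimal sketch of a round-robin Iterated Prisoner's Dilemma in Python. It is not the authors' code or training procedure. The payoff values (3, 0, 5, 1), the 200-turn default, and the three hand-written strategies are assumptions chosen for illustration; the abstract does not specify them.

```python
from itertools import combinations

# Conventional payoff values, assumed here (the abstract does not state them).
# Key: (my move, opponent's move) -> my score. "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def always_defect(own_history, opp_history):
    """Defect on every turn."""
    return "D"


def always_cooperate(own_history, opp_history):
    """Cooperate on every turn."""
    return "C"


def play_match(strategy_a, strategy_b, turns=200):
    """Play one iterated match and return the two total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        # Both moves are chosen before either history is updated.
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, turns=200):
    """Pair every strategy with every other once; return total score per name."""
    totals = {name: 0 for name in strategies}
    for (name_a, strat_a), (name_b, strat_b) in combinations(strategies.items(), 2):
        score_a, score_b = play_match(strat_a, strat_b, turns)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals


if __name__ == "__main__":
    players = {
        "Tit For Tat": tit_for_tat,
        "Defector": always_defect,
        "Cooperator": always_cooperate,
    }
    ranking = sorted(round_robin(players).items(), key=lambda item: -item[1])
    for name, score in ranking:
        print(name, score)
```

With only three opponents the ranking says little; the abstract's claim concerns a corpus of over 170 strategies, where a single exploitable opponent matters far less to the totals.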

arXiv.org e-Print Archive

Online Research @ Cardiff

Directory of Open Access Journals

ProdInra

Foresighted policy gradient reinforcement learning: solving large-scale social dilemmas with rational altruistic punishment

Author: Bohte S.M. (Sander)
Hoen P.J. (Pieter Jan) 't
Poutré J.A. (Han) La
Publication venue: CWI
Publication date: 01/10/2008

Many important and difficult problems can be modeled as “social dilemmas”, like Hardin's Tragedy of the Commons or the classic iterated Prisoner's Dilemma. It is well known that in these problems, it can be rational for self-interested agents to promote and sustain cooperation by altruistically dispensing costly punishment to other agents, thus maximizing their own long-term reward. However, self-interested agents using most current multi-agent reinforcement learning algorithms will not sustain cooperation in social dilemmas: the algorithms do not sufficiently capture the consequences for the agent's reward of its interactions with other agents. More recent, foresighted algorithms specifically account for such expected consequences, and have been shown to work well for the small-scale Prisoner's Dilemma. However, this approach quickly becomes intractable for larger social dilemmas. Here, we advance on this work and develop a “teach/learn” stateless foresighted policy gradient reinforcement learning algorithm that applies to social dilemmas with negative, unilateral side-payments, in the form of costly punishment. In this setting, the algorithm allows agents to learn the most rewarding actions to take with respect to both the dilemma (Cooperate/Defect) and the “teaching” of other agents' behavior through the dispensing of punishment. Unlike other algorithms, we show that this approach scales well to large settings like the Tragedy of the Commons. We show for a variety of settings that large groups of self-interested agents using this algorithm will robustly find and sustain cooperation in social dilemmas where adaptive agents can punish the behavior of other similarly adaptive agents.
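The effect of a negative, unilateral side-payment on incentives can be seen with simple payoff arithmetic. The sketch below is not the paper's teach/learn policy gradient algorithm; it only shows how a fine changes one agent's best reply in a one-shot Prisoner's Dilemma. The payoff values and the `fine` and `cost` parameters are hypothetical and chosen for illustration.

```python
# Assumed one-shot Prisoner's Dilemma payoffs (hypothetical values).
TEMPTATION, REWARD, MUTUAL_DEFECTION, SUCKER = 5, 3, 1, 0

BASE_PAYOFFS = {
    ("C", "C"): (REWARD, REWARD),
    ("C", "D"): (SUCKER, TEMPTATION),
    ("D", "C"): (TEMPTATION, SUCKER),
    ("D", "D"): (MUTUAL_DEFECTION, MUTUAL_DEFECTION),
}


def payoffs_with_punishment(move_a, move_b, b_punishes, fine=3.0, cost=1.0):
    """Return (payoff_a, payoff_b) when agent B may fine agent A for defecting.

    The fine is a negative, unilateral side-payment: A loses `fine`,
    B pays `cost` to impose it, and nothing is transferred to B.
    """
    pay_a, pay_b = BASE_PAYOFFS[(move_a, move_b)]
    if b_punishes and move_a == "D":
        pay_a -= fine
        pay_b -= cost
    return pay_a, pay_b


def best_reply_of_a(move_b, b_punishes, fine=3.0, cost=1.0):
    """Return A's payoff-maximizing move against B's fixed move and policy."""
    return max(
        "CD",
        key=lambda move: payoffs_with_punishment(
            move, move_b, b_punishes, fine, cost
        )[0],
    )
```

Under these assumed values, defection is A's best reply without punishment, while any fine larger than TEMPTATION - REWARD = 2 makes cooperation the best reply against a cooperating punisher. The punisher pays `cost` each time, which is why the abstract calls the punishment altruistic in the short term.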

CWI's Institutional Repository

Why mutual helping in most natural systems is neither conflict-free nor based on maximal conflict

Author: Bshary Redouan
van Schaik Carel
Zuberbuhler Klaus
Publication venue: The Royal Society
Publication date: 01/01/2016

Funding: All authors are funded by individual grants from the Swiss Science Foundation. Mutual helping for direct benefits can be explained by various game theoretical models, which differ mainly in terms of the underlying conflict of interest between two partners. Conflict is minimal if helping is self-serving and the partner benefits as a by-product. In contrast, conflict is maximal if partners are in a prisoner’s dilemma with both having the payoff-dominant option of not returning the other’s investment. Here, we provide evolutionary and ecological arguments for why these two extremes are often unstable under natural conditions and propose that interactions with intermediate levels of conflict are frequent evolutionary endpoints. First, we argue that by-product helping is prone to becoming an asymmetric investment game, since even small variation in by-product benefits will lead to the evolution of partner choice, leading to investments and partner monitoring. Second, iterated prisoner’s dilemmas tend to take place in stable social groups where the fitness of partners is interdependent, to the effect that a certain level of helping is self-serving. In sum, intermediate levels of mutual helping are expected in nature, while efficient partner control mechanisms may allow reaching higher levels. Postprint. Peer reviewed.

PubMed Central

ZORA

University of St. Andrews - Pure

St Andrews Research Repository

Catgame: A Tool For Problem Solving In Complex Dynamic Systems Using Game Theoretic Knowledge Distribution In Cultural Algorithms, And Its Application (catneuro) To The Deep Learning Of Game Controller

Author: Waris Faisal
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2020

Cultural Algorithms (CA) are knowledge-intensive, population-based stochastic optimization methods that are modeled after human cultures and are suited to solving problems in complex environments. The CA Belief Space stores knowledge harvested from prior generations and redistributes it to future generations via a knowledge distribution (KD) mechanism. Each individual in the population is then guided through the search space by the associated knowledge. Previously, CA implementations have used only competitive KD mechanisms that have performed well for problems embedded in static environments. Relatively recently, CA research has evolved to encompass dynamic problem environments. Given increasing environmental complexity, a natural question is whether KD mechanisms that also incorporate cooperation can perform better in such environments than purely competitive ones. Borrowing from game theory, game-based KD mechanisms are implemented and tested against the default competitive mechanism, Weighted Majority (WTD). Two different concepts of complexity are addressed: numerical optimization under dynamic environments and hierarchical, multi-objective optimization for evolving deep learning models. The former is addressed with the CATGame software system and the latter with CATNeuro. CATGame implements three types of games that span both cooperation and competition for knowledge distribution, namely Iterated Prisoner's Dilemma (IPD), Stag-Hunt and Stackelberg. The performance of the three game mechanisms is compared with the aid of a dynamic problem generator called Cones World. Weighted Majority, also known as “wisdom of the crowd”, the default competitive KD mechanism in CA, is used as the benchmark. It is shown that games that support both cooperation and competition do indeed perform better, but not in all cases. The results shed light on what kinds of games are suited to problem solving in complex, dynamic environments.
Specifically, games that balance exploration and exploitation using the local signal of ‘social’ rank, namely Stag-Hunt and IPD, perform better. Stag-Hunt, which is also the most cooperative of the games tested, performed the best overall. Dynamic analysis of the ‘social’ aspects of the CA test runs shows that Stag-Hunt allocates compute resources more consistently than the others in response to changes in environmental complexity. Stackelberg, where the allocation decisions are centralized as in a centrally planned economic system, is found to be the least adaptive. CATNeuro is for solving neural architecture search (NAS) problems. Contemporary ‘deep learning’ neural network models have proven effective. However, the network topologies may be complex and not immediately obvious for the problem at hand. This has given rise to the secondary field of neural architecture search. It is still nascent, with many frameworks and approaches now becoming available. This paper describes a NAS method based on graph evolution, pioneered by NEAT (Neuroevolution of Augmenting Topologies), but driven by the evolutionary mechanisms of Cultural Algorithms. Here CATNeuro is applied to find optimal network topologies to play a 2D fighting game called FightingICE (derived from “The Rumble Fish” video game). A policy-based reinforcement learning method is used to create the training data for network optimization. CATNeuro is still evolving. To inform the development of CATNeuro, in this primary foray into NAS, we contrast the performance of CATNeuro with two different knowledge distribution mechanisms: the stalwart Weighted Majority and a new one based on the Stag-Hunt game from evolutionary game theory that performed the best in CATGame. The research shows that Stag-Hunt has a distinct edge over WTD in terms of game performance, model accuracy, and model size.
It is therefore deemed to be the preferred mechanism for complex, hierarchical optimization tasks such as NAS and is planned to be used as the default KD mechanism in CATNeuro going forward.

Digital Commons@Wayne State University

INVESTIGATIONS INTO THE COGNITIVE ABILITIES OF ALTERNATE LEARNING CLASSIFIER SYSTEM ARCHITECTURES

Author: Gaines David Alexander
Publication venue: UKnowledge
Publication date: 01/01/2006

The Learning Classifier System (LCS) and its descendant, XCS, are promising paradigms for machine learning design and implementation. Whereas LCS allows classifier payoff predictions to guide system performance, XCS focuses on payoff-prediction accuracy instead, allowing it to evolve optimal classifier sets in particular applications requiring rational thought. This research examines LCS and XCS performance in artificial situations with broad social/commercial parallels, created using the non-Markov Iterated Prisoner's Dilemma (IPD) game-playing scenario, where the setting is sometimes asymmetric and where irrationality sometimes pays. This research systematically perturbs a conventional IPD-playing LCS-based agent until it results in a full-fledged XCS-based agent, contrasting the simulated behavior of each LCS variant in terms of a number of performance measures. The intent is to examine the XCS paradigm to understand how it better copes with a given situation (if it does) than the LCS perturbations studied. Experiment results indicate that the majority of the architectural differences do have a significant effect on the agents' performance with respect to the performance measures used in this research. The results of these competitions indicate that while each architectural difference significantly affected its agent's performance, no single architectural difference could be credited as causing XCS's demonstrated superiority in evolving optimal populations. Instead, the data suggest that XCS's ability to evolve optimal populations in the multiplexer and IPD problem domains results from the combined and synergistic effects of multiple architectural differences. In addition, it is demonstrated that XCS is able to reliably evolve the Optimal Population [O] against the TFT opponent.
This result supports Kovacs' Optimality Hypothesis in the IPD environment and is significant because it is the first demonstrated occurrence of this ability in an environment other than the multiplexer and Woods problem domains. It is therefore apparent that while XCS performs better than its LCS-based counterparts, its demonstrated superiority may not be attributed to a single architectural characteristic. Instead, XCS's ability to evolve optimal classifier populations in the multiplexer problem domain and in the IPD problem domain studied in this research results from the combined and synergistic effects of multiple architectural differences.

University of Kentucky

Aprendizado de máquina e o dilema dos prisioneiros (Machine learning and the prisoners' dilemma)

Author: Santana Jonas Cardoso Carvalho
Publication date: 09/12/2019

Undergraduate thesis (Trabalho de Conclusão de Curso), Universidade de Brasília, Departamento de Economia, 2019. Given the great growth of the machine learning area, we seek to verify how well its methods perform when used to build strategies for the iterated Prisoners' Dilemma. We also seek to verify whether this performance correlates with performance in the Moran Process, which is used for intertemporal analysis of populations.

Biblioteca Digital de Monografias

Unemployment Insurance and the Evolution of Worker-Employer Cooperation: Experiments with Real and Artificial Agents

Author: Mark Pingle and Leigh Tesfatsion

This paper reports the results of human subject and computational experiments designed to examine how the level of the "inactivity payments" to workers and to employers affects the evolution of cooperation among workers and employers. The related impacts on unemployment and job vacancy rates are our primary focus. However, we also examine the impacts on labor force participation, productive efficiency, the willingness to form long-term relationships, and other outcome measures. Keywords: Agent-based computational economics; Labor market; Unemployment benefits; Evolution of cooperation; Adaptive search

Research Papers in Economics