115 research outputs found

    Reinforcement Learning Produces Dominant Strategies for the Iterated Prisoner's Dilemma

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.
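
To make the tournament setting in this abstract concrete, here is a minimal Python sketch of a round-robin Iterated Prisoner's Dilemma. It is not the authors' code and involves no learning: the payoff values (5, 3, 1, 0), the three toy strategies, and the names `play_match` and `round_robin` are all assumptions chosen for illustration.

```python
from itertools import combinations

# Conventional payoffs (T=5, R=3, P=1, S=0). Assumed values, not taken from the abstract.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def always_defect(own_history, opp_history):
    return "D"


def always_cooperate(own_history, opp_history):
    return "C"


def play_match(strategy_a, strategy_b, turns=10):
    """Play a fixed number of turns and return both cumulative scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, turns=10):
    """Pair every strategy with every other one once and sum the scores."""
    totals = {name: 0 for name in strategies}
    for (name_a, strat_a), (name_b, strat_b) in combinations(strategies.items(), 2):
        score_a, score_b = play_match(strat_a, strat_b, turns)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals


if __name__ == "__main__":
    pool = {
        "tit_for_tat": tit_for_tat,
        "always_defect": always_defect,
        "always_cooperate": always_cooperate,
    }
    print(round_robin(pool))
```

In a pool this small the unconditional defector collects the highest total, which shows how strongly a ranking depends on the set of opponents. The abstract's results concern a corpus of more than 170 strategies.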

    Foresighted policy gradient reinforcement learning: solving large-scale social dilemmas with rational altruistic punishment

    Many important and difficult problems can be modeled as “social dilemmas”, like Hardin's Tragedy of the Commons or the classic iterated Prisoner's Dilemma. It is well known that in these problems, it can be rational for self-interested agents to promote and sustain cooperation by altruistically dispensing costly punishment to other agents, thus maximizing their own long-term reward. However, self-interested agents using most current multi-agent reinforcement learning algorithms will not sustain cooperation in social dilemmas: the algorithms do not sufficiently capture the consequences for the agent's reward of its interactions with other agents. More recent, foresighted algorithms specifically account for such expected consequences, and have been shown to work well for the small-scale Prisoner's Dilemma. However, this approach quickly becomes intractable for larger social dilemmas. Here, we build on this work and develop a “teach/learn” stateless foresighted policy gradient reinforcement learning algorithm that applies to social dilemmas with negative, unilateral side-payments, in the form of costly punishment. In this setting, the algorithm allows agents to learn the most rewarding actions to take with respect to both the dilemma (Cooperate/Defect) and the “teaching” of other agents' behavior through the dispensing of punishment. We show that, unlike other algorithms, this approach scales well to large settings like the Tragedy of the Commons. We show for a variety of settings that large groups of self-interested agents using this algorithm will robustly find and sustain cooperation in social dilemmas where adaptive agents can punish the behavior of other similarly adaptive agents.

    Why mutual helping in most natural systems is neither conflict-free nor based on maximal conflict

    Mutual helping for direct benefits can be explained by various game theoretical models, which differ mainly in terms of the underlying conflict of interest between two partners. Conflict is minimal if helping is self-serving and the partner benefits as a by-product. In contrast, conflict is maximal if partners are in a prisoner’s dilemma with both having the payoff-dominant option of not returning the other’s investment. Here, we provide evolutionary and ecological arguments for why these two extremes are often unstable under natural conditions and propose that interactions with intermediate levels of conflict are frequent evolutionary endpoints. First, we argue that by-product helping is prone to becoming an asymmetric investment game, since even small variation in by-product benefits will lead to the evolution of partner choice, and in turn to investments and partner monitoring. Second, iterated prisoner’s dilemmas tend to take place in stable social groups where the fitness of partners is interdependent, to the effect that a certain level of helping is self-serving. In sum, intermediate levels of mutual helping are expected in nature, while efficient partner control mechanisms may allow reaching higher levels. Funding: All authors are funded by individual grants from the Swiss Science Foundation. Postprint. Peer reviewed.

    Catgame: A Tool For Problem Solving In Complex Dynamic Systems Using Game Theoretic Knowledge Distribution In Cultural Algorithms, And Its Application (catneuro) To The Deep Learning Of Game Controller

    Cultural Algorithms (CA) are knowledge-intensive, population-based stochastic optimization methods that are modeled after human cultures and are suited to solving problems in complex environments. The CA Belief Space stores knowledge harvested from prior generations and redistributes it to future generations via a knowledge distribution (KD) mechanism. Each individual in the population is then guided through the search space by the associated knowledge. Previous CA implementations have used only competitive KD mechanisms, which have performed well for problems embedded in static environments. More recently, CA research has expanded to encompass dynamic problem environments. Given increasing environmental complexity, a natural question is whether KD mechanisms that also incorporate cooperation can perform better in such environments than purely competitive ones. Borrowing from game theory, game-based KD mechanisms are implemented and tested against the default competitive mechanism, Weighted Majority (WTD). Two different concepts of complexity are addressed: numerical optimization under dynamic environments, and hierarchical, multi-objective optimization for evolving deep learning models. The former is addressed with the CATGame software system and the latter with CATNeuro. CATGame implements three types of games that span both cooperation and competition for knowledge distribution, namely Iterated Prisoner's Dilemma (IPD), Stag-Hunt and Stackelberg. The performance of the three game mechanisms is compared with the aid of a dynamic problem generator called Cones World. Weighted Majority, also known as “wisdom of the crowd”, the default competitive KD mechanism in CA, is used as the benchmark. It is shown that games that support both cooperation and competition do indeed perform better, but not in all cases. The results shed light on what kinds of games are suited to problem solving in complex, dynamic environments.
    Specifically, games that balance exploration and exploitation using the local signal of ‘social’ rank, namely Stag-Hunt and IPD, perform better. Stag-Hunt, which is also the most cooperative of the games tested, performed the best overall. Dynamic analysis of the ‘social’ aspects of the CA test runs shows that Stag-Hunt allocates compute resources more consistently than the others in response to changes in environmental complexity. Stackelberg, where the allocation decisions are centralized as in a centrally planned economic system, is found to be the least adaptive. CATNeuro is for solving neural architecture search (NAS) problems. Contemporary ‘deep learning’ neural network models are proven effective. However, the network topologies may be complex and not immediately obvious for the problem at hand. This has given rise to the secondary field of neural architecture search. It is still nascent, with many frameworks and approaches now becoming available. This paper describes a NAS method based on graph evolution pioneered by NEAT (Neuroevolution of Augmenting Topologies) but driven by the evolutionary mechanisms of Cultural Algorithms. Here CATNeuro is applied to find optimal network topologies to play a 2D fighting game called FightingICE (derived from “The Rumble Fish” video game). A policy-based reinforcement learning method is used to create the training data for network optimization. CATNeuro is still evolving. To inform the development of CATNeuro, in this initial foray into NAS, we contrast the performance of CATNeuro with two different knowledge distribution mechanisms: the stalwart Weighted Majority and a new one based on the Stag-Hunt game from evolutionary game theory, which performed the best in CATGame. The research shows that Stag-Hunt has a distinct edge over WTD in terms of game performance, model accuracy, and model size.
    It is therefore deemed to be the preferred mechanism for complex, hierarchical optimization tasks such as NAS and is planned to be used as the default KD mechanism in CATNeuro going forward.

    INVESTIGATIONS INTO THE COGNITIVE ABILITIES OF ALTERNATE LEARNING CLASSIFIER SYSTEM ARCHITECTURES

    The Learning Classifier System (LCS) and its descendant, XCS, are promising paradigms for machine learning design and implementation. Whereas LCS allows classifier payoff predictions to guide system performance, XCS focuses on payoff-prediction accuracy instead, allowing it to evolve optimal classifier sets in particular applications requiring rational thought. This research examines LCS and XCS performance in artificial situations with broad social/commercial parallels, created using the non-Markov Iterated Prisoner's Dilemma (IPD) game-playing scenario, where the setting is sometimes asymmetric and where irrationality sometimes pays. This research systematically perturbs a conventional IPD-playing LCS-based agent until it results in a full-fledged XCS-based agent, contrasting the simulated behavior of each LCS variant in terms of a number of performance measures. The intent is to examine the XCS paradigm to understand how it copes with a given situation better (if it does) than the LCS perturbations studied. Experimental results indicate that the majority of the architectural differences do have a significant effect on the agents' performance with respect to the performance measures used in this research. The results of these competitions indicate that while each architectural difference significantly affected its agent's performance, no single architectural difference could be credited as causing XCS's demonstrated superiority in evolving optimal populations. Instead, the data suggest that XCS's ability to evolve optimal populations in the multiplexer and IPD problem domains results from the combined and synergistic effects of multiple architectural differences. In addition, it is demonstrated that XCS is able to reliably evolve the Optimal Population [O] against the TFT opponent.
    This result supports Kovacs' Optimality Hypothesis in the IPD environment and is significant because it is the first demonstrated occurrence of this ability in an environment other than the multiplexer and Woods problem domains. It is therefore apparent that while XCS performs better than its LCS-based counterparts, its demonstrated superiority may not be attributed to a single architectural characteristic. Instead, XCS's ability to evolve optimal classifier populations in the multiplexer problem domain and in the IPD problem domain studied in this research results from the combined and synergistic effects of multiple architectural differences.

    Aprendizado de máquina e o dilema dos prisioneiros

    Undergraduate thesis, Universidade de Brasília, Department of Economics, 2019. Given the rapid growth of machine learning, we assess how well its methods perform when used to build strategies for the iterated Prisoner's Dilemma. We also examine whether that performance correlates with performance in the Moran process, which is used for the intertemporal analysis of populations.
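
For readers unfamiliar with the Moran process mentioned in this abstract, the following is a minimal Python sketch of one common formulation: at each step an individual is chosen to reproduce with probability proportional to fitness, and its offspring replaces a uniformly chosen individual. The fixed fitness table and the names `moran_process` and `max_steps` are assumptions for illustration. In the thesis's setting, fitness would presumably come from game payoffs instead, and this is not the thesis's code.

```python
import random


def moran_process(population, fitness, rng, max_steps=100_000):
    """Run a Moran process until one type takes over or max_steps is hit.

    population: list of type labels, e.g. ["A", "A", "B"]
    fitness:    dict mapping each label to a non-negative fitness (assumed fixed)
    rng:        a random.Random instance, passed in so runs are reproducible
    """
    pop = list(population)
    for _ in range(max_steps):
        if len(set(pop)) == 1:
            return pop  # fixation: a single type remains
        # Birth: pick a parent with probability proportional to fitness.
        weights = [fitness[label] for label in pop]
        parent = rng.choices(pop, weights=weights, k=1)[0]
        # Death: replace a uniformly chosen individual with the offspring.
        pop[rng.randrange(len(pop))] = parent
    return pop


if __name__ == "__main__":
    start = ["A"] * 3 + ["B"] * 3
    final = moran_process(start, {"A": 1.0, "B": 1.0}, random.Random(0))
    print(final)
```

The population size stays constant throughout, so the process can only end with one type occupying every slot. Repeating the run over many seeds gives an estimate of each type's fixation probability.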

    Unemployment Insurance and the Evolution of Worker-Employer Cooperation: Experiments with Real and Artificial Agents

    This paper reports the results of human subject and computational experiments designed to examine how the level of "inactivity payments" to workers and to employers affects the evolution of cooperation among workers and employers. The related impacts on unemployment and job vacancy rates are our primary focus. However, we also examine the impacts on labor force participation, productive efficiency, the willingness to form long-term relationships, and other outcome measures. Keywords: Agent-based computational economics; Labor market; Unemployment benefits; Evolution of cooperation; Adaptive search

    Signaling Discount Rates: Law, Norms, and Economic Methodology
