208 research outputs found
Learning to resolve social dilemmas: a survey
Social dilemmas are situations of interdependent decision making in which individual rationality can lead to outcomes with poor social qualities. The ubiquity of social dilemmas in social, biological, and computational systems has generated substantial research across these diverse disciplines into mechanisms for avoiding deficient outcomes by promoting and maintaining mutual cooperation. Much of this research is focused on studying how individuals faced with a dilemma can learn to cooperate by adapting their behaviours according to their past experience. In particular, three types of learning approaches have been studied: evolutionary game-theoretic learning, reinforcement learning, and best-response learning. This article is a comprehensive integrated survey of these learning approaches in the context of dilemma games. We formally introduce dilemma games and their inherent challenges. We then outline the three learning approaches and, for each approach, provide a survey of the solutions proposed for dilemma resolution. Finally, we provide a comparative summary and discuss directions in which further research is needed.
A multi-agent reinforcement learning model of common-pool resource appropriation
Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find socially positive equilibria, a phenomenon called the tragedy of the commons. However, in reality, human societies are sometimes able to discover and implement stable cooperative solutions. Decades of behavioral game theory research have sought to uncover aspects of human behavior that make this possible. Most of that work was based on laboratory experiments where participants only make a single choice: how much to appropriate. Recognizing the importance of spatial and temporal resource dynamics, a recent trend has been toward experiments in more complex real-time video game-like environments. However, standard methods of non-cooperative game theory can no longer be used to generate predictions for this case. Here we show that deep reinforcement learning can be used instead. To that end, we study the emergent behavior of groups of independently learning agents in a partially observed Markov game modeling common-pool resource appropriation. Our experiments highlight the importance of trial-and-error learning in common-pool resource appropriation and shed light on the relationship between exclusion, sustainability, and inequality.
Learning and innovative elements of strategy adoption rules expand cooperative network topologies
Cooperation plays a key role in the evolution of complex systems. However,
the level of cooperation extensively varies with the topology of agent networks
in the widely used models of repeated games. Here we show that cooperation
remains rather stable by applying the reinforcement learning strategy adoption
rule, Q-learning, on a variety of random, regular, small-world, scale-free and
modular network models in repeated, multi-agent Prisoner's Dilemma and Hawk-Dove
games. Furthermore, we found that, in the above model systems, other long-term
learning strategy adoption rules also promote cooperation, while introducing a
low level of noise (as a model of innovation) to the strategy adoption rules
makes the level of cooperation less dependent on the actual network topology.
Our results demonstrate that long-term learning and random elements in the
strategy adoption rules, when acting together, extend the range of network
topologies enabling the development of cooperation at a wider range of costs
and temptations. These results suggest that a balanced duo of learning and
innovation may help to preserve cooperation during the re-organization of
real-world networks, and may play a prominent role in the evolution of
self-organizing, complex systems.
Comment: 14 pages, 3 Figures + a Supplementary Material with 25 pages, 3
Tables, 12 Figures and 116 references
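The Q-learning strategy adoption rule mentioned in this abstract can be illustrated with a much smaller setting than the paper's networks. The following is a minimal sketch, assuming two agents only, tabular Q-learning, a state equal to the opponent's previous move, and conventional Prisoner's Dilemma payoffs (3, 0, 5, 1). The payoffs, the parameter values, and all function names are illustrative assumptions and are not taken from the paper. The `epsilon` exploration rate loosely plays the role of the low-level noise the abstract describes.

```python
import random

ACTIONS = ("C", "D")  # cooperate, defect

# Assumed Prisoner's Dilemma payoffs (player 1, player 2); not from the paper.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def new_q_table():
    """Q[state][action], where the state is the opponent's previous action."""
    return {s: {a: 0.0 for a in ACTIONS} for s in ACTIONS}


def choose(q, state, epsilon, rng):
    """Epsilon-greedy action selection."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def play(rounds=1000, epsilon=0.05, seed=0):
    """Run the repeated game and return the fraction of cooperative moves."""
    rng = random.Random(seed)
    q1, q2 = new_q_table(), new_q_table()
    s1 = s2 = "C"  # assumed initial state for both agents
    cooperative_moves = 0
    for _ in range(rounds):
        a1 = choose(q1, s1, epsilon, rng)
        a2 = choose(q2, s2, epsilon, rng)
        r1, r2 = PAYOFF[(a1, a2)]
        q_update(q1, s1, a1, r1, a2)  # agent 1's next state is a2
        q_update(q2, s2, a2, r2, a1)  # agent 2's next state is a1
        s1, s2 = a2, a1
        cooperative_moves += (a1 == "C") + (a2 == "C")
    return cooperative_moves / (2 * rounds)
```

Calling `play()` returns the share of cooperative moves over the run. Extending the sketch to a network would mean letting each node play this game against its neighbours, which is outside what this example attempts.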
Multi-agent Reinforcement Learning in Sequential Social Dilemmas
Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
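The "mixed incentive structure" of a matrix game social dilemma can be made concrete with a small check. The sketch below assumes the commonly used characterization in terms of four payoffs: R (reward for mutual cooperation), P (punishment for mutual defection), S (sucker's payoff), and T (temptation to defect). The abstract itself does not state these inequalities, so they are an assumption here, and the function name is hypothetical.

```python
def is_social_dilemma(R, P, S, T):
    """Check the assumed social-dilemma inequalities for a 2x2 symmetric game.

    R: both cooperate, P: both defect,
    S: cooperate against a defector, T: defect against a cooperator.
    """
    # Mutual cooperation beats mutual defection, beats being exploited,
    # and beats taking turns exploiting each other.
    cooperation_preferred = R > P and R > S and 2 * R > T + S
    greed = T > R  # tempted to exploit a cooperator
    fear = P > S   # afraid of being exploited by a defector
    return cooperation_preferred and (greed or fear)
```

For example, `is_social_dilemma(R=3, P=1, S=0, T=5)` holds for the usual Prisoner's Dilemma payoffs, where both greed and fear are present.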
A Review of Platforms for the Development of Agent Systems
Agent-based computing is an active field of research with the goal of
building autonomous software or hardware entities. This task is often
facilitated by the use of dedicated, specialized frameworks. For almost thirty
years, many such agent platforms have been developed. Meanwhile, some of them
have been abandoned, others continue their development and new platforms are
released. This paper presents an up-to-date review of the existing agent
platforms and also a historical perspective of this domain. It aims to serve as
a reference point for people interested in developing agent systems. This work
details the main characteristics of the included agent platforms, together with
links to specific projects where they have been used. It distinguishes between
the active platforms and those no longer under development or with unclear
status. It also classifies the agent platforms as general purpose ones, free or
commercial, and specialized ones, which can be used for particular types of
applications.
Comment: 40 pages, 2 figures, 9 tables, 83 references
Artificial virtuous agents in a multi‐agent tragedy of the commons
Although virtue ethics has repeatedly been proposed as a suitable framework for the development of artificial moral agents (AMAs), it has proven difficult to approach from a computational perspective. In this work, we present the first technical implementation of artificial virtuous agents (AVAs) in moral simulations. First, we review previous conceptual and technical work in artificial virtue ethics and describe a functionalistic path to AVAs based on dispositional virtues, bottom-up learning, and top-down eudaimonic reward. We then provide the details of a technical implementation in a moral simulation based on a tragedy of the commons scenario. The experimental results show how the AVAs learn to tackle cooperation problems while exhibiting core features of their theoretical counterpart, including moral character, dispositional virtues, learning from experience, and the pursuit of eudaimonia. Ultimately, we argue that virtue ethics provides a compelling path toward morally excellent machines and that our work provides an important starting point for such endeavors.
SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning
Learning a stable and generalizable centralized value function (CVF) is a
crucial but challenging task in multi-agent reinforcement learning (MARL), as
it has to deal with the issue that the joint action space increases
exponentially with the number of agents in such scenarios. This paper proposes
an approach, named SMIX(λ), to address the issue using an efficient
off-policy centralized training method within a flexible learner search space.
As importance sampling for such off-policy training is both computationally
costly and numerically unstable, we propose to use the λ-return as a
proxy to compute the TD error. With this new loss function objective, we adopt
a modified QMIX network structure as the base to train our model. By further
connecting it with the Q(λ) approach from a unified expectation
correction viewpoint, we show that the proposed SMIX(λ) is equivalent
to Q(λ) and hence shares its convergence properties, without
suffering from the aforementioned curse of dimensionality problem inherent
in MARL. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark
demonstrate that our approach not only outperforms several state-of-the-art
MARL methods by a large margin, but also can be used as a general tool to
improve the overall performance of other CTDE-type algorithms by enhancing
their CVFs.
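The λ-return used as a TD target in this abstract can be sketched with the standard backward recursion G_t = r_t + γ[(1 − λ) V(s_{t+1}) + λ G_{t+1}]. This is a generic illustration of the λ-return for a single finished trajectory, not the paper's implementation: the function name, the argument layout, and the default values of `gamma` and `lam` are assumptions, and `next_values[t]` stands for whatever bootstrap estimate is available for the state after step t (0 for a terminal state).

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Compute lambda-returns for one trajectory by backward recursion.

    rewards[t]     : reward received at step t
    next_values[t] : bootstrap value estimate for the state after step t
                     (use 0.0 if that state is terminal)
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    returns = [0.0] * len(rewards)
    # Beyond the last step there is nothing but the bootstrap estimate.
    next_return = next_values[-1] if next_values else 0.0
    for t in reversed(range(len(rewards))):
        blended = (1.0 - lam) * next_values[t] + lam * next_return
        returns[t] = rewards[t] + gamma * blended
        next_return = returns[t]
    return returns
```

With `lam=0` the result reduces to the one-step TD target, and with `lam=1` it reduces to the discounted Monte Carlo return bootstrapped at the end of the trajectory.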