Segregation Dynamics with Reinforcement Learning and Agent Based Modeling
Societies are complex: properties of social systems emerge from the interplay of individual actions, and incentives are key to understanding people's choices and decisions. For instance, individual preferences about where to live may lead to the emergence of social segregation. In this paper, we combine Reinforcement Learning (RL) with Agent-Based Models (ABM) to address the self-organizing dynamics of social segregation and to explore the space of possibilities that emerges from considering different types of incentives. Our model promotes the creation of interdependencies and interactions among multiple agents of two different kinds that want to segregate from each other. To this end, agents use Deep Q-Networks to make decisions based on the rules of the Schelling segregation model and the predator-prey model. Despite the segregation incentive, our experiments show that spatial integration can be achieved by establishing interdependencies among agents of different kinds. They also reveal that segregated areas are more likely to host older agents, whereas diverse areas attract younger ones. Through this work, we show that combining RL and ABMs can provide an artificial environment in which policy makers can observe potential and existing behaviors associated with incentives.
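The Schelling neighborhood rule that underlies the agents' rewards can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid encoding, the satisfaction threshold, and the reward values (+1/-1) are assumptions for the sketch.

```python
import numpy as np

def schelling_reward(grid, row, col, threshold=0.5):
    """Classic Schelling rule: an agent is satisfied (+1) if the fraction
    of same-type agents in its Moore neighborhood meets the threshold,
    and unsatisfied (-1) otherwise. 0 encodes an empty cell."""
    agent_type = grid[row, col]
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < grid.shape[0] and 0 <= c < grid.shape[1] and grid[r, c] != 0:
                neighbors.append(grid[r, c])
    if not neighbors:
        return 0.0  # isolated agents are treated as neutral here
    same = sum(n == agent_type for n in neighbors) / len(neighbors)
    return 1.0 if same >= threshold else -1.0

# Two agent types (1 and 2) on a small grid; 0 marks empty cells.
grid = np.array([[1, 2, 0],
                 [2, 2, 1],
                 [0, 1, 1]])
```

In the paper, a reward of this kind would be one input to each agent's Deep Q-Network rather than a direct movement rule.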
Intrinsic fluctuations of reinforcement learning promote cooperation
In this work, we ask and answer what makes classical reinforcement learning cooperative. Cooperating in social-dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms that promote cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's Dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions its next action choice on both agents' action choices in the previous round. We find that, besides a high valuation of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and for regulating undesired collusive effects.
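The learning setup described above (two epsilon-greedy temporal-difference Q-learners in an iterated Prisoner's Dilemma, each conditioning on the last round's joint action) can be sketched as follows. The payoff values, hyperparameters, and initial memory state are illustrative assumptions, not the paper's exact configuration.

```python
import random
from itertools import product

# Standard Prisoner's Dilemma payoffs (row, column): T > R > P > S.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ACTIONS = ("C", "D")
# One-period memory: the state is the joint action of the last round.
STATES = list(product(ACTIONS, ACTIONS))

def run(rounds=20000, alpha=0.05, gamma=0.95, eps=0.05, seed=1):
    rng = random.Random(seed)
    Q = [{(s, a): 0.0 for s in STATES for a in ACTIONS} for _ in range(2)]

    def choose(i, s):
        if rng.random() < eps:          # epsilon-greedy exploration
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[i][(s, a)])

    state = ("C", "C")                  # arbitrary initial memory
    coop = 0
    for _ in range(rounds):
        a0, a1 = choose(0, state), choose(1, state)
        r0, r1 = PAYOFF[(a0, a1)]
        nxt = (a0, a1)
        for i, (a, r) in enumerate(((a0, r0), (a1, r1))):
            best_next = max(Q[i][(nxt, b)] for b in ACTIONS)
            Q[i][(state, a)] += alpha * (r + gamma * best_next - Q[i][(state, a)])
        coop += (a0 == "C") + (a1 == "C")
        state = nxt
    return coop / (2 * rounds)          # fraction of cooperative choices
```

In this framing, the paper's point is that the sampling noise of epsilon-greedy exploration and of the stochastic update order itself shifts which strategies the coupled learners settle on.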
Simulation in Contexts Involving an Interactive Table and Tangible Objects
By using an interactive table, several people (decision-makers) can interact simultaneously and collaboratively around the table during a simulation session. Thanks to the RFID technology the table is fitted with, tangible objects can be given a unique identity so that they can be included and taken into account in the simulation. The paper describes a context model that takes into consideration the specificities of interactive tables. The TangiSense interactive table is presented; it is connected to a multi-agent system that gives the table a certain level of adaptation: each tangible object can be associated with an agent that brings roles to the object (i.e., a role is the equivalent of a set of behaviors). The multi-agent system proposed in this paper is modeled according to an architecture adapted to the exploitation of tangible and virtual objects during simulation on an interactive table. A case study is presented; it concerns a simulation of road-traffic management. The illustrations give an outline of the potential of the simulation system with regard to context awareness, following both the actions of the decision-makers involved in the simulation and the agents composing the road-traffic simulation.
Differentiable Game Mechanics
Deep learning is built on the foundational guarantee that gradient descent on an objective function converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, that exhibit multiple interacting losses. The behavior of gradient-based methods in games is not well understood and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new tools to understand and control the dynamics in n-player differentiable games. The key result is to decompose the game Jacobian into two components. The first, symmetric component is related to potential games, which reduce to gradient descent on an implicit function. The second, antisymmetric component relates to Hamiltonian games, a new class of games that obey a conservation law akin to the conservation laws of classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs, while at the same time being applicable to, and having guarantees in, much more general cases.
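The decomposition behind SGA can be illustrated on the simplest purely Hamiltonian game: a bilinear zero-sum game with losses l1(x, y) = x*y for the player controlling x and l2(x, y) = -x*y for the player controlling y. This is a sketch of the mechanism, not the paper's general implementation; the step size and adjustment weight lambda are assumptions.

```python
import numpy as np

def sga_step(xy, lr=0.1, lam=1.0):
    """One SGA step on the bilinear game l1 = x*y, l2 = -x*y.
    The simultaneous gradient is xi = (y, -x); the game Jacobian
    J = [[0, 1], [-1, 0]] is purely antisymmetric, so plain simultaneous
    gradient descent cycles around the fixed point at the origin.
    SGA adds lam * A^T xi, where A is the antisymmetric part of J."""
    x, y = xy
    xi = np.array([y, -x])                        # simultaneous gradient
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])       # game Jacobian
    A = 0.5 * (J - J.T)                           # antisymmetric component
    adjusted = xi + lam * (A.T @ xi)              # symplectic adjustment
    return xy - lr * adjusted

pt = np.array([1.0, 1.0])
for _ in range(200):
    pt = sga_step(pt)
# pt spirals into the stable fixed point (0, 0); with lam = 0 the same
# iteration slowly spirals outward instead.
```

The adjustment term A^T xi points toward the fixed point here, converting the conservative rotation of the Hamiltonian dynamics into convergence.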
Multi-agent Reinforcement Learning in Sequential Social Dilemmas
Matrix games like the Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended: cooperativeness is a property of policies, not of elementary actions. We introduce sequential social dilemmas, which share the mixed incentive structure of matrix-game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors, including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
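The independent-learning setup described above can be sketched on a toy commons environment. To keep the sketch self-contained it uses tabular Q-learners and an invented one-variable resource state; it is emphatically not the paper's Gathering or Wolfpack environment, and the capacity, regrowth probability, and hyperparameters are all assumptions.

```python
import random

N_APPLES = 3        # patch capacity (a stand-in for resource abundance)
REGROW_P = 0.3      # per-step regrowth chance for each missing apple

def step(apples, a0, a1, rng):
    """Toy commons dynamics: each agent chooses abstain (0) or harvest (1);
    a harvester takes one apple if any remain, then the patch regrows."""
    rewards = [0.0, 0.0]
    for i, a in enumerate((a0, a1)):
        if a == 1 and apples > 0:
            apples -= 1
            rewards[i] = 1.0
    for _ in range(N_APPLES - apples):
        if rng.random() < REGROW_P:
            apples += 1
    return apples, rewards[0], rewards[1]

def train(steps=20000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Two self-interested, independent tabular Q-learners; each sees only
    the resource level and its own reward, as in independent learning."""
    rng = random.Random(seed)
    Q = [[[0.0, 0.0] for _ in range(N_APPLES + 1)] for _ in range(2)]
    apples, totals = N_APPLES, [0.0, 0.0]
    for _ in range(steps):
        acts = []
        for i in range(2):
            if rng.random() < eps:
                acts.append(rng.randrange(2))
            else:
                acts.append(0 if Q[i][apples][0] >= Q[i][apples][1] else 1)
        nxt, r0, r1 = step(apples, acts[0], acts[1], rng)
        for i, r in enumerate((r0, r1)):
            best = max(Q[i][nxt])
            Q[i][apples][acts[i]] += alpha * (r + gamma * best - Q[i][apples][acts[i]])
        totals[0] += r0
        totals[1] += r1
        apples = nxt
    return totals
```

Even in this stripped-down form, lowering REGROW_P (scarcer resources) is the kind of environmental factor whose effect on learned harvesting behavior the paper studies at scale with deep Q-networks.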
Addressing Environment Non-Stationarity by Repeating Q-learning Updates
Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that action. In other words, there is a tight coupling between the learning dynamics and the underlying execution policy. This coupling can cause performance degradation in noisy non-stationary environments. Here, we introduce Repeated Update Q-learning (RUQL), a learning algorithm that resolves this undesirable artifact of Q-learning while maintaining its simplicity. We analyze the similarities and differences between RUQL, QL, and the closest state-of-the-art algorithms theoretically. Our analysis shows that RUQL maintains the convergence guarantee of QL in stationary environments while relaxing the coupling between the execution policy and the learning dynamics. Experimental results confirm the theoretical insights and show how RUQL outperforms both QL and the closest state-of-the-art algorithms in noisy non-stationary environments.
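The core idea of RUQL is to apply the standard Q-learning update 1/pi times, where pi is the probability that the executed action is selected under the current policy, so that rarely chosen actions are not also slowly updated. A sketch of the resulting closed-form update follows; the function name and default hyperparameters are illustrative.

```python
def ruql_update(q, pi, reward, max_next_q, alpha=0.1, gamma=0.95):
    """Repeated Update Q-learning in closed form: repeating the update
    Q <- Q + alpha * (target - Q) exactly 1/pi times with a fixed target
    gives Q <- w*Q + (1 - w)*target with w = (1 - alpha)**(1/pi), i.e. an
    effective learning rate of 1 - (1 - alpha)**(1/pi). For pi = 1 this
    reduces to the ordinary Q-learning update."""
    target = reward + gamma * max_next_q
    w = (1.0 - alpha) ** (1.0 / pi)
    return w * q + (1.0 - w) * target
```

For example, with alpha = 0.1 an action selected with probability pi = 0.1 gets an effective learning rate of 1 - 0.9**10, roughly 0.65 instead of 0.1, which is how RUQL decouples the update rate from the action-selection probability.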