Segregation Dynamics with Reinforcement Learning and Agent Based Modeling
Societies are complex: properties of social systems emerge from the interplay of individual actions, and incentives are key to understanding people's choices and decisions. For instance, individual preferences about where to live may lead to the emergence of social segregation. In this paper, we combine Reinforcement Learning (RL) with Agent-Based Models (ABM) to address the self-organizing dynamics of social segregation and to explore the space of possibilities that emerges from considering different types of incentives. Our model promotes the creation of interdependencies and interactions among multiple agents of two different kinds that want to segregate from each other. To this end, agents use Deep Q-Networks to make decisions based on the rules of the Schelling segregation model and the predator-prey model. Despite the segregation incentive, our experiments show that spatial integration can be achieved by establishing interdependencies among agents of different kinds. They also reveal that segregated areas are more likely to host older agents, whereas diverse areas attract younger ones. Through this work, we show that combining RL and ABMs can provide an artificial environment in which policy makers can observe potential and existing behaviors associated with incentives.
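The Schelling neighborhood rule that underlies the agents' rewards can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid encoding, the satisfaction threshold, and the reward values (+1/-1) are assumptions for the sketch.

```python
import numpy as np

def schelling_reward(grid, row, col, threshold=0.5):
    """Classic Schelling rule: an agent is satisfied (+1) if the fraction
    of same-type agents in its Moore neighborhood meets the threshold,
    and unsatisfied (-1) otherwise. 0 encodes an empty cell."""
    agent_type = grid[row, col]
    neighbors = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < grid.shape[0] and 0 <= c < grid.shape[1] and grid[r, c] != 0:
                neighbors.append(grid[r, c])
    if not neighbors:
        return 0.0  # isolated agents are treated as neutral here
    same = sum(n == agent_type for n in neighbors) / len(neighbors)
    return 1.0 if same >= threshold else -1.0

# Two agent types (1 and 2) on a small grid; 0 marks empty cells.
grid = np.array([[1, 2, 0],
                 [2, 2, 1],
                 [0, 1, 1]])
```

In the paper, a reward of this kind would be one input to each agent's Deep Q-Network rather than a direct movement rule.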
Intrinsic fluctuations of reinforcement learning promote cooperation
In this work, we ask and answer what makes classical reinforcement learning cooperative. Cooperating in social-dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms that promote cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's Dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions its next action choice on both agents' action choices in the previous round. We find that, besides a high valuation of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and for regulating undesired collusive effects.
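The learning setup described above (two epsilon-greedy temporal-difference Q-learners in an iterated Prisoner's Dilemma, each conditioning on the last round's joint action) can be sketched as follows. The payoff values, hyperparameters, and initial memory state are illustrative assumptions, not the paper's exact configuration.

```python
import random
from itertools import product

# Standard Prisoner's Dilemma payoffs (row, column): T > R > P > S.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ACTIONS = ("C", "D")
# One-period memory: the state is the joint action of the last round.
STATES = list(product(ACTIONS, ACTIONS))

def run(rounds=20000, alpha=0.05, gamma=0.95, eps=0.05, seed=1):
    rng = random.Random(seed)
    Q = [{(s, a): 0.0 for s in STATES for a in ACTIONS} for _ in range(2)]

    def choose(i, s):
        if rng.random() < eps:          # epsilon-greedy exploration
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[i][(s, a)])

    state = ("C", "C")                  # arbitrary initial memory
    coop = 0
    for _ in range(rounds):
        a0, a1 = choose(0, state), choose(1, state)
        r0, r1 = PAYOFF[(a0, a1)]
        nxt = (a0, a1)
        for i, (a, r) in enumerate(((a0, r0), (a1, r1))):
            best_next = max(Q[i][(nxt, b)] for b in ACTIONS)
            Q[i][(state, a)] += alpha * (r + gamma * best_next - Q[i][(state, a)])
        coop += (a0 == "C") + (a1 == "C")
        state = nxt
    return coop / (2 * rounds)          # fraction of cooperative choices
```

In this framing, the paper's point is that the sampling noise of epsilon-greedy exploration and of the stochastic update order itself shifts which strategies the coupled learners settle on.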
Simulation in Contexts Involving an Interactive Table and Tangible Objects
By using an interactive table, several people (decision-makers) can interact simultaneously and collaboratively around the table during a simulation session. Thanks to the RFID technology the table is fitted with, tangible objects can be given a unique identity so that they can be included and taken into account in the simulation. The paper describes a context model that takes into consideration the specificities of interactive tables. The TangiSense interactive table is presented; it is connected to a multi-agent system that gives the table a certain level of adaptation: each tangible object can be associated with an agent that brings roles to the object (i.e., a role is the equivalent of a set of behaviors). The multi-agent system proposed in this paper is modeled according to an architecture adapted to the exploitation of tangible and virtual objects during simulation on an interactive table. A case study is presented; it concerns a simulation of road-traffic management. The illustrations give an outline of the potential of the simulation system with regard to context awareness, following both the actions of the decision-makers involved in the simulation and the agents composing the road-traffic simulation.
Differentiable Game Mechanics
Deep learning is built on the foundational guarantee that gradient descent on an objective function converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, that exhibit multiple interacting losses. The behavior of gradient-based methods in games is not well understood and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new tools to understand and control the dynamics in n-player differentiable games. The key result is to decompose the game Jacobian into two components. The first, symmetric component is related to potential games, which reduce to gradient descent on an implicit function. The second, antisymmetric component relates to Hamiltonian games, a new class of games that obey a conservation law akin to the conservation laws of classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs, while at the same time being applicable to, and having guarantees in, much more general cases.
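The decomposition behind SGA can be illustrated on the simplest purely Hamiltonian game: a bilinear zero-sum game with losses l1(x, y) = x*y for the player controlling x and l2(x, y) = -x*y for the player controlling y. This is a sketch of the mechanism, not the paper's general implementation; the step size and adjustment weight lambda are assumptions.

```python
import numpy as np

def sga_step(xy, lr=0.1, lam=1.0):
    """One SGA step on the bilinear game l1 = x*y, l2 = -x*y.
    The simultaneous gradient is xi = (y, -x); the game Jacobian
    J = [[0, 1], [-1, 0]] is purely antisymmetric, so plain simultaneous
    gradient descent cycles around the fixed point at the origin.
    SGA adds lam * A^T xi, where A is the antisymmetric part of J."""
    x, y = xy
    xi = np.array([y, -x])                        # simultaneous gradient
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])       # game Jacobian
    A = 0.5 * (J - J.T)                           # antisymmetric component
    adjusted = xi + lam * (A.T @ xi)              # symplectic adjustment
    return xy - lr * adjusted

pt = np.array([1.0, 1.0])
for _ in range(200):
    pt = sga_step(pt)
# pt spirals into the stable fixed point (0, 0); with lam = 0 the same
# iteration slowly spirals outward instead.
```

The adjustment term A^T xi points toward the fixed point here, converting the conservative rotation of the Hamiltonian dynamics into convergence.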
Multi-agent Reinforcement Learning in Sequential Social Dilemmas
Matrix games like the Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended: cooperativeness is a property of policies, not of elementary actions. We introduce sequential social dilemmas, which share the mixed incentive structure of matrix-game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors, including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
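The independent-learning setup described above can be sketched on a toy commons environment. To keep the sketch self-contained it uses tabular Q-learners and an invented one-variable resource state; it is emphatically not the paper's Gathering or Wolfpack environment, and the capacity, regrowth probability, and hyperparameters are all assumptions.

```python
import random

N_APPLES = 3        # patch capacity (a stand-in for resource abundance)
REGROW_P = 0.3      # per-step regrowth chance for each missing apple

def step(apples, a0, a1, rng):
    """Toy commons dynamics: each agent chooses abstain (0) or harvest (1);
    a harvester takes one apple if any remain, then the patch regrows."""
    rewards = [0.0, 0.0]
    for i, a in enumerate((a0, a1)):
        if a == 1 and apples > 0:
            apples -= 1
            rewards[i] = 1.0
    for _ in range(N_APPLES - apples):
        if rng.random() < REGROW_P:
            apples += 1
    return apples, rewards[0], rewards[1]

def train(steps=20000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Two self-interested, independent tabular Q-learners; each sees only
    the resource level and its own reward, as in independent learning."""
    rng = random.Random(seed)
    Q = [[[0.0, 0.0] for _ in range(N_APPLES + 1)] for _ in range(2)]
    apples, totals = N_APPLES, [0.0, 0.0]
    for _ in range(steps):
        acts = []
        for i in range(2):
            if rng.random() < eps:
                acts.append(rng.randrange(2))
            else:
                acts.append(0 if Q[i][apples][0] >= Q[i][apples][1] else 1)
        nxt, r0, r1 = step(apples, acts[0], acts[1], rng)
        for i, r in enumerate((r0, r1)):
            best = max(Q[i][nxt])
            Q[i][apples][acts[i]] += alpha * (r + gamma * best - Q[i][apples][acts[i]])
        totals[0] += r0
        totals[1] += r1
        apples = nxt
    return totals
```

Even in this stripped-down form, lowering REGROW_P (scarcer resources) is the kind of environmental factor whose effect on learned harvesting behavior the paper studies at scale with deep Q-networks.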
Addressing Environment Non-Stationarity by Repeating Q-learning Updates
Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that action. In other words, there is a tight coupling between the learning dynamics and the underlying execution policy. This coupling can cause performance degradation in noisy non-stationary environments. Here, we introduce Repeated Update Q-learning (RUQL), a learning algorithm that resolves this undesirable artifact of Q-learning while maintaining its simplicity. We analyze the similarities and differences between RUQL, QL, and the closest state-of-the-art algorithms theoretically. Our analysis shows that RUQL maintains the convergence guarantee of QL in stationary environments while relaxing the coupling between the execution policy and the learning dynamics. Experimental results confirm the theoretical insights and show how RUQL outperforms both QL and the closest state-of-the-art algorithms in noisy non-stationary environments.
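The core idea of RUQL is to apply the standard Q-learning update 1/pi times, where pi is the probability that the executed action is selected under the current policy, so that rarely chosen actions are not also slowly updated. A sketch of the resulting closed-form update follows; the function name and default hyperparameters are illustrative.

```python
def ruql_update(q, pi, reward, max_next_q, alpha=0.1, gamma=0.95):
    """Repeated Update Q-learning in closed form: repeating the update
    Q <- Q + alpha * (target - Q) exactly 1/pi times with a fixed target
    gives Q <- w*Q + (1 - w)*target with w = (1 - alpha)**(1/pi), i.e. an
    effective learning rate of 1 - (1 - alpha)**(1/pi). For pi = 1 this
    reduces to the ordinary Q-learning update."""
    target = reward + gamma * max_next_q
    w = (1.0 - alpha) ** (1.0 / pi)
    return w * q + (1.0 - w) * target
```

For example, with alpha = 0.1 an action selected with probability pi = 0.1 gets an effective learning rate of 1 - 0.9**10, roughly 0.65 instead of 0.1, which is how RUQL decouples the update rate from the action-selection probability.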