23 research outputs found
Modelling Spirals of Silence and Echo Chambers by Learning from the Feedback of Others
What are the mechanisms by which groups with certain opinions gain public voice and force others holding a different view into silence? Furthermore, how does social media play into this? Drawing on neuroscientific insights into the processing of social feedback, we develop a theoretical model that allows us to address these questions. In repeated interactions, individuals learn whether their opinion meets public approval and refrain from expressing their standpoint if it is socially sanctioned. In a social network sorted around opinions, an agent forms a distorted impression of public opinion reinforced by the communicative activity of the different camps. Even strong majorities can be forced into silence if a minority acts as a cohesive whole. On the other hand, the strong social organisation around opinions enabled by digital platforms favours collective regimes in which opposing voices are expressed and compete for primacy in public. This paper highlights the role that the basic mechanisms of social information processing play in massive computer-mediated interactions about opinions.
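The feedback-learning mechanism described above can be sketched as a toy simulation. All parameters, the network structure (a single fully mixed group), and the update rule below are illustrative assumptions of mine, not the paper's exact model:

```python
def simulate(n_majority=7, n_minority=3, rounds=200, lr=0.1):
    """Toy sketch of opinion expression learned from social feedback.
    Each agent holds opinion +1 (majority) or -1 (minority) and a learned
    value V for expressing it. The per-round reward for expressing is the
    fraction of other *expressing* agents that agree minus the fraction
    that disagree; an agent keeps expressing while V >= 0."""
    opinions = [1] * n_majority + [-1] * n_minority
    values = [1.0] * len(opinions)  # start optimistic: everyone speaks up
    for _ in range(rounds):
        expressing = [i for i, v in enumerate(values) if v >= 0]
        if not expressing:
            break
        for i in expressing:
            agree = sum(1 for j in expressing
                        if j != i and opinions[j] == opinions[i])
            disagree = len(expressing) - 1 - agree
            reward = (agree - disagree) / max(1, len(expressing) - 1)
            # simple exponential-average value update
            values[i] += lr * (reward - values[i])
    return [v >= 0 for v in values]
```

In this fully mixed toy setting the numerical minority receives net social sanction and falls silent, while the majority keeps expressing; the paper's point is that on an opinion-sorted network the reverse can happen, with a cohesive minority silencing a majority.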
Cooperation and Reputation Dynamics with Reinforcement Learning
Creating incentives for cooperation is a challenge in natural and artificial
systems. One potential answer is reputation, whereby agents trade the immediate
cost of cooperation for the future benefits of having a good reputation. Game
theoretical models have shown that specific social norms can make cooperation
stable, but how agents can independently learn to establish effective
reputation mechanisms on their own is less understood. We use a simple model of
reinforcement learning to show that reputation mechanisms generate two
coordination problems: agents need to learn how to coordinate on the meaning of
existing reputations and collectively agree on a social norm to assign
reputations to others based on their behavior. These coordination problems
exhibit multiple equilibria, some of which effectively establish cooperation.
When we train agents with a standard Q-learning algorithm in an environment
with the presence of reputation mechanisms, convergence to undesirable
equilibria is widespread. We propose two mechanisms to alleviate this: (i)
seeding a proportion of the system with fixed agents that steer others towards
good equilibria; and (ii) intrinsic rewards based on the idea of
introspection, i.e., augmenting agents' rewards by an amount proportionate to
the performance of their own strategy against themselves. A combination of
these simple mechanisms is successful in stabilizing cooperation, even in a
fully decentralized version of the problem where agents learn to use and assign
reputations simultaneously. We show how our results relate to the literature in
Evolutionary Game Theory, and discuss implications for artificial, human and
hybrid systems, where reputations can be used as a way to establish trust and
cooperation.
Comment: Published in AAMAS'21, 9 pages
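The core learning loop (an agent discovering that maintaining a good reputation pays off) can be sketched as follows. The payoffs, the image-scoring norm, and the contextual-bandit simplification of Q-learning are my own illustrative assumptions, not the paper's exact setup:

```python
import random

def train(b=2.0, c=1.0, alpha=0.1, eps=0.1, episodes=2000, seed=1):
    """Sketch: a Q-learner in a donation game with reputations.
    State: own reputation (0 = bad, 1 = good). Action: 0 = defect,
    1 = cooperate (cost c). The norm is image scoring: cooperation earns a
    good reputation, defection a bad one. Fixed 'discriminator' partners
    donate b only to agents currently seen as good. The update is a
    bandit-style (myopic) simplification of Q-learning."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]  # Q[reputation][action]
    rep = 1
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(2)                       # explore
        else:
            a = max((0, 1), key=lambda x: Q[rep][x])   # exploit
        new_rep = a                     # image scoring: C -> good, D -> bad
        reward = b * new_rep - c * a    # receive b iff good; pay c iff C
        Q[rep][a] += alpha * (reward - Q[rep][a])
        rep = new_rep
    return Q
```

Because the fixed discriminators already enforce a sensible norm, the learner reliably reaches the cooperative equilibrium; the coordination problems the abstract describes arise when agents must also learn the norm itself.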
Dynamics of Boltzmann Q-Learning in Two-Player Two-Action Games
We consider the dynamics of Q-learning in two-player two-action games with a
Boltzmann exploration mechanism. For any non-zero exploration rate the dynamics
is dissipative, which guarantees that agent strategies converge to rest points
that are generally different from the game's Nash Equilibria (NE). We provide a
comprehensive characterization of the rest point structure for different games,
and examine the sensitivity of this structure with respect to the noise due to
exploration. Our results indicate that for a class of games with multiple NE
the asymptotic behavior of learning dynamics can undergo drastic changes at
critical exploration rates. Furthermore, we demonstrate that for certain games
with a single NE, it is possible to have additional rest points (not
corresponding to any NE) that persist for a finite range of the exploration
rates and disappear when the exploration rates of both players tend to zero.
Comment: 10 pages, 12 figures. Version 2: added more extensive discussion of
asymmetric equilibria; clarified conditions for continuous/discontinuous
bifurcations in coordination/anti-coordination games
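The expected-update version of these dynamics is easy to iterate numerically. The sketch below (payoffs, temperature, and initial conditions are illustrative choices of mine) relaxes each Q-value toward the expected payoff of its action against the opponent's current Boltzmann policy, so the fixed points are smoothed, logit-equilibrium-style rest points rather than exact Nash equilibria:

```python
import math

def softmax(q, temp):
    """Boltzmann policy over Q-values at exploration temperature temp."""
    m = max(q)
    e = [math.exp((v - m) / temp) for v in q]
    s = sum(e)
    return [x / s for x in e]

def rest_point(payoff_a, payoff_b, temp=0.2, alpha=0.1, steps=5000):
    """Deterministic (expected-update) sketch of Boltzmann Q-learning in a
    two-player two-action game. payoff_a[i][j] / payoff_b[i][j] give the
    payoffs when player A plays i and player B plays j."""
    qa, qb = [0.1, 0.0], [0.1, 0.0]  # slight shared bias to break symmetry
    for _ in range(steps):
        pa, pb = softmax(qa, temp), softmax(qb, temp)
        for i in range(2):
            ea = sum(pb[j] * payoff_a[i][j] for j in range(2))
            eb = sum(pa[j] * payoff_b[j][i] for j in range(2))
            qa[i] += alpha * (ea - qa[i])
            qb[i] += alpha * (eb - qb[i])
    return softmax(qa, temp), softmax(qb, temp)

# coordination game: rest point sits strictly inside the simplex,
# near (but not at) the pure Nash equilibrium
pa, pb = rest_point([[1, 0], [0, 1]], [[1, 0], [0, 1]])
```

With any nonzero temperature the resulting action probabilities stay strictly between 0 and 1, illustrating the abstract's point that rest points generally differ from the game's Nash equilibria.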
Continuous Strategy Replicator Dynamics for Multi-Agent Learning
The problem of multi-agent learning and adaptation has attracted a great deal
of attention in recent years. It has been suggested that the dynamics of
multi-agent learning can be studied using replicator equations from population
biology. Most existing studies so far have been limited to discrete strategy
spaces with a small number of available actions. In many cases, however, the
choices available to agents are better characterized by continuous spectra.
This paper suggests a generalization of the replicator framework that allows
one to study the adaptive dynamics of Q-learning agents with continuous
strategy spaces. Instead of probability vectors, agents' strategies are now
characterized
by probability measures over continuous variables. As a result, the ordinary
differential equations for the discrete case are replaced by a system of
coupled integro-differential replicator equations that describe the mutual
evolution of individual agent strategies. We derive a set of functional
equations describing the steady state of the replicator dynamics, examine their
solutions for several two-player games, and confirm our analytical results
using simulations.
Comment: 12 pages, 15 figures, accepted for publication in JAAMA
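A crude way to see such dynamics in action is to discretize the strategy density on a grid and Euler-step the replicator equation. The game (a quadratic continuous coordination game) and all parameters below are my own illustrative choices, not the paper's system:

```python
def replicator_step(p, q, payoff, dt=0.1):
    """One Euler step of a grid-discretized replicator equation for one
    player: dp(x)/dt = p(x) * (u(x, q) - avg_u), where x ranges over an
    evenly spaced grid on [0, 1] and q is the opponent's density."""
    n = len(p)
    fitness = [sum(q[j] * payoff(i / (n - 1), j / (n - 1)) for j in range(n))
               for i in range(n)]
    avg = sum(p[i] * fitness[i] for i in range(n))
    new = [p[i] + dt * p[i] * (fitness[i] - avg) for i in range(n)]
    s = sum(new)
    return [v / s for v in new]  # renormalize to a probability vector

def run(steps=5000, n=11, dt=0.1):
    """Symmetric self-play: u(x, y) = x*y - x^2/2, whose best reply is
    x = E[y]. Starting from a uniform density, mass concentrates at 0.5."""
    payoff = lambda x, y: x * y - x * x / 2
    p = [1.0 / n] * n
    for _ in range(steps):
        p = replicator_step(p, p, payoff, dt)
    return p
```

The density collapses onto the grid point at x = 0.5 (index 5 of 11), the symmetric equilibrium of this toy game, mirroring the steady-state analysis the abstract describes.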
Pigouvian algorithmic platform design
There are rising concerns that reinforcement algorithms might learn tacit collusion in oligopolistic pricing, and moreover that the resulting "black box" strategies would be difficult to regulate. Here, I exploit a strong connection between evolutionary game theory and reinforcement learning to show when the latter's rest points are Bayes-Nash equilibria, but also to derive a system of Pigouvian taxes guaranteed to implement an (unknown) socially optimal outcome of an oligopoly pricing game. Finally, I illustrate reinforcement learning of equilibrium play via simulation, which provides evidence of the capacity of reinforcement algorithms to collude in a very simple setting; the introduction of the optimal tax scheme, however, induces a competitive outcome.
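The logic of a per-unit corrective instrument shifting the equilibrium of a pricing game can be illustrated with a toy differentiated duopoly. This is my own example, not the paper's model or its derived tax scheme; in this toy the corrective instrument happens to be a subsidy:

```python
def nash_prices(tax=0.0, iters=200):
    """Toy differentiated Bertrand duopoly: demand q_i = 1 - p_i + 0.5*p_j,
    marginal cost 0, and a per-unit tax. Firm i maximizes
    (p_i - tax) * (1 - p_i + 0.5*p_j), giving the best response
    p_i = (1 + 0.5*p_j + tax) / 2. Iterated best responses converge to the
    Nash prices, i.e. the rest points a reinforcement learner would be
    expected to approach."""
    p1 = p2 = 1.0
    for _ in range(iters):
        p1 = (1 + 0.5 * p2 + tax) / 2
        p2 = (1 + 0.5 * p1 + tax) / 2
    return p1, p2
```

Without intervention the Nash price is 2/3, above the marginal cost of 0; choosing the per-unit instrument tax = -1 moves the equilibrium exactly to marginal-cost pricing, the socially optimal outcome in this toy.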
Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments
The Game Theory & Multi-Agent team at DeepMind studies several aspects of
multi-agent learning ranging from computing approximations to fundamental
concepts in game theory to simulating social dilemmas in rich spatial
environments and training 3-d humanoids in difficult team coordination tasks. A
signature aim of our group is to use the resources and expertise made available
to us at DeepMind in deep reinforcement learning to explore multi-agent systems
in complex environments and use these benchmarks to advance our understanding.
Here, we summarise the recent work of our team and present a taxonomy that we
feel highlights many important open challenges in multi-agent research.
Comment: Published in AI Communications 202
The Stabilisation of Equilibria in Evolutionary Game Dynamics through Mutation: Mutation Limits in Evolutionary Games
The multi-population replicator dynamics is a dynamic approach to coevolving populations and multi-player games and is related to Cross learning. In general, not every equilibrium is a Nash equilibrium of the underlying game, and convergence is not guaranteed. In particular, no interior equilibrium can be asymptotically stable in the multi-population replicator dynamics, resulting, e.g., in cyclic orbits around a single interior Nash equilibrium. We introduce a new notion of equilibria of replicator dynamics, called mutation limits, based on a naturally arising, simple form of mutation, which is invariant under the specific choice of mutation parameters. We prove the existence of mutation limits for a large class of games and consider a particularly interesting subclass called attracting mutation limits. Attracting mutation limits are approximated in every (mutation-)perturbed replicator dynamics, hence they offer an approximate dynamic solution to the underlying game even if the original dynamics do not converge. Thus, mutation stabilizes the system in certain cases and makes attracting mutation limits nearly attainable. Hence, attracting mutation limits are relevant as a dynamic solution concept of games. We observe that they have some similarity to Q-learning in multi-agent reinforcement learning. Attracting mutation limits do not exist in all games, however, raising the question of their characterization.
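The stabilizing effect of mutation can be seen in the classic two-population matching-pennies example, where the plain replicator dynamics orbits the interior Nash equilibrium but a small uniform mutation rate pulls trajectories into it. The discretization and parameters below are illustrative choices of mine:

```python
def run(mu, x=0.9, y=0.2, dt=0.01, steps=20000):
    """Euler integration of two-population replicator-mutator dynamics for
    matching pennies. x = probability of 'heads' for the matching player,
    y = the same for the mismatching player; mu = uniform mutation rate.
    Payoff differences: heads-vs-tails advantage is 4y - 2 for the matcher
    and 2 - 4x for the mismatcher."""
    for _ in range(steps):
        dx = x * (1 - x) * (4 * y - 2) + mu * (1 - 2 * x)
        dy = y * (1 - y) * (2 - 4 * x) + mu * (1 - 2 * y)
        x, y = x + dt * dx, y + dt * dy
    return x, y
```

With mu = 0 the trajectory keeps circling at a distance from the interior Nash equilibrium (0.5, 0.5); with a small positive mu it spirals into it, which is the kind of perturbed limit the abstract's mutation limits formalize.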
Bounds and dynamics for empirical game theoretic analysis
This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for the empirical game-theoretic analysis of complex multi-agent interactions. In doing so we provide insights into the empirical meta-game, showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state of the art has only considered evolutionary dynamics of symmetric HPTs, in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), the dynamics of several teams of players in the capture the flag game (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the policy-space response oracle multi-agent learning algorithm.
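The basic estimation step of empirical game-theoretic analysis can be sketched in a few lines. The game, noise model, and sample size below are illustrative assumptions of mine, not the paper's bounds:

```python
import random

def estimate_meta_game(true_payoff, samples, seed=0):
    """Estimate each payoff entry of a (row-player, symmetric-view) meta-game
    by averaging noisy playouts: observed payoff = true payoff + N(0, 1).
    With enough samples, equilibria of the estimate approximate equilibria
    of the true game."""
    rng = random.Random(seed)
    n = len(true_payoff)
    return [[sum(true_payoff[i][j] + rng.gauss(0, 1)
                 for _ in range(samples)) / samples
             for j in range(n)] for i in range(n)]

def regret(payoff, row, col):
    """Row player's gain from its best unilateral deviation at the pure
    profile (row, col); zero means (row, col) is a best response."""
    return max(payoff[i][col] for i in range(len(payoff))) - payoff[row][col]

# toy meta-game with row payoffs of a prisoner's-dilemma-like matrix;
# defect (row 1) is the dominant strategy
true_game = [[3.0, 0.0], [5.0, 1.0]]
```

With 500 samples per entry the estimation error is roughly 1/sqrt(500) per cell, far smaller than the payoff gaps here, so the best response computed on the estimate coincides with the true one and the estimated equilibrium has zero regret in the true game.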