    Modelling Spirals of Silence and Echo Chambers by Learning from the Feedback of Others

    What are the mechanisms by which groups with certain opinions gain public voice and force others holding a different view into silence? Furthermore, how does social media play into this? Drawing on neuroscientific insights into the processing of social feedback, we develop a theoretical model that allows us to address these questions. In repeated interactions, individuals learn whether their opinion meets public approval and refrain from expressing their standpoint if it is socially sanctioned. In a social network sorted around opinions, an agent forms a distorted impression of public opinion enforced by the communicative activity of the different camps. Even strong majorities can be forced into silence if a minority acts as a cohesive whole. On the other hand, the strong social organisation around opinions enabled by digital platforms favours collective regimes in which opposing voices are expressed and compete for primacy in public. This paper highlights the role that the basic mechanisms of social information processing play in massive computer-mediated interactions on opinions.

    Cooperation and Reputation Dynamics with Reinforcement Learning

    Creating incentives for cooperation is a challenge in natural and artificial systems. One potential answer is reputation, whereby agents trade the immediate cost of cooperation for the future benefits of having a good reputation. Game theoretical models have shown that specific social norms can make cooperation stable, but how agents can independently learn to establish effective reputation mechanisms is less understood. We use a simple model of reinforcement learning to show that reputation mechanisms generate two coordination problems: agents need to learn how to coordinate on the meaning of existing reputations and collectively agree on a social norm to assign reputations to others based on their behavior. These coordination problems exhibit multiple equilibria, some of which effectively establish cooperation. When we train agents with a standard Q-learning algorithm in an environment with reputation mechanisms, convergence to undesirable equilibria is widespread. We propose two mechanisms to alleviate this: (i) seeding a proportion of the system with fixed agents that steer others towards good equilibria; and (ii) intrinsic rewards based on the idea of introspection, i.e., augmenting agents' rewards by an amount proportionate to the performance of their own strategy against themselves. A combination of these simple mechanisms is successful in stabilizing cooperation, even in a fully decentralized version of the problem where agents learn to use and assign reputations simultaneously. We show how our results relate to the literature in Evolutionary Game Theory, and discuss implications for artificial, human and hybrid systems, where reputations can be used as a way to establish trust and cooperation. Comment: Published in AAMAS'21, 9 pages

    Dynamics of Boltzmann Q-Learning in Two-Player Two-Action Games

    We consider the dynamics of Q-learning in two-player two-action games with a Boltzmann exploration mechanism. For any non-zero exploration rate the dynamics is dissipative, which guarantees that agent strategies converge to rest points that are generally different from the game's Nash Equilibria (NE). We provide a comprehensive characterization of the rest point structure for different games, and examine the sensitivity of this structure with respect to the noise due to exploration. Our results indicate that for a class of games with multiple NE the asymptotic behavior of learning dynamics can undergo drastic changes at critical exploration rates. Furthermore, we demonstrate that for certain games with a single NE, it is possible to have additional rest points (not corresponding to any NE) that persist for a finite range of the exploration rates and disappear when the exploration rates of both players tend to zero. Comment: 10 pages, 12 figures. Version 2: added more extensive discussion of asymmetric equilibria; clarified conditions for continuous/discontinuous bifurcations in coordination/anti-coordination games
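
    The dependence of the rest points on the exploration rate can be illustrated with a small sketch. It is not the paper's derivation: it assumes a simplified, deterministic update in which every Q-value relaxes toward its expected payoff at each step, and the names `boltzmann`, `expected_payoffs` and `run_dynamics`, the learning rate and the step count are all hypothetical choices. In a symmetric coordination game with payoffs 1 on the diagonal and 0 elsewhere, this simplified map has a single uniform rest point at high temperature and a near-pure rest point at low temperature.

```python
import math


def boltzmann(q_values, temperature):
    """Boltzmann (softmax) policy over Q-values; higher temperature = more exploration."""
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def expected_payoffs(payoff, opponent_policy):
    """Expected payoff of each own action; payoff[own_action][opponent_action]."""
    return [
        sum(payoff[a][b] * opponent_policy[b] for b in range(len(opponent_policy)))
        for a in range(len(payoff))
    ]


def run_dynamics(payoff_1, payoff_2, temperature, q_1, q_2, alpha=0.1, steps=5000):
    """Iterate the simplified expected-update Q-learning map (an assumption of this sketch).

    Each Q-value moves toward its expected payoff under the opponent's current
    Boltzmann policy. Returns the final policies of both players.
    """
    q_1, q_2 = list(q_1), list(q_2)
    for _ in range(steps):
        p_1 = boltzmann(q_1, temperature)
        p_2 = boltzmann(q_2, temperature)
        u_1 = expected_payoffs(payoff_1, p_2)
        u_2 = expected_payoffs(payoff_2, p_1)
        q_1 = [q + alpha * (u - q) for q, u in zip(q_1, u_1)]
        q_2 = [q + alpha * (u - q) for q, u in zip(q_2, u_2)]
    return boltzmann(q_1, temperature), boltzmann(q_2, temperature)


# Symmetric coordination game: both players earn 1 if their actions match.
COORDINATION = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    for temperature in (1.0, 0.1):
        p_1, p_2 = run_dynamics(
            COORDINATION, COORDINATION, temperature, [0.6, 0.4], [0.6, 0.4]
        )
        print(temperature, p_1, p_2)
```

    With the same slightly biased initial Q-values, the high-temperature run settles on the uniform mixed policy while the low-temperature run settles close to the pure coordination outcome, which is the kind of qualitative change at a critical exploration rate that the abstract describes.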

    Continuous Strategy Replicator Dynamics for Multi-Agent Learning

    The problem of multi-agent learning and adaptation has attracted a great deal of attention in recent years. It has been suggested that the dynamics of multi-agent learning can be studied using replicator equations from population biology. Most existing studies have been limited to discrete strategy spaces with a small number of available actions. In many cases, however, the choices available to agents are better characterized by continuous spectra. This paper suggests a generalization of the replicator framework that allows one to study the adaptive dynamics of Q-learning agents with continuous strategy spaces. Instead of probability vectors, agents' strategies are now characterized by probability measures over continuous variables. As a result, the ordinary differential equations for the discrete case are replaced by a system of coupled integral-differential replicator equations that describe the mutual evolution of individual agent strategies. We derive a set of functional equations describing the steady state of the replicator dynamics, examine their solutions for several two-player games, and confirm our analytical results using simulations. Comment: 12 pages, 15 figures, accepted for publication in JAAMA

    Pigouvian algorithmic platform design

    There are rising concerns that reinforcement algorithms might learn tacit collusion in oligopolistic pricing, and moreover that the resulting ‘black box’ strategies would be difficult to regulate. Here, I exploit a strong connection between evolutionary game theory and reinforcement learning to show when the latter’s rest points are Bayes–Nash equilibria, but also to derive a system of Pigouvian taxes guaranteed to implement an (unknown) socially optimal outcome of an oligopoly pricing game. Finally, I illustrate reinforcement learning of equilibrium play via simulation, which provides evidence of the capacity of reinforcement algorithms to collude in a very simple setting, but the introduction of the optimal tax scheme induces a competitive outcome.

    Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

    The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research. Comment: Published in AI Communications 202

    The Stabilisation of Equilibria in Evolutionary Game Dynamics through Mutation: Mutation Limits in Evolutionary Games

    The multi-population replicator dynamics is a dynamic approach to coevolving populations and multi-player games and is related to Cross learning. In general, not every equilibrium is a Nash equilibrium of the underlying game, and convergence is not guaranteed. In particular, no interior equilibrium can be asymptotically stable in the multi-population replicator dynamics, e.g. resulting in cyclic orbits around a single interior Nash equilibrium. We introduce a new notion of equilibria of replicator dynamics, called mutation limits, based on a naturally arising, simple form of mutation, which is invariant under the specific choice of mutation parameters. We prove the existence of mutation limits for a large class of games, and consider a particularly interesting subclass called attracting mutation limits. Attracting mutation limits are approximated in every (mutation-)perturbed replicator dynamics, hence they offer an approximate dynamic solution to the underlying game even if the original dynamic is not convergent. Thus, mutation stabilizes the system in certain cases and makes attracting mutation limits nearly attainable. Hence, attracting mutation limits are relevant as a dynamic solution concept of games. We observe that they have some similarity to Q-learning in multi-agent reinforcement learning. Attracting mutation limits do not exist in all games, however, raising the question of their characterization.
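
    The stabilising effect of mutation can be sketched numerically on Matching Pennies, whose only Nash equilibrium is the interior point where both players mix uniformly. The mutation term below, a uniform pull of strength `mu` toward the uniform mixture, is an assumption of this sketch and may differ from the form of mutation used in the paper; the function names, the Euler step size and the step count are likewise hypothetical.

```python
def advantage(payoff, opponent_first_prob):
    """Expected payoff of action 0 minus action 1; payoff[own_action][opponent_action]."""
    p = opponent_first_prob
    return (payoff[0][0] - payoff[1][0]) * p + (payoff[0][1] - payoff[1][1]) * (1.0 - p)


def simulate(payoff_1, payoff_2, x, y, mu, dt=0.01, steps=20000):
    """Euler-integrate two-population replicator dynamics with uniform mutation.

    x and y are the probabilities that player 1 and player 2 choose action 0.
    mu = 0 gives the plain replicator dynamics; mu > 0 adds an assumed
    mutation term mu * (0.5 - probability) pulling toward the uniform mixture.
    """
    for _ in range(steps):
        dx = x * (1.0 - x) * advantage(payoff_1, y) + mu * (0.5 - x)
        dy = y * (1.0 - y) * advantage(payoff_2, x) + mu * (0.5 - y)
        x, y = x + dt * dx, y + dt * dy
    return x, y


def distance_from_uniform(x, y):
    """Euclidean distance of the state (x, y) from the uniform mixture (0.5, 0.5)."""
    return ((x - 0.5) ** 2 + (y - 0.5) ** 2) ** 0.5


# Matching Pennies: player 1 wants to match, player 2 wants to mismatch.
PENNIES_1 = [[1.0, -1.0], [-1.0, 1.0]]
PENNIES_2 = [[-1.0, 1.0], [1.0, -1.0]]

if __name__ == "__main__":
    for mu in (0.0, 0.1):
        x, y = simulate(PENNIES_1, PENNIES_2, 0.8, 0.5, mu)
        print(mu, distance_from_uniform(x, y))
```

    Without mutation the trajectory keeps orbiting the interior equilibrium and never approaches it (the explicit Euler scheme even drifts slowly outward, which is a numerical artefact rather than a property of the continuous dynamics). With a small mutation rate the same initial condition spirals into the uniform mixture.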

    Bounds and dynamics for empirical game theoretic analysis

    This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for empirical game theoretical analysis of complex multi-agent interactions. In doing so we provide insights into the empirical meta-game, showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state-of-the-art has only considered evolutionary dynamics of symmetric HPTs in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), the dynamics of several teams of players in the capture the flag game (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the policy-space response oracle multi-agent learning algorithm.
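
    The link between sample size and approximation quality can be sketched with a textbook argument. The bound below is the standard Hoeffding inequality combined with a union bound over payoff entries, and is not necessarily the bound derived in the paper; it assumes independent payoff samples with a known bounded range, and the function names are hypothetical. It relies on the standard fact that if every estimated payoff is within `epsilon` of its true value, a Nash equilibrium of the estimated game is a `2 * epsilon`-Nash equilibrium of the true game.

```python
import math


def samples_per_entry(epsilon, delta, n_entries, payoff_range=1.0):
    """Samples per payoff entry so that, with probability at least 1 - delta,
    every one of n_entries empirical means is within epsilon of its true mean.

    Uses Hoeffding's inequality with a union bound (an assumption of this
    sketch): n >= range^2 * ln(2 * n_entries / delta) / (2 * epsilon^2).
    """
    if epsilon <= 0 or not 0 < delta < 1 or n_entries < 1 or payoff_range <= 0:
        raise ValueError("need epsilon > 0, 0 < delta < 1, n_entries >= 1, range > 0")
    bound = payoff_range ** 2 * math.log(2 * n_entries / delta) / (2 * epsilon ** 2)
    return math.ceil(bound)


def nash_approximation_gap(epsilon):
    """If all payoffs are estimated to within epsilon, an exact Nash equilibrium
    of the estimated game is a (2 * epsilon)-Nash equilibrium of the true game."""
    return 2 * epsilon


if __name__ == "__main__":
    # Hypothetical example: a 3x3 two-player meta-game has 18 payoff entries.
    n = samples_per_entry(epsilon=0.05, delta=0.05, n_entries=18)
    print(n, nash_approximation_gap(0.05))
```

    The required sample count grows only logarithmically in the number of payoff entries but quadratically in the inverse of the target accuracy, so halving `epsilon` roughly quadruples the number of samples needed per entry.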