Evolutionary Game Theory Squared: Evolving Agents in Endogenously Evolving Zero-Sum Games

Author: Fiez Tanner
Piliouras Georgios
Ratliff Lillian
Sim Ryann
Skoulakis Stratis
Publication date: 15/12/2020

The predominant paradigm in evolutionary game theory and more generally online learning in games is based on a clear distinction between a population of dynamic agents that interact given a fixed, static game. In this paper, we move away from the artificial divide between dynamic agents and static games, to introduce and analyze a large class of competitive settings where both the agents and the games they play evolve strategically over time. We focus on arguably the most archetypal game-theoretic setting -- zero-sum games (as well as network generalizations) -- and the most studied evolutionary learning dynamic -- replicator, the continuous-time analogue of multiplicative weights. Populations of agents compete against each other in a zero-sum competition that itself evolves adversarially to the current population mixture. Remarkably, despite the chaotic coevolution of agents and games, we prove that the system exhibits a number of regularities. First, the system has conservation laws of an information-theoretic flavor that couple the behavior of all agents and games. Secondly, the system is Poincaré recurrent, with effectively all possible initializations of agents and games lying on recurrent orbits that come arbitrarily close to their initial conditions infinitely often. Thirdly, the time-average agent behavior and utility converge to the Nash equilibrium values of the time-average game. Finally, we provide a polynomial time algorithm to efficiently predict this time-average behavior for any such coevolving network game.
Comment: To appear in AAAI 202
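The regularities listed in this abstract can be illustrated on a much simpler case than the paper's. The sketch below is an assumption-laden toy, not the authors' model or algorithm: it runs replicator dynamics on a fixed (non-evolving) Matching Pennies game, integrated with a hand-written Runge-Kutta step. The names `replicator_field`, `rk4_step`, `kl_to_nash`, and `simulate`, the starting point, and the step size are all hypothetical choices. For this static game, the sum of KL divergences from the Nash strategies to the current strategies is a known conserved quantity, and the time-averaged strategies approach the Nash equilibrium.

```python
import math

# Matching Pennies: payoff to the row player; the column player receives the negative.
# This static game is an illustrative assumption, not the paper's evolving game.
A = [[1.0, -1.0], [-1.0, 1.0]]
NASH = [0.5, 0.5]  # unique Nash strategy of each player in this game


def replicator_field(x, y):
    """Time derivatives of both mixed strategies under replicator dynamics."""
    row_pay = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    col_pay = [-sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
    row_avg = sum(p * u for p, u in zip(x, row_pay))
    col_avg = sum(q * v for q, v in zip(y, col_pay))
    dx = [p * (u - row_avg) for p, u in zip(x, row_pay)]
    dy = [q * (v - col_avg) for q, v in zip(y, col_pay)]
    return dx, dy


def rk4_step(x, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def moved(dx, dy, s):
        return ([a + s * b for a, b in zip(x, dx)],
                [a + s * b for a, b in zip(y, dy)])

    k1 = replicator_field(x, y)
    k2 = replicator_field(*moved(*k1, h / 2))
    k3 = replicator_field(*moved(*k2, h / 2))
    k4 = replicator_field(*moved(*k3, h))
    new_x = [a + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for a, d1, d2, d3, d4 in zip(x, k1[0], k2[0], k3[0], k4[0])]
    new_y = [a + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for a, d1, d2, d3, d4 in zip(y, k1[1], k2[1], k3[1], k4[1])]
    return new_x, new_y


def kl_to_nash(x, y):
    """Sum of KL divergences from the Nash strategies to the current strategies."""
    return (sum(s * math.log(s / p) for s, p in zip(NASH, x))
            + sum(s * math.log(s / q) for s, q in zip(NASH, y)))


def simulate(x0, y0, h=0.01, steps=20000):
    """Integrate the dynamics; return the final state and time-averaged strategies."""
    x, y = list(x0), list(y0)
    sum_x, sum_y = [0.0] * len(x), [0.0] * len(y)
    for _ in range(steps):
        sum_x = [s + p for s, p in zip(sum_x, x)]
        sum_y = [s + q for s, q in zip(sum_y, y)]
        x, y = rk4_step(x, y, h)
    avg_x = [s / steps for s in sum_x]
    avg_y = [s / steps for s in sum_y]
    return x, y, avg_x, avg_y


if __name__ == "__main__":
    x0, y0 = [0.8, 0.2], [0.3, 0.7]
    x, y, avg_x, avg_y = simulate(x0, y0)
    print("KL sum at start:", kl_to_nash(x0, y0))
    print("KL sum at end:  ", kl_to_nash(x, y))
    print("time-averaged strategies:", avg_x, avg_y)
```

The strategies themselves keep cycling rather than converging, which is the static-game counterpart of the recurrence described above; only the time averages settle near the Nash strategies. A higher-order integrator is used because a plain Euler step would slowly drift off the conserved level set.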

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Learning and Decision-Making in Competitive and Uncertain Systems

Author: Fiez Tanner
Publication date: 01/01/2021

Thesis (Ph.D.)--University of Washington, 2021
As a result of the demonstrated potential for impact in traditional use cases, progressively more is being asked of machine learning methods. This evolution has led to a renewed focus on learning and decision-making systems. In this domain, theoretical challenges relating to competition and uncertainty are emerging from the practical considerations that have motivated this paradigm shift. There is an increasing awareness that learning and decision-making algorithms will eventually need to be or already are being embedded into complex systems where game-theoretic considerations naturally arise owing to the presence of competing, self-interested entities. Moreover, it has become clear that the artificial introduction of competition in game-theoretic abstractions of machine learning problems can often be a convenient and effective modeling technique for many problems of interest. Consequently, tools from game theory are now critically needed to analyze coupled learning and decision-making algorithms for the purposes of characterizing the outcomes that can be expected from competitive interactions and computing meaningful solutions such as equilibria in machine learning problems. Meanwhile, the demands of learning and decision-making algorithms operating under uncertainty are both changing and becoming more challenging. This transformation includes a movement towards more general, yet structured feedback models and objectives that reflect the desire to enable downstream tasks and future inferences. To this end, important problems remain to be solved pertaining to designing theoretically sound sequential decision-making algorithms tailored to such tasks. This discussion motivates the research on learning and decision-making in competitive and uncertain systems presented in this thesis.
Together, the contents of this thesis can be summarized by a pair of themes that form Parts I and II: game-theoretic methods for analyzing decision-making algorithms and solving machine learning problems, and machine learning methods for designing and analyzing sequential decision-making algorithms under uncertainty. The former theme is approached from a top-down perspective: general formulations of games and gradient-based learning algorithms are studied, theoretical characterizations are developed, and then the results are connected to specific problems of interest. In contrast, the latter theme is approached from a bottom-up perspective: models of practical sequential decision-making tasks are developed and then theoretically justified algorithms and solutions are constructed. While learning and optimization in games is a well-studied topic, the majority of past research has focused on highly structured settings. Part I of this thesis moves away from this practice and presents studies of nonconvex games on continuous strategy spaces and gradient-based learning algorithms within them. The intent of this research is to develop appropriate notions of game-theoretic equilibria, characterize and understand the behaviors of so-called 'natural' learning dynamics, and establish methods for computing equilibria to solve machine learning problems formulated as games. Chapter 2 lays the foundation for Part I and is built upon thereafter. Based upon the idea of viewing the underlying interaction structure as a Stackelberg game, both a local Stackelberg equilibrium concept and a corresponding characterization in terms of gradient-based sufficient conditions called a differential Stackelberg equilibrium are presented. Learning dynamics emulating the natural game structure are then constructed and convergence guarantees to differential Stackelberg equilibrium are proven.
Chapter 3 follows along this path to study the role of timescale separation on the convergence of the canonical gradient descent-ascent learning dynamics in the subclass of nonconvex-nonconcave zero-sum games. The results characterize the timescales for which the dynamics both locally converge to differential Stackelberg equilibrium and locally avoid points lacking game-theoretic meaning. Finally, Chapter 4 considers zero-sum games in which the minimizing player faces a nonconvex objective and the maximizing player optimizes a Polyak-Łojasiewicz or strongly-concave objective. For this class of games, global convergence guarantees for gradient descent-ascent with timescale separation to only differential Stackelberg equilibrium are proven. Throughout Part I, the implications of the theoretical results for both competitive decision-making and methods for solving machine learning problems are discussed. Traditionally, the study of sequential decision-making under uncertainty in machine learning has focused on problems in which the evaluation criterion is directly linked to the immediate feedback. However, it has become clear that decision-making under uncertainty is often also pertinent to problems where the goal of the learner is instead to acquire information for the purpose of drawing inferences or fulfilling targets only partially linked to the immediate feedback. Part II of this thesis presents a pair of studies on well-motivated sequential decision-making problems with structured feedback models that fall under this theme. The intent of this research is to design sequential decision-making algorithms for solving practical problems that emerge in the real world with desirable theoretical guarantees by exploiting structured feedback models. Chapter 5 commences Part II by formulating the task of ranking papers to reviewers in peer review bidding systems as a sequential decision-making problem.
A model of this problem is developed that identifies a pair of misaligned objectives: ensuring that each paper obtains a sufficient number of bids to be matched adequately with qualified reviewers, and respecting the preferences of reviewers by showing them relevant papers early in the list. To balance the competing objectives, a sequential decision-making algorithm is constructed that exploits the objective structure and it is shown both theoretically and empirically to have a number of advantages over baselines currently used in practice. Chapter 6 then concludes Part II with an analysis of pure exploration transductive linear bandits, a problem that arises naturally in experimental design settings. A decision-maker in this problem sequentially samples measurement vectors from a given set and observes a noisy linear response with an unknown parameter vector. The goal is to infer with high confidence the item from a separate set of vectors that has the maximum inner product with the unknown parameter vector while taking a minimal number of measurements. The optimal achievable sample complexity for this problem is characterized and a near-optimal algorithm that exploits the information structure of the feedback model to enhance the sample efficiency is developed. Together, the contributions of this thesis take steps towards developing important theoretical foundations for learning and decision-making with competition and uncertainty.
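The role of timescale separation in gradient descent-ascent, summarized above for Chapter 3, can be seen in a small toy problem. The sketch below is not taken from the thesis: the quadratic game, its coefficients `A`, `B`, `C`, the function names `grad_f` and `tau_gda`, and the step size are all assumptions made for illustration. In this game the origin is a differential Stackelberg equilibrium for the minimizing player as leader (the follower's curvature is negative and the Schur complement `-A + B**2 / C` is positive), yet it is not a local Nash equilibrium. The continuous-time dynamics are stable at the origin exactly when the ratio `tau` of the maximizer's learning rate to the minimizer's exceeds `A / C`.

```python
import math

# Toy quadratic zero-sum game: f(x, y) = -0.5*A*x**2 + B*x*y - 0.5*C*y**2.
# x minimizes f and y maximizes f. The coefficients are illustrative
# assumptions, not values from the thesis.
A, B, C = 1.0, 2.0, 1.0


def grad_f(x, y):
    """Partial derivatives of f with respect to x and y."""
    df_dx = -A * x + B * y
    df_dy = B * x - C * y
    return df_dx, df_dy


def tau_gda(tau, x0=0.1, y0=0.1, lr=0.01, steps=5000):
    """Run gradient descent-ascent where y uses a step size tau times larger than x.

    Returns the distance of the final iterate from the origin.
    """
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        # Simultaneous update: descent in x, ascent in y on the faster timescale.
        x, y = x - lr * gx, y + tau * lr * gy
    return math.hypot(x, y)


if __name__ == "__main__":
    for tau in (0.5, 4.0):
        print(f"tau = {tau}: distance from origin after training = {tau_gda(tau)}")
```

With these coefficients the threshold is `tau = 1`: a run with `tau = 0.5` spirals away from the origin, while `tau = 4.0` spirals in. This only illustrates the local, linear picture; the thesis's results concern general nonconvex-nonconcave games.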

DSpace at The University of Washington