265 research outputs found

    Distributed accelerated Nash equilibrium learning for two-subnetwork zero-sum game with bilinear coupling

    This paper proposes a distributed accelerated first-order continuous-time algorithm with O(1/t^2) convergence to Nash equilibria in a class of two-subnetwork zero-sum games with bilinear couplings. First-order methods, which use only subgradients of functions, are frequently used in distributed/parallel algorithms for solving large-scale and big-data problems due to their simple structure. However, in the worst case, first-order methods for two-subnetwork zero-sum games often achieve only asymptotic or O(1/t) convergence. In contrast to existing time-invariant first-order methods, this paper designs a distributed accelerated algorithm by combining saddle-point dynamics with time-varying derivative feedback techniques. With suitably chosen parameters, the algorithm achieves O(1/t^2) convergence in terms of the duality gap function without any uniform or strong convexity requirement. Numerical simulations show the efficacy of the algorithm.
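The flavour of the dynamics described above can be illustrated with a minimal, centralized sketch: a discretized second-order saddle-point flow on a bilinear game min_x max_y x^T A y, with the vanishing 3/t damping familiar from Nesterov-style accelerated ODEs. The matrix, step size, and damping schedule are illustrative assumptions, not the paper's actual distributed algorithm or parameter choices.

```python
import numpy as np

def accelerated_saddle_point(A, steps=2000, h=1e-2):
    """Euler discretization of accelerated saddle-point dynamics
    for min_x max_y x^T A y (centralized, illustrative only)."""
    n, m = A.shape
    x, y = np.ones(n), np.ones(m)
    vx, vy = np.zeros(n), np.zeros(m)
    for k in range(1, steps + 1):
        t = k * h
        gx = A @ y          # subgradient in x (descent direction)
        gy = A.T @ x        # subgradient in y (ascent direction)
        # second-order dynamics with time-varying 3/t damping
        vx += h * (-(3.0 / t) * vx - gx)
        vy += h * (-(3.0 / t) * vy + gy)
        x += h * vx
        y += h * vy
    return x, y

A = np.array([[2.0, 1.0], [1.0, 3.0]])
x, y = accelerated_saddle_point(A)
```

The time-varying derivative feedback (the 3/t term) is what distinguishes this from plain saddle-point flow, which would merely oscillate on a bilinear objective.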

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) solves the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history, lying at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability; it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents (N ≫ 2), which I call many-agent reinforcement learning (MARL; I use "MARL" to denote multi-agent reinforcement learning with a particular focus on the case of many agents, and "Multi-Agent RL" otherwise). In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills the research gap that most existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy-evaluation algorithm, α^α-Rank, for many-agent systems. The critical advantage of α^α-Rank is that it can compute the solution concept of α-Rank tractably in multi-player general-sum games with no need to store the entire pay-off matrix. This is in contrast to classic solution concepts such as Nash equilibrium, which is known to be PPAD-hard to compute even in two-player cases. α^α-Rank allows us, for the first time, to conduct large-scale multi-agent evaluations in practice. Thirdly, I introduce a scalable policy-learning algorithm, mean-field MARL, for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling behavioural diversity in meta-games and on developing algorithms that are guaranteed to enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impacts in the real physical world, beyond purely video games.
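The core idea behind the mean-field approximation mentioned above can be sketched in a few lines: instead of conditioning an agent's Q-function on the joint action of all N-1 other agents (exponential in N), condition it on the mean action of the agent's neighbours (constant size). The Q-table shape and Boltzmann policy below are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def mean_action(neighbour_actions, n_actions):
    """One-hot average of neighbours' discrete actions."""
    onehot = np.eye(n_actions)[neighbour_actions]
    return onehot.mean(axis=0)

def boltzmann_policy(q_row, temperature=1.0):
    """Softmax over Q-values, numerically stabilized."""
    logits = q_row / temperature
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

n_actions = 3
rng = np.random.default_rng(0)
# Illustrative Q: own action scored linearly against the mean action
Q = rng.normal(size=(n_actions, n_actions))
neighbours = np.array([0, 2, 2, 1])       # neighbours' last actions
mu = mean_action(neighbours, n_actions)   # -> [0.25, 0.25, 0.5]
q_values = Q @ mu                         # Q(s, a, mu) for each own action a
pi = boltzmann_policy(q_values)
```

Whatever the number of neighbours, `mu` stays a length-`n_actions` distribution, which is what breaks the exponential growth of the joint-action input.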

    Searching for joint gains in automated negotiations based on multi-criteria decision making theory

    It is well established by conflict theorists and others that successful negotiation should incorporate "creating value" as well as "claiming value." Joint improvements that bring benefits to all parties can be realised by (i) identifying attributes that are not in direct conflict between the parties, (ii) making trade-offs on attributes that are valued differently by different parties, and (iii) searching for values within attributes that could bring more gains to one party while not incurring too much loss on the other party. In this paper we propose an approach for maximising joint gains in automated negotiations by formulating the negotiation problem as a multi-criteria decision making problem and taking advantage of several optimisation techniques introduced by operations researchers and conflict theorists. We use a mediator to protect the negotiating parties from unnecessary disclosure of information to their opponent, while also allowing an objective calculation of maximum joint gains. We separate attributes that take a finite set of values (simple attributes) from those with continuous values, and we show that for simple attributes, the mediator can determine the Pareto-optimal values. In addition, we show that if none of the simple attributes strongly dominates the other simple attributes, then truth telling is an equilibrium strategy for negotiators during the optimisation of simple attributes. We also describe an approach for improving joint gains on non-simple attributes, by moving the parties, in a series of steps, towards the Pareto-optimal frontier.
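For simple attributes, a mediator's task reduces to enumerating the finite value combinations and discarding those that are Pareto-dominated under both parties' utilities. The following is a minimal sketch of that filtering step; the attributes and utility numbers are invented for illustration, and the paper's actual mechanism and equilibrium analysis are richer.

```python
from itertools import product

def pareto_optimal(options, u_a, u_b):
    """Keep options not strictly dominated in both parties' utilities."""
    scored = [(opt, u_a(opt), u_b(opt)) for opt in options]
    frontier = []
    for opt, a, b in scored:
        dominated = any(
            (a2 >= a and b2 >= b) and (a2 > a or b2 > b)
            for _, a2, b2 in scored
        )
        if not dominated:
            frontier.append(opt)
    return frontier

# Two hypothetical simple attributes: delivery speed and payment schedule
options = list(product(["fast", "slow"], ["upfront", "installments"]))
u_buyer = lambda o: {"fast": 2, "slow": 0}[o[0]] + {"upfront": 0, "installments": 1}[o[1]]
u_seller = lambda o: {"fast": 0, "slow": 1}[o[0]] + {"upfront": 2, "installments": 0}[o[1]]
frontier = pareto_optimal(options, u_buyer, u_seller)
```

Here ("slow", "installments") is dominated by ("fast", "upfront") for both parties and is filtered out, while the remaining three options trade off the buyer's and seller's interests and stay on the frontier.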

    Spatial competition of learning agents in agricultural procurement markets

    Spatially dispersed farmers supply raw milk as the primary input to a small number of large dairy-processing firms. The spatial competition of processing firms has short- to long-term repercussions on farm and processor structure, as it determines the regional demand for raw milk and the resulting raw milk price. A number of recent analytical and empirical contributions in the literature analyse the spatial price competition of processing firms in milk markets. Agent-based models (ABMs) by now serve as computational laboratories in many social science and interdisciplinary fields, and have recently also been introduced as bottom-up approaches to help understand market outcomes emerging from autonomously deciding and interacting agents. Despite ABMs' strengths, the inclusion of interactive learning by intelligent agents has not sufficiently matured. Although multi-agent systems (MAS) and multi-agent economic simulation are related fields of research, they have progressed along separate paths. This thesis takes us through some basic steps involved in developing a theoretical basis for designing multi-agent learning in spatial economic ABMs. Each of the three main chapters of the thesis investigates a core issue for designing interactive learning systems, with the overarching aim of better understanding the emergence of pricing behaviour in real, spatial agricultural markets. An important problem in the competitive spatial economics literature is the lack of a rigorous theoretical explanation for observed collusive behaviour in oligopsonistic markets. The first main chapter theoretically derives how the incorporation of foresight in agents' pricing policy in spatial markets might move the system towards cooperative Nash equilibria. It is shown that a basic level of foresight invites competing firms to cease limitless price wars. Introducing the concept of an outside option into the agents' decisions within a dynamic pricing game reveals how decreasing returns for increasing strategic thinking correlate with the relevance of transportation costs. In the second main chapter, we introduce a new learning algorithm for rational agents using H-PHC (hierarchical policy hill climbing) in spatial markets. While MAS algorithms are typically applicable only to small problems, we show experimentally how a community of multiple rational agents is able to overcome the coordination problem in a variety of spatial (and non-spatial) market games with rich decision spaces at modest computational effort. The theoretical explanation of emerging price equilibria in spatial markets is much disputed in the literature. The majority of papers attribute the pricing behaviour of processing firms (mill price and freight absorption) merely to the spatial structure of markets. Based on a computational approach with interactive learning agents in two-dimensional space, the third main chapter suggests that associating the extent of freight absorption with the factor space alone can be ambiguous. In addition, the pricing behaviour of agricultural processors, namely the ability to coordinate and achieve mutually beneficial outcomes, also depends on their ability to learn from each other.
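The basic policy hill-climbing (PHC) update that H-PHC builds on can be sketched compactly: each agent keeps a Q-table and a mixed policy, and after every Q-update shifts a small amount of probability mass toward the currently greedy action. The single-state game, reward values, and step parameters below are illustrative assumptions; the hierarchical decomposition over rich spatial pricing spaces is not shown.

```python
import numpy as np

class PHCAgent:
    """Single-state policy hill climbing (illustrative sketch)."""
    def __init__(self, n_actions, alpha=0.1, delta=0.05):
        self.q = np.zeros(n_actions)
        self.pi = np.full(n_actions, 1.0 / n_actions)
        self.alpha, self.delta = alpha, delta

    def update(self, action, reward):
        # Stateless Q-learning update
        self.q[action] += self.alpha * (reward - self.q[action])
        greedy = int(np.argmax(self.q))
        # Move delta probability mass toward the greedy action
        n = len(self.pi)
        for a in range(n):
            if a == greedy:
                self.pi[a] = min(1.0, self.pi[a] + self.delta)
            else:
                self.pi[a] = max(0.0, self.pi[a] - self.delta / (n - 1))
        self.pi /= self.pi.sum()

agent = PHCAgent(n_actions=3)
rewards = [0.0, 1.0, 0.2]      # hypothetical payoffs; action 1 is best
for _ in range(100):
    for a in range(3):          # exhaustive sweep keeps the sketch deterministic
        agent.update(a, rewards[a])
```

After training, the policy concentrates on the highest-payoff action; against other learners, the hill-climbing step is what lets mutually beneficial price patterns stabilize rather than collapse into limitless undercutting.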