Approximate Equilibrium Computation for Discrete-Time Linear-Quadratic Mean-Field Games
While the topic of mean-field games (MFGs) has a relatively long history,
heretofore there has been limited work concerning algorithms for the
computation of equilibrium control policies. In this paper, we develop a
computable policy iteration algorithm for approximating the mean-field
equilibrium in linear-quadratic MFGs with discounted cost. Given the
mean-field, each agent faces a linear-quadratic tracking problem, the solution
of which involves a dynamical system evolving in retrograde time. This makes
the development of forward-in-time algorithm updates challenging. By
identifying a structural property of the mean-field update operator, namely
that it preserves sequences of a particular form, we develop a forward-in-time
equilibrium computation algorithm. Bounds that quantify the accuracy of the
computed mean-field equilibrium as a function of the algorithm's stopping
condition are provided. The optimality of the computed equilibrium is validated
numerically. In contrast to recent and concurrent results, our algorithm
appears to be the first to address infinite-horizon MFGs with non-stationary
mean-field equilibria, albeit with a focus on the linear-quadratic setting.
Comment: This paper has been accepted in ACC 202
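As a rough illustration of the fixed-point structure behind such algorithms,
here is a minimal Python sketch of the classical iteration the paper refines:
given a candidate mean-field trajectory, solve the induced LQ tracking problem
(whose affine value term evolves in retrograde time), propagate the mean field
forward under the resulting policy, and repeat until the update falls below a
stopping tolerance. The scalar setting and all constants (a, b, q, r, gamma,
the truncation horizon T) are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np

# Illustrative scalar LQ-MFG (all symbols and values are hypothetical).
# Agent dynamics: x_{t+1} = a*x_t + b*u_t + w_t, with E[w_t] = 0.
# Discounted stage cost: q*(x_t - zbar_t)^2 + r*u_t^2, zbar = mean field.
a, b, q, r, gamma = 0.9, 0.5, 1.0, 1.0, 0.95
T = 200        # truncation horizon for the backward (retrograde) recursion
xbar0 = 1.0    # initial population mean

# Stationary discounted Riccati equation for the quadratic value term p
p = q
for _ in range(10_000):
    p_next = q + gamma * a**2 * p - (gamma * a * b * p)**2 / (r + gamma * b**2 * p)
    if abs(p_next - p) < 1e-12:
        break
    p = p_next
K = gamma * a * b * p / (r + gamma * b**2 * p)   # stationary feedback gain

def best_response_then_mean_field(zbar):
    """Backward pass for the affine value term s_t (the retrograde-time
    computation), then forward propagation of the mean field under the
    optimal tracking policy u*_t = -K*x_t - kappa_t."""
    s = np.zeros(T + 1)                 # terminal condition s_T = 0
    for t in range(T - 1, -1, -1):
        s[t] = -q * zbar[t] + gamma * (a - b * K) * s[t + 1]
    xbar = np.zeros(T + 1)
    xbar[0] = xbar0
    for t in range(T):
        kappa = gamma * b * s[t + 1] / (r + gamma * b**2 * p)
        xbar[t + 1] = (a - b * K) * xbar[t] - b * kappa
    return xbar

# Fixed-point iteration on the mean field with an explicit stopping condition
zbar = np.full(T + 1, xbar0)
for it in range(500):
    zbar_new = best_response_then_mean_field(zbar)
    if np.max(np.abs(zbar_new - zbar)) < 1e-8:
        break
    zbar = zbar_new
print(f"approximate MFE after {it + 1} iterations; zbar[:5] = {zbar[:5]}")
```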
Online Planning for Decentralized Stochastic Control with Partial History Sharing
In decentralized stochastic control, standard approaches for sequential
decision-making, e.g., dynamic programming, quickly become intractable due to
the need to maintain a complex information state. Computational challenges are
further compounded if agents do not possess complete model knowledge. In this
paper, we take advantage of the fact that in many problems agents share some
common information, or history, termed partial history sharing. Under this
information structure, the policy search space is greatly reduced. We propose a
provably convergent, online tree-search based algorithm that does not require a
closed-form model or explicit communication among agents. Interestingly, our
algorithm can be viewed as a generalization of several existing heuristic
solvers for decentralized partially observable Markov decision processes. To
demonstrate the applicability of the model, we propose a novel collaborative
intrusion response model, where multiple agents (defenders) possessing
asymmetric information aim to collaboratively defend a computer network.
Numerical results demonstrate the performance of our algorithm.
Comment: Accepted to American Control Conference (ACC) 201
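For a sense of the mechanics, the following is a bare-bones UCT-style tree
search over a generative simulator, sketched in Python. The simulator interface
(reset_to, and step returning an observation and a reward) and all constants
are hypothetical, and the sketch deliberately omits the common-information
machinery and the convergence analysis that distinguish the paper's algorithm.

```python
import math, random
from collections import defaultdict

# Bare-bones UCT skeleton over a generative simulator (illustrative only).
class UCT:
    def __init__(self, simulator, actions, gamma=0.95, c=1.4, depth=20):
        self.sim, self.actions = simulator, actions
        self.gamma, self.c, self.depth = gamma, c, depth
        self.N = defaultdict(int)      # visit counts per (history, action)
        self.Nh = defaultdict(int)     # visit counts per history
        self.Q = defaultdict(float)    # running action-value estimates

    def search(self, state, n_sims=1000):
        for _ in range(n_sims):
            self.sim.reset_to(state)   # hypothetical simulator interface
            self._simulate(history=(), d=0)
        return max(self.actions, key=lambda a: self.Q[((), a)])

    def _simulate(self, history, d):
        if d >= self.depth:
            return 0.0
        if self.Nh[history] == 0:      # unexpanded node: estimate by rollout
            self.Nh[history] += 1
            return self._rollout(d)
        # UCB1 action selection at the tree node
        def ucb(a):
            n = self.N[(history, a)]
            return float('inf') if n == 0 else \
                self.Q[(history, a)] + self.c * math.sqrt(math.log(self.Nh[history]) / n)
        a = max(self.actions, key=ucb)
        obs, reward = self.sim.step(a)  # generative model: no explicit probabilities
        ret = reward + self.gamma * self._simulate(history + (a, obs), d + 1)
        self.Nh[history] += 1
        self.N[(history, a)] += 1
        self.Q[(history, a)] += (ret - self.Q[(history, a)]) / self.N[(history, a)]
        return ret

    def _rollout(self, d):
        ret, disc = 0.0, 1.0
        for _ in range(self.depth - d):
            _, reward = self.sim.step(random.choice(self.actions))
            ret += disc * reward
            disc *= self.gamma
        return ret
```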
Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic Mean-Field Games
In this paper, we study large population multi-agent reinforcement learning
(RL) in the context of discrete-time linear-quadratic mean-field games
(LQ-MFGs). Our setting differs from most existing work on RL for MFGs, in that
we consider a non-stationary MFG over an infinite horizon. We propose an
actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE)
of the LQ-MFG. There are two primary challenges: i) the non-stationarity of the
MFG induces a linear-quadratic tracking problem, which requires solving a
backwards-in-time (non-causal) equation that cannot be solved by standard
(causal) RL algorithms; ii) many RL algorithms assume that the states are
sampled from the stationary distribution of a Markov chain (MC), that is, the
chain is already mixed, an assumption that is not satisfied for real data
sources. We first identify that the mean-field trajectory follows linear
dynamics, allowing the problem to be reformulated as a linear quadratic
Gaussian problem. Under this reformulation, we propose an actor-critic
algorithm that allows samples to be drawn from an unmixed MC. Finite-sample
convergence guarantees for the algorithm are then provided. To characterize the
performance of our algorithm in multi-agent RL, we develop an error bound with
respect to the Nash equilibrium of the finite-population game.
Comment: To appear in CDC 202
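A toy loop on a scalar LQR conveys the critic/actor split: a least-squares
critic is fit from one (unmixed) trajectory, and the actor is improved greedily
from the quadratic Q-estimate. All names and constants below are illustrative,
the regression is a plain Bellman-residual fit rather than the paper's
estimator, and none of the finite-sample guarantees carry over to this sketch.

```python
import numpy as np

# Illustrative actor-critic-flavored loop on a scalar LQR (hypothetical setup).
rng = np.random.default_rng(0)
a, b, q, r, gamma, sigma = 0.9, 0.5, 1.0, 1.0, 0.95, 0.1

def features(x, u):
    # Quadratic features: Q(x, u) = w . phi(x, u) is exact for LQR
    return np.array([x * x, x * u, u * u, 1.0])

k = 0.0                                  # actor: linear policy u = -k*x
for outer in range(50):
    # Critic: least-squares Bellman-residual fit on ONE trajectory,
    # with no assumption that the chain has already mixed.
    X, y = [], []
    x = rng.normal()
    for t in range(2000):
        u = -k * x + 0.2 * rng.normal()  # exploratory control
        cost = q * x * x + r * u * u
        x_next = a * x + b * u + sigma * rng.normal()
        u_next = -k * x_next             # on-policy next action
        # Bellman equation: w.phi(x,u) - gamma*w.phi(x',u') = cost
        X.append(features(x, u) - gamma * features(x_next, u_next))
        y.append(cost)
        x = x_next
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    # Actor: greedy improvement from the quadratic critic,
    # argmin_u w0*x^2 + w1*x*u + w2*u^2 + w3  =>  u = -(w1/(2*w2))*x
    if w[2] > 1e-8:
        k = w[1] / (2 * w[2])
print(f"learned gain k = {k:.4f}")
```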
Convergent Policy Optimization for Safe Reinforcement Learning
We study the safe reinforcement learning problem with nonlinear function
approximation, where policy optimization is formulated as a constrained
optimization problem with both the objective and the constraint being nonconvex
functions. For such a problem, we construct a sequence of surrogate convex
constrained optimization problems by replacing the nonconvex functions locally
with convex quadratic functions obtained from policy gradient estimators. We
prove that the solutions to these surrogate problems converge to a stationary
point of the original nonconvex problem. Furthermore, to complement our
theoretical results, we apply our algorithm to examples of optimal control and
multi-agent reinforcement learning with safety constraints.
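The surrogate construction is easy to sketch: at each iterate, the nonconvex
objective and constraint are replaced by strongly convex quadratic models
built from gradient estimates, and the resulting convex subproblem is solved.
In the hypothetical Python sketch below, the toy functions and
finite-difference gradients stand in for policy values and policy-gradient
estimators, and tau is an assumed proximal weight.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
dim, tau = 5, 1.0                  # tau makes the surrogates strongly convex

def f(x):  return np.sum(np.cos(x)) + 0.5 * x @ x   # nonconvex "objective"
def g(x):  return np.sum(np.sin(x)) - 1.0           # nonconvex "constraint" g(x) <= 0

def grad(h, x, eps=1e-6):
    # Finite-difference gradient, standing in for a policy-gradient estimator
    e = np.eye(len(x)) * eps
    return np.array([(h(x + e[i]) - h(x - e[i])) / (2 * eps) for i in range(len(x))])

x = rng.normal(size=dim)
for k in range(100):
    gf, gg = grad(f, x), grad(g, x)
    fx, gx = f(x), g(x)
    # Convex quadratic surrogates around the current iterate x_k
    f_sur = lambda z: fx + gf @ (z - x) + 0.5 * tau * np.sum((z - x) ** 2)
    g_sur = lambda z: gx + gg @ (z - x) + 0.5 * tau * np.sum((z - x) ** 2)
    res = minimize(f_sur, x, method='SLSQP',
                   constraints=[{'type': 'ineq', 'fun': lambda z: -g_sur(z)}])
    x = x + 0.5 * (res.x - x)      # damped update toward the surrogate solution
print(f"final objective {f(x):.4f}, constraint {g(x):.4f}")
```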
Optimization for Reinforcement Learning: From Single Agent to Cooperative Agents
This article reviews recent advances in multi-agent reinforcement learning
algorithms for large-scale control systems and communication networks, which
learn to communicate and cooperate. We provide an overview of this emerging
field, with an emphasis on the decentralized setting under different
coordination protocols. We highlight the evolution of reinforcement learning
algorithms from single-agent to multi-agent systems, from a distributed
optimization perspective, and conclude with future directions and challenges,
in the hope of catalyzing the growing synergy among the distributed
optimization, signal processing, and reinforcement learning communities.
Feature-Based Q-Learning for Two-Player Stochastic Games
Consider a two-player zero-sum stochastic game where the transition function
can be embedded in a given feature space. We propose a two-player Q-learning
algorithm for approximating the Nash equilibrium strategy via sampling. The
algorithm is shown to find an $\epsilon$-optimal strategy using a sample size
linear in the number of features. To further improve its sample efficiency, we
develop an accelerated algorithm by adopting techniques such as variance
reduction, monotonicity preservation and two-sided strategy approximation. We
prove that the algorithm is guaranteed to find an $\epsilon$-optimal strategy
using no more than $\tilde{\mathcal{O}}\big(K/(\epsilon^2(1-\gamma)^4)\big)$
samples with high probability, where $K$ is the number of features and $\gamma$
is a discount factor. The sample, time, and space complexities of the algorithm
are independent of the original dimensions of the game.
Comment: 23 pages
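To make the minimax structure concrete, here is a hypothetical Python sketch of
feature-based Q-iteration for a small zero-sum game: the linear Q-estimate at
each sampled next state induces a matrix game whose value, computed by a linear
program, supplies the Bellman target. The game, feature map, and damping
schedule are random stand-ins, and the paper's variance-reduction and
monotonicity-preservation techniques are omitted.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game max_x min_y x^T M y via an LP."""
    n_a = M.shape[0]
    c = np.zeros(n_a + 1); c[-1] = -1.0          # variables (x, v); minimize -v
    A_ub = np.hstack([-M.T, np.ones((M.shape[1], 1))])   # v <= (M^T x)_b for all b
    b_ub = np.zeros(M.shape[1])
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])  # sum(x) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n_a + [(None, None)])
    return res.x[-1]

# Hypothetical small game: linear Q(s,a,b) = phi(s,a,b) @ w with K features
nS, nA, nB, K, gamma = 6, 3, 3, 4, 0.9
rng = np.random.default_rng(2)
phi = rng.random((nS, nA, nB, K))                    # feature map (stand-in)
reward = rng.random((nS, nA, nB))
P = rng.dirichlet(np.ones(nS), size=(nS, nA, nB))    # transition kernel (simulator)

w = np.zeros(K)
for it in range(200):
    # One sampled Bellman backup per (s, a, b) anchor triple
    targets, feats = [], []
    for s in range(nS):
        for a_i in range(nA):
            for b_i in range(nB):
                s_next = rng.choice(nS, p=P[s, a_i, b_i])  # one sample of s'
                M = phi[s_next] @ w                        # Q-values at s' as a matrix game
                targets.append(reward[s, a_i, b_i] + gamma * matrix_game_value(M))
                feats.append(phi[s, a_i, b_i])
    w_new, *_ = np.linalg.lstsq(np.array(feats), np.array(targets), rcond=None)
    w = 0.9 * w + 0.1 * w_new          # averaged update to damp sampling noise
print(f"learned weights w = {np.round(w, 3)}")
```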
Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
Recent years have witnessed significant advances in reinforcement learning
(RL), which has registered great success in solving various sequential
decision-making problems in machine learning. Most of the successful RL
applications, e.g., the games of Go and Poker, robotics, and autonomous
driving, involve the participation of more than one agent, and thus naturally
fall into the realm of multi-agent RL (MARL), a domain with a relatively long
history that has recently re-emerged due to advances in single-agent RL
techniques. Though empirically successful, theoretical
foundations for MARL are relatively lacking in the literature. In this chapter,
we provide a selective overview of MARL, with focus on algorithms backed by
theoretical analysis. More specifically, we review the theoretical results of
MARL algorithms mainly within two representative frameworks, Markov/stochastic
games and extensive-form games, in accordance with the types of tasks they
address, i.e., fully cooperative, fully competitive, and a mix of the two. We
also introduce several significant but challenging applications of these
algorithms. Orthogonal to the existing reviews on MARL, we highlight several
new angles and taxonomies of MARL theory, including learning in extensive-form
games, decentralized MARL with networked agents, MARL in the mean-field regime,
(non-)convergence of policy-based methods for learning in games, etc. Some of
the new angles extrapolate from our own research endeavors and interests. Our
overall goal with this chapter is, beyond providing an assessment of the
current state of the field, to identify fruitful future research directions
on theoretical studies of MARL. We expect this chapter to serve as a
continuing stimulus for researchers interested in working on this exciting
yet challenging topic.
Comment: Invited Chapter in Handbook on RL and Control (Springer Studies in
Systems, Decision and Control)