Mean Field Multi-Agent Reinforcement Learning
Existing multi-agent reinforcement learning methods are typically limited to a small number of agents. When the number of agents grows large, learning becomes intractable due to the curse of dimensionality and the exponential growth of agent interactions. In this paper, we present Mean Field Reinforcement Learning, where the interactions within the population of agents are approximated by those between a single agent and the average effect of the overall population or neighboring agents; the interplay between the two entities is mutually reinforced: the learning of the individual agent's optimal policy depends on the dynamics of the population, while the dynamics of the population change according to the collective patterns of the individual policies. We develop practical mean field Q-learning and mean field Actor-Critic algorithms and analyze the convergence of the solution to Nash equilibrium. Experiments on Gaussian squeeze, the Ising model, and battle games justify the learning effectiveness of our mean field approaches. In addition, we report the first result of solving the Ising model via model-free reinforcement learning methods.
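The central approximation can be sketched concretely: each agent's Q-function conditions on its own action and the mean action of its neighbors rather than on the full joint action. Below is a minimal tabular illustration (a simplified greedy-target variant with a discretized mean action, not the authors' exact algorithm):

```python
import numpy as np

def mean_field_q_update(Q, state, action, mean_action, reward,
                        next_state, next_mean_action, alpha=0.1, gamma=0.95):
    """One tabular mean-field Q-learning step: the joint action of all
    neighbors is summarized by a single (discretized) mean action."""
    # Greedy bootstrap under the neighbors' next mean action
    target = reward + gamma * np.max(Q[next_state, :, next_mean_action])
    Q[state, action, mean_action] += alpha * (target - Q[state, action, mean_action])
    return Q

# Toy usage: 2 states, 3 actions, mean action discretized into 3 bins
Q = np.zeros((2, 3, 3))
Q = mean_field_q_update(Q, state=0, action=1, mean_action=2,
                        reward=1.0, next_state=1, next_mean_action=0)
```

The key saving is that the Q-table grows with the number of actions and mean-action bins, not exponentially with the number of agents.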
Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning
Many applications, e.g., in shared mobility, require coordinating a large
number of agents. Mean-field reinforcement learning addresses the resulting
scalability challenge by optimizing the policy of a representative agent. In
this paper, we address an important generalization where there exist global
constraints on the distribution of agents (e.g., requiring capacity constraints
or minimum coverage requirements to be met). We propose Safe--UCRL,
the first model-based algorithm that attains safe policies even in the case of
unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty
in the transition model within a log-barrier approach to ensure pessimistic
constraint satisfaction with high probability. We showcase
Safe--UCRL on the vehicle repositioning problem faced by many
shared mobility operators and evaluate its performance through simulations
built on Shenzhen taxi trajectory data. Our algorithm effectively meets the
demand in critical areas while ensuring service accessibility in regions with
low demand.
Comment: 25 pages, 14 figures, 3 tables
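The log-barrier idea can be illustrated with a toy objective: a constraint on the agent distribution (e.g., minimum coverage per region) becomes a logarithmic penalty that diverges as the constraint boundary is approached. A hypothetical sketch (the function and thresholds below are illustrative, not the paper's formulation):

```python
import numpy as np

def log_barrier_objective(reward, coverage, min_coverage, eta=0.1):
    """Reward plus a log-barrier enforcing coverage >= min_coverage.
    As any region's coverage approaches its bound, the barrier term
    tends to -infinity, steering the optimizer away from violations."""
    slack = np.asarray(coverage) - np.asarray(min_coverage)
    if np.any(slack <= 0):
        return -np.inf  # infeasible point: rejected outright
    return reward + eta * np.sum(np.log(slack))

# A feasible agent distribution scores finitely; an infeasible one does not
ok = log_barrier_objective(reward=1.0, coverage=[0.3, 0.4], min_coverage=[0.1, 0.2])
bad = log_barrier_objective(reward=1.0, coverage=[0.05, 0.4], min_coverage=[0.1, 0.2])
```

Pessimism then comes from evaluating such a barrier under the worst-case transition model in the epistemic uncertainty set, so a pessimistically feasible policy remains feasible with high probability.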
Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning
A fundamental question in any peer-to-peer ridesharing system is how to
dispatch users' ride requests to the right drivers in real time, both
effectively and efficiently. Traditional rule-based solutions usually work on a simplified
in real time. Traditional rule-based solutions usually work on a simplified
problem setting, which requires a sophisticated hand-crafted weight design for
either centralized authority control or decentralized multi-agent scheduling
systems. Although recent approaches have used reinforcement learning to provide
centralized combinatorial optimization algorithms with informative weight
values, their single-agent setting can hardly model the complex interactions
between drivers and orders. In this paper, we address the order dispatching
problem using multi-agent reinforcement learning (MARL), which follows the
distributed nature of the peer-to-peer ridesharing problem and possesses the
ability to capture the stochastic demand-supply dynamics in large-scale
ridesharing scenarios. Being more reliable than centralized approaches, our
proposed MARL solutions could also support fully distributed execution through
recent advances in the Internet of Vehicles (IoV) and the Vehicle-to-Network
(V2N). Furthermore, we adopt the mean field approximation to simplify the local
interactions by taking an average action among neighborhoods. The mean field
approximation is capable of globally capturing dynamic demand-supply variations
by propagating many local interactions between agents and the environment. Our
extensive experiments show significant improvements of the MARL order
dispatching algorithms over several strong baselines on gross merchandise
volume (GMV) and order response rate. Moreover, simulated experiments with
real data also confirm that our solution can alleviate the supply-demand gap
during rush hours, thus helping to reduce traffic congestion.
Comment: 11 pages, 9 figures
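The neighborhood averaging step is simple to state in code: each driver's mean-field input is the average of nearby drivers' (one-hot) actions. A small illustrative sketch (the positions and radius are hypothetical, not from the paper):

```python
import numpy as np

def neighborhood_mean_actions(positions, actions, radius=1.0):
    """For each agent, average the one-hot actions of other agents
    within `radius`; this mean action is the local mean-field signal."""
    positions = np.asarray(positions, dtype=float)
    actions = np.asarray(actions, dtype=float)   # shape (n_agents, n_actions)
    means = np.zeros_like(actions)
    for i, p in enumerate(positions):
        mask = np.linalg.norm(positions - p, axis=1) <= radius
        mask[i] = False  # exclude the agent itself
        if mask.any():
            means[i] = actions[mask].mean(axis=0)
    return means

# Three drivers: the first two are neighbors, the third is far away
m = neighborhood_mean_actions([[0, 0], [0.5, 0], [10, 10]],
                              [[1, 0], [0, 1], [1, 0]])
```

Propagating these local averages through repeated interaction is what lets the mean field reflect global demand-supply variations.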
Multi Type Mean Field Reinforcement Learning
Mean field theory provides an effective way of scaling multiagent
reinforcement learning algorithms to environments with many agents that can be
abstracted by a virtual mean agent. In this paper, we extend mean field
multiagent algorithms to multiple types. The types enable the relaxation of a
core assumption in mean field games, namely that all agents in the
environment play nearly identical strategies and share the same goal. We
conduct experiments on three different testbeds for the field of many-agent
reinforcement learning, based on the standard MAgent framework. We consider
two different kinds of mean field games: a) Games where agents belong to
predefined types that are known a priori and b) Games where the type of each
agent is unknown and therefore must be learned based on observations. We
introduce new algorithms for each kind of game and demonstrate their superior
performance over state-of-the-art algorithms that assume all agents belong to
the same type, as well as over other baseline algorithms in the MAgent framework.
Comment: Paper to appear in the Proceedings of the International Conference on
Autonomous Agents and Multi-Agent Systems (AAMAS) 2020. Revised version has
some typos corrected.
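The type extension replaces the single virtual mean agent with one mean agent per type. A minimal sketch of the per-type averaging for the known-types setting (names are illustrative):

```python
import numpy as np

def per_type_mean_actions(actions, types, n_types):
    """Summarize a heterogeneous population by one mean action per type,
    instead of a single global mean action."""
    actions = np.asarray(actions, dtype=float)
    types = np.asarray(types)
    means = np.zeros((n_types, actions.shape[1]))
    for t in range(n_types):
        members = actions[types == t]
        if len(members):
            means[t] = members.mean(axis=0)
    return means

# Two type-0 agents agree on their action; the two type-1 agents split
m = per_type_mean_actions([[1, 0], [1, 0], [0, 1], [1, 0]],
                          types=[0, 0, 1, 1], n_types=2)
```

In the unknown-types setting, `types` would itself be inferred from observations before this averaging step.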
Many-agent Reinforcement Learning
Multi-agent reinforcement learning (RL) solves the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history, lying at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability: it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents, which I term \emph{many-agent reinforcement learning} (MARL\footnote{I use the word ``MARL" to denote multi-agent reinforcement learning with a particular focus on the cases of many agents; otherwise, it is denoted as ``Multi-Agent RL" by default.}) problems. In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills the research gap that most existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy evaluation algorithm -- α-Rank -- for many-agent systems. The critical advantage of α-Rank is that it can compute the α-Rank solution concept tractably in multi-player general-sum games with no need to store the entire pay-off matrix. This is in contrast to classic solution concepts such as Nash equilibrium, which is known to be PPAD-hard even in two-player cases.
α-Rank allows us, for the first time, to practically conduct large-scale multi-agent evaluations. Thirdly, I introduce a scalable policy learning algorithm -- mean-field MARL -- for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling the behavioural diversity in meta-games and on developing algorithms that provably enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impacts in the real physical world, beyond purely video games.
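The determinantal-point-process view of diversity admits a compact numerical illustration: score a policy population by the determinant of a similarity kernel over policy features, which is near zero for near-duplicate policies and approaches one as they separate. A toy sketch (the RBF kernel and feature vectors are illustrative assumptions, not the thesis's exact construction):

```python
import numpy as np

def dpp_diversity(features, bandwidth=1.0):
    """Diversity of a policy set as the determinant of an RBF kernel
    matrix: duplicated policies make rows nearly collinear, so det -> 0."""
    F = np.asarray(features, dtype=float)
    sq_dists = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * bandwidth ** 2))
    return np.linalg.det(K)

similar = dpp_diversity([[0.0, 0.0], [0.01, 0.0]])  # near-duplicate policies
diverse = dpp_diversity([[0.0, 0.0], [5.0, 5.0]])   # well-separated policies
```

Maximizing such a determinant during training is one way to make the enlargement of diversity an explicit optimization objective.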
Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path
We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike
traditional approaches, we alleviate the need for a mean-field oracle by
developing an algorithm that approximates the Mean-Field Equilibrium (MFE)
using the single sample path of the generic agent. We call this {\it Sandbox
Learning}, as it can be used as a warm-start for any agent learning in a
multi-agent non-cooperative setting. We adopt a two time-scale approach in
which an online fixed-point recursion for the mean-field operates on a slower
time-scale, in tandem with a control policy update on a faster time-scale for
the generic agent. Given that the underlying Markov Decision Process (MDP) of
the agent is communicating, we provide finite sample convergence guarantees in
terms of convergence of the mean-field and control policy to the mean-field
equilibrium. The sample complexity of the Sandbox learning algorithm is
Õ(ε^{-4}), where ε is the MFE approximation
error. This is similar to works that assume access to an oracle. Finally, we
empirically demonstrate the effectiveness of the sandbox learning algorithm in
diverse scenarios, including those where the MDP does not necessarily have a
single communicating class.
Comment: Accepted for publication in AISTATS 202
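The two time-scale structure can be demonstrated with a toy pair of coupled recursions: the control variable uses a larger step size than the mean-field estimate, so from the control's perspective the mean field is quasi-static. A self-contained sketch (the step-size exponents and the fixed point 0.5, which stands in for the MFE, are illustrative assumptions):

```python
def two_timescale(n_steps=2000):
    """Toy coupled recursions: x (control) tracks mu on a fast
    time-scale while mu (mean-field estimate) drifts toward its
    fixed point 0.5 on a slow time-scale (lr_slow / lr_fast -> 0)."""
    mu, x = 0.0, 0.0
    for k in range(1, n_steps + 1):
        lr_fast = 1.0 / k ** 0.6   # faster: control policy update
        lr_slow = 1.0 / k ** 0.9   # slower: mean-field fixed-point recursion
        x += lr_fast * (mu - x)
        mu += lr_slow * (0.5 - mu)
    return mu, x

mu, x = two_timescale()  # both approach the fixed point
```

Because the slow recursion uses the single observed sample path rather than a distribution returned by an oracle, this is the sense in which the method is oracle-free.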
Regularization of the policy updates for stabilizing Mean Field Games
This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL),
where multiple agents interact in the same environment and each aims to
maximize its individual return. Challenges arise when scaling up the number of
agents due to the non-stationarity that the many agents introduce. To
address this issue, Mean Field Games (MFG) rely on symmetry and
homogeneity assumptions to approximate games with very large populations.
Recently, deep Reinforcement Learning has been used to scale MFG to games with
larger numbers of states. Current methods rely on smoothing techniques such as
averaging the Q-values or the updates on the mean-field distribution. This work
presents a different approach to stabilizing the learning, based on proximal
updates on the mean-field policy. We name our algorithm Mean Field Proximal
Policy Optimization (MF-PPO), and we empirically show the effectiveness of our
method in the OpenSpiel framework.
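The proximal update can be illustrated with PPO's clipped surrogate, applied here in spirit to the mean-field policy: probability ratios outside a trust interval earn no additional credit, keeping consecutive policies close. A generic sketch (standard PPO clipping, not necessarily MF-PPO's exact loss):

```python
import numpy as np

def clipped_surrogate(ratios, advantages, eps=0.2):
    """PPO clipped objective: the ratio new_pi/old_pi is clipped to
    [1 - eps, 1 + eps] and the pessimistic minimum is taken, so large
    policy jumps are not rewarded."""
    r = np.asarray(ratios, dtype=float)
    a = np.asarray(advantages, dtype=float)
    return np.minimum(r * a, np.clip(r, 1 - eps, 1 + eps) * a).mean()

# A ratio of 2.0 with advantage 1.0 is credited as if it were only 1.2
val = clipped_surrogate([2.0], [1.0])
```

The clipping plays the stabilizing role that q-value or distribution averaging plays in earlier smoothing-based methods.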
Multi-Agent Reinforcement Learning in Large Complex Environments
Multi-agent reinforcement learning (MARL) has seen much success in the past decade. However, these methods have yet to find wide application in large-scale real-world problems for two important reasons. First, MARL algorithms have poor sample efficiency: many data samples must be obtained through interactions with the environment to learn meaningful policies, even in small environments. Second, MARL algorithms do not scale to environments with many agents, since their complexity is typically exponential in the number of agents. This dissertation aims to address both of these challenges, with the goal of making MARL applicable to a variety of real-world environments.
Towards improving sample efficiency, an important observation is that many real-world environments already deploy sub-optimal or heuristic approaches for generating policies in practice. A useful question that arises is how best to use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this dissertation, we provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings. To this end, we propose a general model for learning from external advisors in MARL and show that desirable theoretical properties, such as convergence to a unique solution concept and reasonable finite-sample complexity bounds, hold under a set of common assumptions. Furthermore, extensive experiments illustrate that these algorithms can be used in a variety of environments, perform favourably compared to other related baselines, scale to large state-action spaces, and are robust to poor advice from advisors.
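One simple way to incorporate an advisor, in the spirit described above, is to follow its recommendation with a probability that decays over training, so the learned Q-values eventually dominate even under poor advice. A hypothetical sketch, not the dissertation's exact framework:

```python
import random

def select_action(q_values, advisor_action, trust):
    """With probability `trust`, follow the advisor's recommendation;
    otherwise act greedily on the agent's own Q-values. Decaying `trust`
    toward 0 lets the agent eventually rely on its own estimates."""
    if random.random() < trust:
        return advisor_action
    return max(range(len(q_values)), key=lambda a: q_values[a])

random.seed(0)
# Fully decayed trust: the greedy action (index 2) is always chosen
a = select_action([0.1, 0.5, 0.9], advisor_action=0, trust=0.0)
```

Robustness to poor advice then amounts to showing that such a scheme still converges to the same solution concept as unadvised learning.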
Towards scaling MARL, we explore the use of mean field theory. Mean field theory provides an effective way of scaling multi-agent reinforcement learning algorithms to environments with many agents, where the other agents can be abstracted by a virtual mean agent. Prior work has used mean field theory in MARL; however, it suffers from several stringent assumptions, such as requiring fully homogeneous agents, full observability of the environment, and centralized learning settings, which prevent its wide application in practical environments. In this dissertation, we extend mean field methods to environments with heterogeneous agents and to partially observable settings. Further, we extend mean field methods to include decentralized approaches. We provide novel mean field based MARL algorithms that outperform previous methods on a set of large games with many agents. Theoretically, we provide bounds on the information loss incurred by using the mean field, and we further provide fixed-point guarantees for Q-learning-based algorithms in each of these environments.
Subsequently, we combine our work in mean field learning and learning from advisors to show that we can achieve powerful MARL algorithms that are more suitable for real-world environments than prior approaches. This method uses the recently introduced attention mechanism to perform per-agent modelling of others in the locality, in addition to using the mean field for global responses. Notably, in this dissertation, we show applications in several real-world multi-agent environments such as the Ising model, the ride-pool matching problem, and the massively multi-player online (MMO) game setting (which is currently a multi-billion dollar market).