36 research outputs found
A General Framework for Learning Mean-Field Games
This paper presents a general mean-field game (GMFG) framework for
simultaneous learning and decision-making in stochastic games with a large
population. It first establishes the existence of a unique Nash Equilibrium to
this GMFG, and demonstrates that naively combining reinforcement learning with
the fixed-point approach in classical MFGs yields unstable algorithms. It then
proposes value-based and policy-based reinforcement learning algorithms (GMF-V
and GMF-P, respectively) with smoothed policies, with analysis of their
convergence properties and computational complexities. Experiments on an
equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO,
two specific instantiations of GMF-V and GMF-P, respectively, with Q-learning
and TRPO, are both efficient and robust in the GMFG setting. Moreover, their
performance is superior in convergence speed, accuracy, and stability when
compared with existing algorithms for multi-agent reinforcement learning in the
N-player setting.

Comment: 43 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1901.0958
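The instability of naive fixed-point iteration, and the stabilizing role of smoothing, can be illustrated with a toy sketch. This is not GMF-V or GMF-P: the two-resource congestion game, the softmax temperature, and the damping factor below are all invented for illustration. The paper smooths the policies themselves; here, for brevity, a softmax best response is combined with a damped mean-field update.

```python
import numpy as np

# Toy mean-field fixed point: a population splits over two resources,
# and each resource's reward falls with its congestion m[a].
def reward(m):
    return np.array([1.0, 1.0]) - 2.0 * m

def smoothed_best_response(m, temp=0.1):
    # Softmax ("smoothed") best response at temperature `temp`.
    r = reward(m) / temp
    z = np.exp(r - r.max())
    return z / z.sum()

m = np.array([1.0, 0.0])  # start with the whole population on resource 0
for _ in range(200):
    # Damped update; the undamped map m <- BR(m) oscillates and diverges
    # here, mirroring the instability of the naive fixed-point approach.
    m = 0.9 * m + 0.1 * smoothed_best_response(m)
```

At the symmetric point m = [0.5, 0.5] both rewards are equal, so the smoothed best response is uniform and the damped iteration converges there; without damping, the linearized map at that point has slope of magnitude greater than one and the iteration does not settle.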
Sample Efficient Reinforcement Learning with REINFORCE
Policy gradient methods are among the most effective methods for large-scale
reinforcement learning, and their empirical success has prompted several works
that develop the foundation of their global convergence theory. However, prior
works have either required exact gradients or state-action visitation measure
based mini-batch stochastic gradients with a diverging batch size, which limit
their applicability in practical scenarios. In this paper, we consider
classical policy gradient methods that compute an approximate gradient with a
single trajectory or a fixed size mini-batch of trajectories under soft-max
parametrization and log-barrier regularization, along with the widely-used
REINFORCE gradient estimation procedure. By controlling the number of "bad"
episodes and resorting to the classical doubling trick, we establish an anytime
sub-linear high probability regret bound as well as almost sure global
convergence of the average regret with an asymptotically sub-linear rate. These
provide the first set of global convergence and sample efficiency results for
the well-known REINFORCE algorithm and contribute to a better understanding of
its performance in practice.

Comment: Accepted to AAAI 2021. Fixed typos in constants and enriched the literature review.
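The single-trajectory REINFORCE estimator the abstract refers to can be sketched on a toy problem. This is only a minimal illustration, not the paper's analysis: the two-armed bandit, reward values, learning rates, and the running-average baseline are all invented here, and the paper's log-barrier regularization is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
true_reward = np.array([0.2, 0.8])  # arm 1 is better
theta = np.zeros(2)                 # soft-max logits, one per action

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

b = 0.0  # running-average baseline (variance reduction)
for step in range(5000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)          # sample one action (one "trajectory")
    r = true_reward[a]
    grad_log = -pi                   # gradient of log pi(a) under soft-max:
    grad_log[a] += 1.0               # e_a - pi
    theta += 0.1 * (r - b) * grad_log  # single-sample REINFORCE update
    b += 0.05 * (r - b)
```

With the soft-max parametrization, the score function has the closed form e_a - pi, so each update needs only the sampled action and its reward, in the spirit of the fixed-batch-size setting the paper studies.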
MFGLib: A Library for Mean-Field Games
Mean-field games (MFGs) are limiting models that approximate N-player games,
with a number of applications. Despite the ever-growing numerical literature on
computation of MFGs, there is no library that allows researchers and
practitioners to easily create and solve their own MFG problems. The purpose of
this document is to introduce MFGLib, an open-source Python library for solving
general MFGs with a user-friendly and customizable interface. It serves as a
handy tool for creating and analyzing generic MFG environments, along with
embedded auto-tuners for all implemented algorithms. The package is distributed
under the MIT license and the source code and documentation can be found at
https://github.com/radar-research-lab/MFGLib/
The Contagion Effect of Compensation Regulation: Evidence From China
To shed light on whether and how firms changed compensation practices in response to a shift in their operating environment, we examine whether executive compensation regulation of state-owned enterprises (SOEs) in the emerging market of China had a contagion effect. Specifically, we investigate whether firms not directly affected by the changing regulatory environment nonetheless changed executive compensation in response to the actions of the directly affected firms, a phenomenon known as the contagion effect. We further examine the specific contagion mechanisms and the economic consequences of the regulation on compensation. We find that the regulation has a significant effect on the compensation gap in central SOEs and a contagion effect on local SOEs, but not on non-SOEs. Within SOEs, there is an intra-industry contagion effect of compensation regulation but no intra-region effect. Further, central SOEs and local SOEs, but not non-SOEs, experience reduced firm performance after the compensation regulations, indicating that the regulation does not have favorable economic consequences for either the directly affected central SOEs or the indirectly affected local SOEs.
Cu^{2+}-Chelating Mesoporous Silica Nanoparticles for Synergistic Chemotherapy/Chemodynamic Therapy
In this study, a pH-responsive controlled-release mesoporous silica nanoparticle (MSN) formulation was developed. The MSNs were functionalized with a histidine (His)-tagged targeting peptide (B3int) through an amide bond, and loaded with an anticancer drug (cisplatin, CP) and a lysosomal destabilization mediator (chloroquine, CQ). Cu^{2+} was then used to seal the pores of the MSNs via chelation with the His-tag. The resultant nanoparticles showed pH-responsive drug release and effectively targeted tumor cells via B3int. The presence of CP and Cu^{2+} permits reactive oxygen species to be generated inside cells; thus, the chemotherapeutic effect of CP is augmented by chemodynamic therapy. In vitro and in vivo experiments showed that the nanoparticles effectively kill tumor cells. An in vivo cancer model revealed that the nanoparticles increase apoptosis in tumor cells and thereby diminish the tumor volume. No off-target toxicity was noted. The functionalized MSNs developed in this work thus have great potential for targeted, synergistic anticancer therapies.