A General Framework for Learning Mean-Field Games
This paper presents a general mean-field game (GMFG) framework for
simultaneous learning and decision-making in stochastic games with a large
population. It first establishes the existence of a unique Nash equilibrium for
this GMFG, and demonstrates that naively combining reinforcement learning with
the fixed-point approach in classical MFGs yields unstable algorithms. It then
proposes value-based and policy-based reinforcement learning algorithms (GMF-V
and GMF-P, respectively) with smoothed policies, with analysis of their
convergence properties and computational complexities. Experiments on an
equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO,
two specific instantiations of GMF-V and GMF-P, respectively, with Q-learning
and TRPO, are both efficient and robust in the GMFG setting. Moreover, their
performance is superior in convergence speed, accuracy, and stability when
compared with existing algorithms for multi-agent reinforcement learning in the
N-player setting.
Comment: 43 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1901.0958
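
To make the two-timescale structure concrete, here is a minimal sketch (not the paper's code) of a GMF-V-Q-style fixed-point iteration: Q-learning runs against a frozen mean field, the learned Q-values are smoothed into a Boltzmann policy, and the mean field is then updated under that policy. The environment interface, the temperature tau, and all helper names are assumptions for illustration.

    import numpy as np

    def softmax_policy(Q, tau=0.1):
        # Boltzmann smoothing of Q-values; the smoothing is what stabilizes
        # the otherwise brittle fixed-point iteration.
        z = Q / tau
        z -= z.max(axis=1, keepdims=True)
        p = np.exp(z)
        return p / p.sum(axis=1, keepdims=True)

    def gmf_v_q(env, n_outer=50, n_q_steps=10_000, tau=0.1, alpha=0.1, gamma=0.9):
        L = env.initial_mean_field()               # population state distribution
        for _ in range(n_outer):
            Q = np.zeros((env.n_states, env.n_actions))
            for _ in range(n_q_steps):             # Q-learning vs. the frozen mean field
                s = env.sample_state()
                a = np.random.choice(env.n_actions, p=softmax_policy(Q, tau)[s])
                s_next, r = env.step(s, a, L)      # rewards/dynamics depend on L
                Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            pi = softmax_policy(Q, tau)
            L = env.propagate(pi, L)               # mean-field update under pi
        return pi, L

The outer loop is the fixed-point map over (policy, mean field) pairs; a GMF-P-TRPO-style variant would replace the inner Q-learning stage with a trust-region policy update.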
Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability
A trustworthy reinforcement learning algorithm should be competent in solving
challenging real-world problems, including robustly handling uncertainties,
satisfying safety constraints to avoid catastrophic failures, and
generalizing to unseen scenarios during deployment. This study provides an
overview of these main perspectives of trustworthy reinforcement learning
considering its intrinsic vulnerabilities on robustness, safety, and
generalizability. In particular, we give rigorous formulations, categorize
corresponding methodologies, and discuss benchmarks for each perspective.
Moreover, we provide an outlook section to spur promising future directions
with a brief discussion on extrinsic vulnerabilities considering human
feedback. We hope this survey brings these separate threads of study together
in a unified framework and promotes the trustworthiness of reinforcement
learning.
Comment: 36 pages, 5 figures
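
As a rough guide to the three perspectives, the standard formulations read as follows (notation is illustrative; the survey's exact definitions may differ). Robustness optimizes the worst case over an uncertainty set of transition kernels, safety is typically posed as a constrained MDP, and generalizability averages performance over a distribution of tasks:

    % Robustness: worst case over an uncertainty set \mathcal{P} of transition kernels
    \max_{\pi}\ \min_{P \in \mathcal{P}}\ \mathbb{E}_{\pi,P}\Big[\sum\nolimits_t \gamma^t\, r(s_t, a_t)\Big]

    % Safety: constrained MDP with cost signal c and budget d
    \max_{\pi}\ \mathbb{E}_{\pi}\Big[\sum\nolimits_t \gamma^t\, r(s_t, a_t)\Big]
    \quad \text{s.t.} \quad
    \mathbb{E}_{\pi}\Big[\sum\nolimits_t \gamma^t\, c(s_t, a_t)\Big] \le d

    % Generalizability: average return over a distribution p(\mathcal{M}) of environments
    \max_{\pi}\ \mathbb{E}_{M \sim p(\mathcal{M})}\, \mathbb{E}_{\pi,M}\Big[\sum\nolimits_t \gamma^t\, r(s_t, a_t)\Big]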
Sinkhorn Distributionally Robust Optimization
We study distributionally robust optimization (DRO) with Sinkhorn distance --
a variant of Wasserstein distance based on entropic regularization. We derive
a convex programming dual reformulation for a general nominal distribution.
Compared with Wasserstein DRO, it is computationally tractable for a larger
class of loss functions, and its worst-case distribution is more reasonable for
practical applications. To solve the dual reformulation, we develop a
stochastic mirror descent algorithm using biased gradient oracles and analyze
its convergence rate. Finally, we provide numerical examples using synthetic
and real data to demonstrate its superior performance.
Comment: 56 pages, 8 figures
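
For orientation, the Sinkhorn DRO problem takes the standard form below, where the ambiguity set is a Sinkhorn-distance ball of radius rho around the nominal (e.g., empirical) distribution; the exact choice of reference measure mu and regularization convention follows the paper and may differ from this schematic:

    % Sinkhorn distance: entropically regularized optimal transport (schematic)
    W_{\varepsilon}(\mathbb{P}, \mathbb{Q})
      = \inf_{\gamma \in \Pi(\mathbb{P}, \mathbb{Q})}
        \mathbb{E}_{\gamma}\big[c(x, y)\big] + \varepsilon\, H(\gamma \,\|\, \mu)

    % Sinkhorn DRO: minimize the worst-case loss over a Sinkhorn ball of radius \rho
    \min_{\theta}\ \sup_{\mathbb{Q}\,:\, W_{\varepsilon}(\widehat{\mathbb{P}}_n, \mathbb{Q}) \le \rho}
        \mathbb{E}_{\mathbb{Q}}\big[\ell(\theta, \xi)\big]

As the regularization parameter vanishes the Sinkhorn distance approaches the Wasserstein distance; the entropic term is what yields the smoother dual that the paper solves by stochastic mirror descent with biased gradient oracles.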
A General Framework for Optimal Data-Driven Optimization
We propose a statistically optimal approach to construct data-driven
decisions for stochastic optimization problems. Fundamentally, a data-driven
decision is simply a function that maps the available training data to a
feasible action. It can always be expressed as the minimizer of a surrogate
optimization model constructed from the data. The quality of a data-driven
decision is measured by its out-of-sample risk. An additional quality measure
is its out-of-sample disappointment, which we define as the probability that
the out-of-sample risk exceeds the optimal value of the surrogate optimization
model. An ideal data-driven decision should minimize the out-of-sample risk
simultaneously with respect to every conceivable probability measure as the
true measure is unknown. Unfortunately, such ideal data-driven decisions are
generally unavailable. This prompts us to seek data-driven decisions that
minimize the out-of-sample risk subject to an upper bound on the out-of-sample
disappointment. We prove that such Pareto-dominant data-driven decisions exist
under conditions that allow for interesting applications: the unknown
data-generating probability measure must belong to a parametric ambiguity set,
and the corresponding parameters must admit a sufficient statistic that
satisfies a large deviation principle. We can further prove that the surrogate
optimization model must be a distributionally robust optimization problem
constructed from the sufficient statistic and the rate function of its large
deviation principle. Hence the optimal method for mapping data to decisions is
to solve a distributionally robust optimization model. Perhaps surprisingly,
this result holds even when the training data are non-i.i.d. Our analysis reveals how
the structural properties of the data-generating stochastic process impact the
shape of the ambiguity set underlying the optimal distributionally robust
model.
Comment: 52 pages
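
In symbols (hypothetical notation, consistent with the abstract's definitions): a data-driven decision maps training data to an action, its quality is its out-of-sample risk, and its disappointment is the probability that this risk exceeds the surrogate model's optimal value:

    % A data-driven decision and the optimal value of its surrogate model:
    \hat{x}_n = \hat{x}(\xi_1, \dots, \xi_n), \qquad
    \hat{c}_n = \hat{c}(\xi_1, \dots, \xi_n)

    % Out-of-sample risk of the decision under the unknown true measure \mathbb{P}:
    \mathcal{R}(\hat{x}_n, \mathbb{P}) = \mathbb{E}_{\xi \sim \mathbb{P}}\big[\ell(\hat{x}_n, \xi)\big]

    % Out-of-sample disappointment, with the probability taken under the
    % (not necessarily i.i.d.) law of the training data:
    \Pr\big[\, \mathcal{R}(\hat{x}_n, \mathbb{P}) > \hat{c}_n \,\big]

The Pareto question is then: among all decisions whose disappointment satisfies a prescribed upper bound, find one with minimal out-of-sample risk.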
Robust Reinforcement Learning: A Review of Foundations and Recent Advances
Reinforcement learning (RL) has become a highly successful framework for learning in Markov decision processes (MDP). Due to the adoption of RL in realistic and complex environments, solution robustness becomes an increasingly important aspect of RL deployment. Nevertheless, current RL algorithms struggle with robustness to uncertainty, disturbances, or structural changes in the environment. We survey the literature on robust approaches to reinforcement learning and categorize these methods in four different ways: (i) Transition robust designs account for uncertainties in the system dynamics by manipulating the transition probabilities between states; (ii) Disturbance robust designs leverage external forces to model uncertainty in the system behavior; (iii) Action robust designs redirect transitions of the system by corrupting an agent’s output; (iv) Observation robust designs exploit or distort the perceived system state of the policy. Each of these robust designs alters a different aspect of the MDP. Additionally, we address the connection of robustness to the risk-based and entropy-regularized RL formulations. The resulting survey covers all fundamental concepts underlying the approaches to robust reinforcement learning and their recent advances.
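
To illustrate category (i), here is a minimal sketch of transition-robust value iteration, where each Bellman backup takes the worst case over an L1 ball of transition vectors around a nominal kernel. The L1 uncertainty set is one common choice among many covered by such surveys, and all names below are illustrative:

    import numpy as np

    def worst_case_expectation(p0, v, beta):
        # min of p @ v over {p in the simplex : ||p - p0||_1 <= beta}:
        # greedily move mass (at most beta/2 in total) from the highest-value
        # states to the single lowest-value state.
        p = p0.copy()
        budget = beta / 2.0
        worst = np.argmin(v)
        for s in np.argsort(v)[::-1]:              # highest-value states first
            if budget <= 0:
                break
            if s == worst:
                continue
            shift = min(budget, p[s])
            p[s] -= shift
            p[worst] += shift
            budget -= shift
        return p @ v

    def robust_value_iteration(P0, R, gamma=0.95, beta=0.2, n_iters=500):
        # P0: nominal transition kernel, shape (S, A, S); R: rewards, shape (S, A).
        S, A, _ = P0.shape
        v = np.zeros(S)
        for _ in range(n_iters):                   # robust Bellman backups
            q = np.array([[R[s, a] + gamma * worst_case_expectation(P0[s, a], v, beta)
                           for a in range(A)] for s in range(S)])
            v = q.max(axis=1)
        return v

Disturbance-, action-, and observation-robust designs modify other parts of this loop (the dynamics, the executed action, or the state fed to the policy) rather than the backup's transition vector.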