
    A General Framework for Learning Mean-Field Games

    This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium for this GMFG and demonstrates that naively combining reinforcement learning with the fixed-point approach used in classical MFGs yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms with smoothed policies (GMF-V and GMF-P, respectively), together with an analysis of their convergence properties and computational complexities. Experiments on an equilibrium product-pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P with Q-learning and TRPO, respectively, are both efficient and robust in the GMFG setting. Moreover, their performance is superior in convergence speed, accuracy, and stability when compared with existing multi-agent reinforcement learning algorithms in the N-player setting. Comment: 43 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1901.0958
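    The abstract describes a smoothed fixed-point scheme that alternates a single-agent best response with a mean-field update; below is a minimal, illustrative sketch of what a value-based instantiation of that loop could look like. It is not the paper's GMF-V-Q algorithm: the environment interface (reset, step, propagate), the softmax smoothing temperature, and all hyperparameters are assumptions for illustration only.

```python
import numpy as np

def smoothed_mean_field_q_iteration(env, n_states, n_actions, outer_iters=50,
                                    q_episodes=200, temp=1.0, alpha=0.1,
                                    gamma=0.99, eps=0.1):
    """Illustrative smoothed fixed-point loop in the spirit of GMF-V with Q-learning.

    `env` is a hypothetical simulator exposing:
      env.reset(mean_field) -> initial state index
      env.step(state, action, mean_field) -> (next_state, reward, done)
      env.propagate(mean_field, policy) -> next population distribution
    """
    mean_field = np.full(n_states, 1.0 / n_states)      # population state distribution
    for _ in range(outer_iters):
        # Step 1: best response via tabular Q-learning against the frozen mean field.
        q = np.zeros((n_states, n_actions))
        for _ in range(q_episodes):
            s, done = env.reset(mean_field), False
            while not done:
                a = np.random.randint(n_actions) if np.random.rand() < eps else int(q[s].argmax())
                s2, r, done = env.step(s, a, mean_field)
                q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
                s = s2
        # Step 2: smooth the greedy response with a softmax policy (the naive argmax
        # fixed point is the unstable combination the abstract warns about).
        policy = np.exp((q - q.max(axis=1, keepdims=True)) / temp)
        policy /= policy.sum(axis=1, keepdims=True)
        # Step 3: update the mean field by propagating the population under the policy.
        mean_field = env.propagate(mean_field, policy)
    return q, policy, mean_field
```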

    Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability

    A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing to unseen scenarios during deployment. This study provides an overview of these main perspectives of trustworthy reinforcement learning in terms of its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize the corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section to spur promising future directions, with a brief discussion of extrinsic vulnerabilities related to human feedback. We hope this survey brings these separate threads of study together in a unified framework and promotes the trustworthiness of reinforcement learning. Comment: 36 pages, 5 figures

    Sinkhorn Distributionally Robust Optimization

    We study distributionally robust optimization (DRO) with the Sinkhorn distance -- a variant of the Wasserstein distance based on entropic regularization. We derive a convex programming dual reformulation for a general nominal distribution. Compared with Wasserstein DRO, it is computationally tractable for a larger class of loss functions, and its worst-case distribution is more reasonable for practical applications. To solve the dual reformulation, we develop a stochastic mirror descent algorithm using biased gradient oracles and analyze its convergence rate. Finally, we provide numerical examples on synthetic and real data to demonstrate its superior performance. Comment: 56 pages, 8 figures
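    The abstract's solution method is a stochastic mirror descent scheme driven by a biased gradient oracle. As a generic illustration of that algorithmic template (not the paper's dual reformulation or its specific oracle), here is a minimal entropic mirror descent sketch on the probability simplex with a noisy gradient oracle; the toy objective, step-size schedule, and iterate averaging are assumptions.

```python
import numpy as np

def stochastic_mirror_descent(grad_oracle, dim, iters=2000, step=0.5):
    """Entropic (exponentiated-gradient) mirror descent on the probability simplex.

    `grad_oracle(x)` returns a stochastic, possibly biased, gradient estimate at x.
    Generic template only; the paper applies such a scheme to its Sinkhorn-DRO dual.
    """
    x = np.full(dim, 1.0 / dim)            # uniform starting point on the simplex
    x_avg = np.zeros(dim)
    for t in range(1, iters + 1):
        g = grad_oracle(x)                 # noisy / biased gradient estimate
        eta = step / np.sqrt(t)            # decaying step size (assumed schedule)
        x = x * np.exp(-eta * g)           # multiplicative (entropic mirror) update
        x /= x.sum()                       # renormalize to stay on the simplex
        x_avg += (x - x_avg) / t           # running average of the iterates
    return x_avg

# Toy usage: minimize f(x) = 0.5 * ||x - c||^2 over the simplex with noisy gradients.
rng = np.random.default_rng(0)
c = np.array([0.7, 0.2, 0.1])
print(stochastic_mirror_descent(lambda x: (x - c) + 0.01 * rng.standard_normal(3), dim=3))
```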

    A General Framework for Optimal Data-Driven Optimization

    We propose a statistically optimal approach to construct data-driven decisions for stochastic optimization problems. Fundamentally, a data-driven decision is simply a function that maps the available training data to a feasible action. It can always be expressed as the minimizer of a surrogate optimization model constructed from the data. The quality of a data-driven decision is measured by its out-of-sample risk. An additional quality measure is its out-of-sample disappointment, which we define as the probability that the out-of-sample risk exceeds the optimal value of the surrogate optimization model. Because the true data-generating measure is unknown, an ideal data-driven decision should minimize the out-of-sample risk simultaneously with respect to every conceivable probability measure. Unfortunately, such ideal data-driven decisions are generally unavailable. This prompts us to seek data-driven decisions that minimize the out-of-sample risk subject to an upper bound on the out-of-sample disappointment. We prove that such Pareto-dominant data-driven decisions exist under conditions that allow for interesting applications: the unknown data-generating probability measure must belong to a parametric ambiguity set, and the corresponding parameters must admit a sufficient statistic that satisfies a large deviation principle. We further prove that the surrogate optimization model must be a distributionally robust optimization problem constructed from the sufficient statistic and the rate function of its large deviation principle. Hence, the optimal method for mapping data to decisions is to solve a distributionally robust optimization model. Perhaps surprisingly, this result holds even when the training data is non-i.i.d. Our analysis reveals how the structural properties of the data-generating stochastic process impact the shape of the ambiguity set underlying the optimal distributionally robust model. Comment: 52 pages
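    For concreteness, the two quality measures described above can be written as follows; the notation (decision map, loss, sample, surrogate optimal value) is ours and purely illustrative, not the paper's.

```latex
% Illustrative formalization of the abstract's two quality measures.
% Notation (\hat{x}, \ell, \xi^{(n)}, \mathbb{P}_\theta, \hat{J}) is assumed, not the paper's.
\begin{align*}
  \text{out-of-sample risk:} \quad
    & \mathcal{R}\bigl(\hat{x}(\xi^{(n)})\bigr)
      = \mathbb{E}_{\xi \sim \mathbb{P}_\theta}\bigl[\ell\bigl(\hat{x}(\xi^{(n)}), \xi\bigr)\bigr],\\
  \text{out-of-sample disappointment:} \quad
    & \mathbb{P}_\theta^{\,n}\Bigl[\mathcal{R}\bigl(\hat{x}(\xi^{(n)})\bigr) > \hat{J}\bigl(\xi^{(n)}\bigr)\Bigr],
\end{align*}
where $\hat{x}(\xi^{(n)})$ is the decision computed from the training sample $\xi^{(n)}$,
$\hat{J}(\xi^{(n)})$ is the optimal value of the surrogate optimization model, and
$\theta$ indexes the unknown data-generating measure in the parametric ambiguity set.
```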

    Robust Reinforcement Learning: A Review of Foundations and Recent Advances

    Reinforcement learning (RL) has become a highly successful framework for learning in Markov decision processes (MDPs). As RL is adopted in realistic and complex environments, solution robustness becomes an increasingly important aspect of RL deployment. Nevertheless, current RL algorithms struggle with robustness to uncertainty, disturbances, or structural changes in the environment. We survey the literature on robust approaches to reinforcement learning and categorize these methods in four different ways: (i) transition-robust designs account for uncertainties in the system dynamics by manipulating the transition probabilities between states; (ii) disturbance-robust designs leverage external forces to model uncertainty in the system behavior; (iii) action-robust designs redirect transitions of the system by corrupting an agent's output; (iv) observation-robust designs exploit or distort the perceived system state of the policy. Each of these robust designs alters a different aspect of the MDP. Additionally, we address the connection of robustness to the risk-based and entropy-regularized RL formulations. The resulting survey covers all fundamental concepts underlying the approaches to robust reinforcement learning and their recent advances.
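    As a concrete illustration of category (i) above (transition-robust design), here is a minimal sketch of robust value iteration that takes the worst case over a small, finite set of candidate transition kernels. The tabular setting, the finite uncertainty set, and all names are assumptions for illustration; this is not an algorithm taken from the survey.

```python
import numpy as np

def robust_value_iteration(P_set, R, gamma=0.95, iters=500, tol=1e-8):
    """Robust value iteration over a finite set of candidate transition kernels.

    P_set: array of shape (K, A, S, S), with P_set[k, a, s, s'] = P_k(s' | s, a).
    R:     array of shape (A, S), reward for taking action a in state s.
    Returns the robust (worst-case) value function and a greedy robust policy.
    """
    K, A, S, _ = P_set.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R[None, :, :] + gamma * (P_set @ V)   # shape (K, A, S): backup under each kernel
        V_new = Q.min(axis=0).max(axis=0)         # worst case over kernels, best over actions
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = Q.min(axis=0).argmax(axis=0)         # greedy action against the worst-case kernel
    return V, policy
```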