On the Foundation of Distributionally Robust Reinforcement Learning
Motivated by the need for a robust policy in the face of environment shifts
between training and deployment, we contribute to the theoretical
foundation of distributionally robust reinforcement learning (DRRL). This is
accomplished through a comprehensive modeling framework centered around
distributionally robust Markov decision processes (DRMDPs). This framework
obliges the decision maker to choose an optimal policy under the worst-case
distributional shift orchestrated by an adversary. By unifying and extending
existing formulations, we rigorously construct DRMDPs that embrace various
modeling attributes for both the decision maker and the adversary. These
attributes include the granularity of adaptability, ranging over history-dependent,
Markov, and Markov time-homogeneous decision-maker and adversary dynamics.
Additionally, we delve into the flexibility of shifts induced by the adversary,
examining SA- and S-rectangularity. Within this DRMDP framework, we investigate
conditions for the existence or absence of the dynamic programming principle
(DPP). From an algorithmic standpoint, the existence of the DPP has significant
implications, as the vast majority of existing data- and computation-efficient
RL algorithms rely on it. To study its existence, we
comprehensively examine combinations of controller and adversary attributes,
providing streamlined proofs grounded in a unified methodology. We also offer
counterexamples for settings in which a DPP in full generality is absent.
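As a point of reference (a standard, generic statement rather than this paper's
specific results), the DPP under SA-rectangularity is usually expressed through a
robust Bellman fixed-point equation of the form

V^*(s) = \max_{a \in \mathcal{A}} \Big\{ r(s, a) + \gamma \inf_{p \in \mathcal{P}_{s,a}} \mathbb{E}_{s' \sim p}\big[ V^*(s') \big] \Big\},

where \mathcal{P}_{s,a} is the adversary's ambiguity set of transition distributions
at the state-action pair (s, a); less structured adversaries (for example,
non-rectangular ambiguity sets) are the kind of setting in which such a
fixed-point characterization can break down.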
Distributionally Robust Model-based Reinforcement Learning with Large State Spaces
Three major challenges in reinforcement learning are the complex dynamical
systems with large state spaces, the costly data acquisition processes, and the
deviation of real-world dynamics at deployment from the training environment. To
overcome these issues, we study distributionally robust Markov decision
processes with continuous state spaces under the widely used Kullback-Leibler,
chi-square, and total variation uncertainty sets. We propose a model-based
approach that utilizes Gaussian Processes and the maximum variance reduction
algorithm to efficiently learn multi-output nominal transition dynamics,
leveraging access to a generative model (i.e., simulator). We further
establish the statistical sample complexity of the proposed method for
different uncertainty sets. These complexity bounds are independent of the
number of states and extend beyond linear dynamics, ensuring the effectiveness
of our approach in identifying near-optimal distributionally-robust policies.
The proposed method can be further combined with other model-free
distributionally robust reinforcement learning methods to obtain a near-optimal
robust policy. Experimental results demonstrate the robustness of our algorithm
to distributional shifts and its superior performance in terms of the number of
samples needed.
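As a rough illustration of the model-learning ingredient, the sketch below shows
generic maximum-variance-reduction sampling with a Gaussian Process surrogate and a
generative model; it is not the authors' implementation, and the 1-D toy dynamics,
candidate grid, and hyperparameters are invented for the example.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def simulator(sa):
    # Hypothetical generative model: a noisy 1-D linear transition.
    s, a = sa
    return 0.9 * s + 0.1 * a + 0.01 * rng.normal()

# Candidate (state, action) pairs at which the simulator may be queried.
candidates = rng.uniform(-1.0, 1.0, size=(200, 2))
X, y = [], []

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
for _ in range(30):
    if X:
        gp.fit(np.array(X), np.array(y))
        _, std = gp.predict(candidates, return_std=True)
        query = candidates[int(np.argmax(std))]   # query where posterior variance is largest
    else:
        query = candidates[0]                     # arbitrary first query
    X.append(query)
    y.append(simulator(query))

# gp now acts as a learned nominal transition model; multi-output dynamics would use
# one GP per next-state dimension (or a multi-output kernel) in the same loop.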
Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation
Robustness has been extensively studied in reinforcement learning (RL) to
handle various forms of uncertainty such as random perturbations, rare events,
and malicious attacks. In this work, we consider one critical type of
robustness against spurious correlation, where different portions of the state
are correlated through unobserved confounders rather than through any causal
relationship. These spurious
correlations are ubiquitous in real-world tasks, for instance, a self-driving
car usually observes heavy traffic in the daytime and light traffic at night
due to unobservable human activity. A model that learns such useless or even
harmful correlation could catastrophically fail when the confounder in the test
case deviates from the training one. Although well motivated, enabling robustness
against spurious correlation poses significant challenges since the uncertainty
set, shaped by the unobserved confounder and causal structure, is difficult to
characterize and identify. Existing robust algorithms that assume simple and
unstructured uncertainty sets are therefore inadequate to address this
challenge. To solve this issue, we propose Robust State-Confounded Markov
Decision Processes (RSC-MDPs) and theoretically demonstrate its superiority in
avoiding learning spurious correlations compared with other robust RL
counterparts. We also design an empirical algorithm to learn the robust optimal
policy for RSC-MDPs, which outperforms all baselines in eight realistic
self-driving and manipulation tasks. Comment: Accepted to NeurIPS 2023.
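A toy illustration of this failure mode (entirely synthetic; the feature names and
numbers are hypothetical, and this is not the paper's RSC-MDP algorithm): a
least-squares model trained while a spurious state component tracks the confounder
degrades sharply once that correlation disappears.

import numpy as np

rng = np.random.default_rng(1)

def make_data(n, correlated):
    u = rng.normal(size=n)                       # unobserved confounder (e.g., human activity)
    hazard = u + 0.3 * rng.normal(size=n)        # latent quantity that actually matters
    sensor = hazard + 1.0 * rng.normal(size=n)   # noisy causal observation of the hazard
    if correlated:
        traffic = u + 0.1 * rng.normal(size=n)   # training: spurious feature tracks the confounder
    else:
        traffic = rng.normal(size=n)             # deployment: the spurious correlation breaks
    return np.column_stack([sensor, traffic]), hazard

X_tr, y_tr = make_data(5000, correlated=True)
X_te, y_te = make_data(5000, correlated=False)

w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)  # the fit leans on the cleaner spurious feature
print("weights (sensor, traffic):", w)
print("in-distribution MSE:", np.mean((X_tr @ w - y_tr) ** 2))
print("shifted-test MSE:", np.mean((X_te @ w - y_te) ** 2))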
Sample Complexity of Variance-reduced Distributionally Robust Q-learning
Dynamic decision making under distributional shifts is of fundamental
interest in theory and applications of reinforcement learning: The distribution
of the environment on which the data is collected can differ from that of the
environment on which the model is deployed. This paper presents two novel
model-free algorithms, namely the distributionally robust Q-learning and its
variance-reduced counterpart, that can effectively learn a robust policy
despite distributional shifts. These algorithms are designed to efficiently
approximate the $Q$-function of an infinite-horizon $\gamma$-discounted robust
Markov decision process with a Kullback-Leibler uncertainty set to an entry-wise
$\epsilon$-degree of precision. Further, the variance-reduced distributionally
robust Q-learning combines synchronous Q-learning with variance-reduction
techniques to enhance its performance. Consequently, we establish that it
attains a minimax sample complexity upper bound of
$\tilde{O}(|S||A|(1-\gamma)^{-4}\epsilon^{-2})$, where $S$ and $A$ denote the state and
action spaces. This is the first complexity result that is independent of the
uncertainty size $\delta$, thereby providing new complexity-theoretic insights.
Additionally, a series of numerical experiments confirm the theoretical
findings and the efficiency of the algorithms in handling distributional
shifts.
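For intuition, the sketch below computes the kind of KL-robust Bellman target such
algorithms estimate, using the standard convex dual of the worst-case expectation
over a Kullback-Leibler ball; it is a generic illustration rather than the authors'
algorithm, and the radius, rewards, and probabilities are made-up toy numbers.

import numpy as np
from scipy.optimize import minimize_scalar

def kl_robust_expectation(values, probs, delta):
    # Worst-case expectation of `values` over {q : KL(q || probs) <= delta},
    # via the dual  sup_{beta > 0}  -beta * log E_p[exp(-V / beta)] - beta * delta.
    def neg_dual(beta):
        scaled = -values / beta
        m = scaled.max()                                      # log-sum-exp stabilization
        log_mgf = m + np.log(np.dot(probs, np.exp(scaled - m)))
        return -(-beta * log_mgf - beta * delta)
    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e3), method="bounded")
    return -res.fun

# One synchronous robust Q-update for a single (s, a) pair, with toy numbers.
gamma, delta, lr = 0.9, 0.5, 0.1
next_values = np.array([1.0, 0.0, 0.5])    # max_a' Q(s', a') at the possible next states
p_hat = np.array([0.6, 0.3, 0.1])          # empirical nominal transition probabilities
reward, q_sa = 1.0, 0.2
target = reward + gamma * kl_robust_expectation(next_values, p_hat, delta)
q_sa += lr * (target - q_sa)               # standard Q-learning step toward the robust target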