Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes
We study offline reinforcement learning (RL) in the face of unmeasured
confounders. Due to the lack of online interaction with the environment,
offline RL faces two significant challenges: (i) the agent may be confounded
by unobserved state variables; (ii) the offline data collected a priori may
not provide sufficient coverage of the environment. To tackle these
challenges, we study policy learning in confounded MDPs with the aid of
instrumental variables. Specifically, we first establish value function
(VF)-based and marginalized importance sampling (MIS)-based identification
results for the expected total reward in confounded MDPs. Then, by leveraging
pessimism and our identification results, we propose various policy learning
methods with finite-sample suboptimality guarantees for finding the optimal
in-class policy under minimal data-coverage and modeling assumptions. Lastly,
extensive theoretical investigations and a numerical study motivated by kidney
transplantation demonstrate the promising performance of the proposed methods.
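As an illustrative sketch only (not the paper's exact estimator), the pessimism principle mentioned above selects a policy by maximizing a lower confidence bound on the estimated value, where $\hat{J}(\pi)$ stands for an estimate of the expected total reward built from the VF- or MIS-based identification result and $\Gamma(\pi)$ is an assumed uncertainty penalty reflecting limited data coverage:
\[
  % Illustrative pessimistic policy selection; \hat{J}(\pi) and \Gamma(\pi)
  % are placeholders, not the paper's exact construction.
  \hat{\pi} \;=\; \operatorname*{arg\,max}_{\pi \in \Pi}
  \Big( \hat{J}(\pi) \;-\; \Gamma(\pi) \Big)
\]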
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
Most offline reinforcement learning (RL) methods suffer from a trade-off
between improving the policy to surpass the behavior policy and constraining
the policy to limit its deviation from the behavior policy, because computing
Q-values for out-of-distribution (OOD) actions incurs errors due to
distributional shift. The recently proposed \textit{In-sample Learning}
paradigm (i.e., IQL), which improves the policy by quantile regression using
only data samples, shows great promise because it learns an optimal policy
without querying the value function of any unseen actions. However, it remains
unclear how this type of method handles the distributional shift in learning
the value function. In this work, we make a key finding that the in-sample
learning paradigm arises under the \textit{Implicit Value Regularization} (IVR)
framework. This gives a deeper understanding of why the in-sample learning
paradigm works, i.e., it applies implicit value regularization to the policy.
Based on the IVR framework, we further propose two practical algorithms,
Sparse Q-learning (SQL) and Exponential Q-learning (EQL), which adopt the same
value regularization used in existing works, but in a completely in-sample
manner. Compared with IQL, we find that our algorithms introduce sparsity in
learning the value function, making them more robust in noisy data regimes. We
also verify the effectiveness of SQL and EQL on D4RL benchmark datasets and
show the benefits of in-sample learning by comparing them with CQL in small
data regimes.

Comment: ICLR 2023 notable top 5%
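To make the in-sample idea concrete, below is a minimal sketch of an in-sample value update in the style of IQL's expectile regression, in which the value of unseen actions is never queried. The names (`expectile_loss`, `value_update`, `v_net`, `q_target`) are hypothetical, and SQL/EQL derive different value-regularization losses from the IVR framework; this only illustrates the shared "no OOD actions" mechanism.

```python
# Minimal, illustrative sketch of an in-sample value update (IQL-style
# expectile regression). SQL/EQL use different losses derived from the IVR
# framework; the point here is only that Q is evaluated solely on dataset
# actions, so no out-of-distribution action is ever queried.
import torch


def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    # Asymmetric squared loss: residuals above zero are weighted by tau,
    # residuals below zero by (1 - tau).
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()


def value_update(v_net, q_target, states, actions, optimizer, tau=0.7):
    # states/actions are sampled from the offline dataset, so Q(s, a) is
    # only evaluated on in-distribution (in-sample) actions.
    with torch.no_grad():
        q = q_target(states, actions)  # target critic's Q(s, a)
    v = v_net(states)                  # state value V(s)
    loss = expectile_loss(q - v, tau)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```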
- …