
    Online Sub-Sampling for Reinforcement Learning with General Function Approximation

    Designing provably efficient algorithms with general function approximation is an important open problem in reinforcement learning. Recently, Wang et al. [2020c] established a value-based algorithm with general function approximation that enjoys an $\widetilde{O}(\mathrm{poly}(dH)\sqrt{K})$ regret bound (throughout the paper, $\widetilde{O}(\cdot)$ suppresses logarithmic factors), where $d$ depends on the complexity of the function class, $H$ is the planning horizon, and $K$ is the total number of episodes. However, their algorithm requires $\Omega(K)$ computation time per round, rendering it inefficient for practical use. In this paper, by applying online sub-sampling techniques, we develop an algorithm that takes $\widetilde{O}(\mathrm{poly}(dH))$ computation time per round on average and enjoys nearly the same regret bound. Furthermore, the algorithm achieves low switching cost, i.e., it changes the policy only $\widetilde{O}(\mathrm{poly}(dH))$ times during its execution, making it appealing to implement in real-life scenarios. Moreover, by using an upper-confidence-based, exploration-driven reward function, the algorithm provably explores the environment in the reward-free setting. In particular, after $\widetilde{O}(\mathrm{poly}(dH))/\epsilon^2$ rounds of exploration, the algorithm outputs an $\epsilon$-optimal policy for any given reward function.
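
    The sketch below is only a minimal, hypothetical illustration of the general idea behind online sub-sampling and low switching cost as described in the abstract: each incoming transition is retained with probability proportional to an assumed "sensitivity" score, and the policy is recomputed only when the retained sub-sample grows. The names `OnlineSubSampler`, `sensitivity_fn`, `threshold`, and `toy_sensitivity` are invented for illustration and do not reflect the paper's actual definitions or constants.

    ```python
    import random

    class OnlineSubSampler:
        """Hedged sketch: keep only transitions whose assumed 'sensitivity' score is
        large relative to a threshold, and re-plan only when the sub-sample grows."""

        def __init__(self, sensitivity_fn, threshold=0.1):
            # sensitivity_fn(subsample, z) -> float is a user-supplied score; in the
            # paper it would be derived from the function class, which is not modeled here.
            self.sensitivity_fn = sensitivity_fn
            self.threshold = threshold
            self.subsample = []       # retained transitions (the compact dataset)
            self.policy_switches = 0  # counts how often the policy is recomputed

        def observe(self, transition):
            """Process one transition; return True if the policy should be recomputed."""
            score = self.sensitivity_fn(self.subsample, transition)
            p = min(1.0, score / self.threshold)
            if random.random() < p:
                self.subsample.append(transition)
                self.policy_switches += 1  # re-plan only when the sub-sample changes
                return True
            return False                   # otherwise keep the current policy

    # Toy usage with a made-up sensitivity score: states seen less often score higher.
    def toy_sensitivity(subsample, z):
        return 1.0 / (1.0 + sum(1 for s in subsample if s == z))

    sampler = OnlineSubSampler(toy_sensitivity, threshold=0.5)
    for t in range(1000):
        sampler.observe(t % 10)  # 10 distinct "states" revisited repeatedly
    print(len(sampler.subsample), sampler.policy_switches)
    ```

    Because repeated states quickly receive low scores under `toy_sensitivity`, the retained sub-sample (and hence the number of policy switches) stays small even as the number of observed transitions grows, which is the qualitative behavior the abstract attributes to the algorithm.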