Online Sub-Sampling for Reinforcement Learning with General Function Approximation
Designing provably efficient algorithms with general function approximation
is an important open problem in reinforcement learning. Recently, Wang et
al.~[2020c] establish a value-based algorithm with general function
approximation that enjoys an $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$\footnote{Throughout the paper, we
use $\widetilde{O}(\cdot)$ to suppress logarithmic factors.} regret bound, where $d$
depends on the complexity of the function class, $H$ is the planning
horizon, and $T$ is the total number of episodes. However, their algorithm
requires $\Omega(T)$ computation time per round, rendering the algorithm
inefficient for practical use. In this paper, by applying online sub-sampling
techniques, we develop an algorithm that takes $\widetilde{O}(\mathrm{poly}(dH))$
computation time per round on average, and
enjoys nearly the same regret bound. Furthermore, the algorithm achieves low
switching cost, i.e., it changes the policy only $\widetilde{O}(\mathrm{poly}(dH))$
times during its execution, making it appealing to implement in real-life
scenarios. Moreover, by using an
upper-confidence based exploration-driven reward function, the algorithm
provably explores the environment in the reward-free setting. In particular,
after $\widetilde{O}(\mathrm{poly}(dH)/\epsilon^2)$ rounds of exploration, the
algorithm outputs an $\epsilon$-optimal policy for any given reward function.
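
The abstract does not spell out the sub-sampling procedure itself. As a
hypothetical illustration of the general idea, the sketch below specializes
to linear features: each arriving data point is retained with probability
proportional to its online leverage score, a standard sensitivity measure,
so that per-round computation depends on the small retained subset rather
than on the full history. The function name \texttt{online\_subsample}, the
oversampling constant \texttt{c}, and the linear specialization are
illustrative assumptions, not the paper's algorithm for general function
classes.

\begin{verbatim}
import numpy as np

def online_subsample(features, c=10.0, lam=1.0, seed=0):
    """Keep each arriving feature with probability proportional to its
    online leverage score, measured against the (regularized) covariance
    of the points kept so far; inverse-probability weights keep
    downstream least-squares estimates consistent."""
    rng = np.random.default_rng(seed)
    d = features.shape[1]
    cov = lam * np.eye(d)              # covariance of the kept subset
    kept, weights = [], []
    for x in features:
        score = float(x @ np.linalg.solve(cov, x))  # sensitivity of x
        p = min(1.0, c * score)
        if rng.random() < p:
            kept.append(x)
            weights.append(1.0 / p)    # importance weight
            cov += (1.0 / p) * np.outer(x, x)
    return np.array(kept), np.array(weights)

# With T i.i.d. feature vectors in dimension d, the retained subset is
# typically of size O(d log T) rather than T.
T, d = 10000, 5
X = np.random.default_rng(1).normal(size=(T, d))
sub, w = online_subsample(X)
print(f"kept {len(sub)} of {T} points")
\end{verbatim}

In this linear toy setting the retained subset typically grows only
logarithmically with the number of rounds, which is what makes the average
per-round cost independent of $T$ up to logarithmic factors; the paper's
general-function-class algorithm replaces the leverage score with a
sensitivity measure defined through the function class.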