No-Regret Reinforcement Learning with Value Function Approximation: a Kernel Embedding Approach
We consider the regret minimization problem in reinforcement learning (RL) in
the episodic setting. In many real-world RL environments, the state and action
spaces are continuous or very large. Existing approaches establish regret
guarantees by either a low-dimensional representation of the stochastic
transition model or an approximation of the $Q$-functions. However, the
understanding of function approximation schemes for state-value functions
remains largely missing. In this paper, we propose an online model-based RL
algorithm, namely the CME-RL, that learns representations of transition
distributions as embeddings in a reproducing kernel Hilbert space while
carefully balancing the exploitation-exploration tradeoff. We demonstrate the
efficiency of our algorithm by proving a frequentist (worst-case) regret bound
that is of order $\tilde{O}\big(H\gamma_N\sqrt{N}\big)$, where $H$ is the
episode length, $N$ is the total number of time steps and $\gamma_N$ is an
information-theoretic quantity relating to the effective dimension of the
state-action feature space. Our method bypasses the need for estimating
transition probabilities and applies to any domain on which kernels can be
defined. It also brings new insights into the general theory of kernel methods
for approximate inference and RL regret minimization.
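
At the heart of the approach is the conditional mean embedding of the transition distribution $P(s' \mid s, a)$ in a reproducing kernel Hilbert space. As a rough illustration of that primitive only, the sketch below implements the standard empirical CME estimator via kernel ridge regression on observed transitions; the class and function names, the RBF kernel, and the regularization constant are illustrative assumptions, and the sketch omits the optimistic confidence bonuses CME-RL uses to balance exploration and exploitation.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-sq_dists / (2.0 * lengthscale**2))


class ConditionalMeanEmbedding:
    """Empirical conditional mean embedding of P(s' | s, a).

    Fitted by kernel ridge regression on transition data; the embedding
    yields estimates of E[V(s') | s, a] for any value function V without
    ever estimating transition probabilities explicitly.
    """

    def __init__(self, reg=1e-3, lengthscale=1.0):
        self.reg = reg
        self.lengthscale = lengthscale

    def fit(self, X, S_next):
        # X: (n, d) state-action features; S_next: (n, d_s) observed next states.
        self.X, self.S_next = X, S_next
        n = X.shape[0]
        K = rbf_kernel(X, X, self.lengthscale)
        # (K + n*lambda*I)^{-1} is computed once and reused for every query.
        self.K_inv = np.linalg.inv(K + n * self.reg * np.eye(n))
        return self

    def expected_value(self, x, V):
        # Embedding weights beta(x) = (K + n*lambda*I)^{-1} k(X, x); then
        # E[V(s') | x] is approximated by sum_i beta_i(x) * V(s'_i).
        k = rbf_kernel(self.X, x[None, :], self.lengthscale)  # shape (n, 1)
        beta = self.K_inv @ k
        return float(beta[:, 0] @ V(self.S_next))


# Toy usage with synthetic transitions and an illustrative value function.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # (state, action) features
S_next = X[:, :2] + 0.1 * rng.normal(size=(200, 2))  # noisy next states
cme = ConditionalMeanEmbedding().fit(X, S_next)
print(cme.expected_value(X[0], lambda S: -np.sum(S**2, axis=1)))
```

The weights $\beta(s,a)$ are all that a model-based value backup $\hat{\mathbb{E}}[V(s') \mid s, a]$ reduces to in this formulation, which is why such methods apply on any domain where a kernel can be defined.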