Learning to Control in Metric Space with Optimal Regret
We study online reinforcement learning for finite-horizon deterministic
control systems with {\it arbitrary} state and action spaces. Suppose that the
transition dynamics and reward function are unknown, but the state and action
spaces are endowed with a metric that characterizes the proximity between
different states and actions. We provide a surprisingly simple upper-confidence
reinforcement learning algorithm that uses a function approximation oracle to
estimate optimistic Q functions from experiences. We bound the regret of the
algorithm after $K$ episodes in terms of a smoothness parameter of the system
and the doubling dimension of the state-action space with respect to the given
metric, and we establish a near-matching regret lower bound. The proposed
method can be adapted to more structured transition systems, including the
finite-state case and the case where value functions are linear combinations
of features; in these settings the method also achieves the optimal regret.
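The optimistic, metric-based idea in this abstract can be illustrated with a minimal Python sketch: keep all observed transitions and estimate an optimistic Q value as the smallest stored return-to-go plus a Lipschitz penalty proportional to the metric distance. The toy environment, horizon, action grid, and smoothness constant `L` below are all invented for the example; this is a simplified sketch, not the paper's algorithm or its function approximation oracle.

```python
# Hypothetical deterministic, finite-horizon control problem on [0, 1].
# Sketch of optimistic Q-learning with a metric-based confidence term;
# the environment, L, and the action grid are assumptions for illustration.

H = 3            # horizon
L = 2.0          # assumed smoothness (Lipschitz) parameter
ACTIONS = [-0.1, 0.0, 0.1]

def step(s, a):
    """Deterministic dynamics and reward (toy example system)."""
    s2 = min(1.0, max(0.0, s + a))
    r = 1.0 - abs(s2 - 0.5)        # reward peaks at s = 0.5
    return r, s2

def dist(sa, sa2):
    """Metric on the state-action space (here: L1 distance)."""
    return abs(sa[0] - sa2[0]) + abs(sa[1] - sa2[1])

# data[h] holds observed (state, action, value-target) triples at step h
data = [[] for _ in range(H)]

def Q(h, s, a):
    """Optimistic Q: smallest stored target plus L times metric distance."""
    best = float(H - h)            # optimistic default: max remaining reward
    for (s0, a0, t) in data[h]:
        best = min(best, t + L * dist((s, a), (s0, a0)))
    return best

def run_episode(s0=0.0):
    s, total, traj = s0, 0.0, []
    for h in range(H):
        a = max(ACTIONS, key=lambda a: Q(h, s, a))  # greedy w.r.t. optimistic Q
        r, s2 = step(s, a)
        traj.append((h, s, a, r))
        s, total = s2, total + r
    # store realized return-to-go as a value target (a simplification;
    # the paper's update uses optimistic backups, not Monte-Carlo returns)
    tail = 0.0
    for (h, sh, ah, rh) in reversed(traj):
        tail += rh
        data[h].append((sh, ah, tail))
    return total

returns = [run_episode() for _ in range(30)]
print(len(returns), "episodes collected")
```

The key design point is that the Lipschitz penalty `L * dist(...)` lets one observation bound the Q values of all nearby state-action pairs, which is what makes regret scale with the doubling dimension of the metric space rather than the raw number of states and actions.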
Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model
In this paper we consider the problem of computing an $\epsilon$-optimal
policy of a discounted Markov Decision Process (DMDP) provided we can only
access its transition function through a generative sampling model that, given
any state-action pair, samples from the transition function in $O(1)$ time.
Given such a DMDP with states $\mathcal{S}$, actions $\mathcal{A}$, discount factor
$\gamma \in (0,1)$, and rewards in range $[0,1]$, we provide an algorithm which
computes an $\epsilon$-optimal policy with probability $1-\delta$ where
\emph{both} the time spent and number of samples taken are upper bounded by
$\widetilde{O}\big(|\mathcal{S}||\mathcal{A}|(1-\gamma)^{-3}\epsilon^{-2}\big)$. For fixed values
of $\epsilon$, this improves upon the previous best known bounds by a factor of
$(1-\gamma)^{-1}$ and matches the sample complexity lower bounds proved in
Azar et al. (2013) up to logarithmic factors. We also extend our method to
computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative
model and provide a nearly matching sample complexity lower bound.
Comment: 31 pages. Accepted to NeurIPS, 201
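The generative-model setting can be illustrated with the naive plug-in baseline (not the paper's variance-reduced method): draw $N$ samples per state-action pair, build the empirical MDP, and run value iteration on it. The 2-state toy MDP, the value of `N`, and the discount factor below are assumptions invented for the sketch.

```python
import random

# Plug-in baseline for a DMDP with a generative model: sample N next-states
# per (state, action), form the empirical transition model, then run value
# iteration. The 2-state MDP is a hypothetical example; this is NOT the
# paper's variance-reduced algorithm, whose sample complexity is sharper.

random.seed(0)
S, A = 2, 2
GAMMA = 0.9

def sample_transition(s, a):
    """Generative model: one next-state draw and reward for (s, a)."""
    if a == 0:
        s2 = s                                        # stay
    else:
        s2 = 1 - s if random.random() < 0.8 else s    # switch w.p. 0.8
    r = 1.0 if s2 == 1 else 0.0                       # reward for landing in state 1
    return r, s2

N = 2000   # samples per state-action pair
counts = [[[0] * S for _ in range(A)] for _ in range(S)]
rew = [[0.0] * A for _ in range(S)]
for s in range(S):
    for a in range(A):
        for _ in range(N):
            r, s2 = sample_transition(s, a)
            counts[s][a][s2] += 1
            rew[s][a] += r / N

# value iteration on the empirical MDP
V = [0.0] * S
for _ in range(500):
    V = [max(rew[s][a] + GAMMA * sum(counts[s][a][s2] / N * V[s2]
                                     for s2 in range(S))
             for a in range(A)) for s in range(S)]
policy = [max(range(A), key=lambda a: rew[s][a] + GAMMA * sum(
    counts[s][a][s2] / N * V[s2] for s2 in range(S))) for s in range(S)]
print(policy, [round(v, 2) for v in V])
```

This baseline uses $N$ samples for each of the $|\mathcal{S}||\mathcal{A}|$ pairs, so its total sample cost is $|\mathcal{S}||\mathcal{A}| N$; the abstract's contribution is showing that $N$ can be taken as small as $\widetilde{O}\big((1-\gamma)^{-3}\epsilon^{-2}\big)$ while still recovering an $\epsilon$-optimal policy.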