An Information-Theoretic Analysis of Thompson Sampling
We provide an information-theoretic analysis of Thompson sampling that
applies across a broad range of online optimization problems in which a
decision-maker must learn from partial feedback. This analysis inherits the
simplicity and elegance of information theory and leads to regret bounds that
scale with the entropy of the optimal-action distribution. This strengthens
preexisting results and yields new insight into how information improves
performance.
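The abstract ties regret to the entropy of the optimal-action distribution. Below is a minimal sketch of both ingredients, assuming the simplest setting the analysis covers: a K-armed Bernoulli bandit with independent Beta(1, 1) priors. The function names `thompson_bernoulli` and `optimal_action_entropy` are illustrative and do not come from the paper. `optimal_action_entropy` gives a Monte Carlo estimate of H(A*), the quantity the bounds scale with.

```python
import numpy as np

def thompson_bernoulli(true_means, horizon, rng):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    Returns cumulative pseudo-regret: sum of (best mean - chosen mean).
    """
    k = len(true_means)
    alpha = np.ones(k)  # Beta(1, 1) prior: 1 + observed successes
    beta = np.ones(k)   # Beta(1, 1) prior: 1 + observed failures
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)    # draw one posterior sample per arm
        arm = int(np.argmax(theta))      # act optimally for the sampled means
        reward = rng.random() < true_means[arm]  # Bernoulli feedback
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret

def optimal_action_entropy(alpha, beta, n_samples, rng):
    """Monte Carlo estimate (in nats) of H(A*) under independent Beta beliefs."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    samples = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    counts = np.bincount(samples.argmax(axis=1), minlength=len(alpha))
    p = counts[counts > 0] / n_samples   # empirical P(A* = a)
    return float(-(p * np.log(p)).sum())
```

For example, under a uniform prior over 4 arms, `optimal_action_entropy(np.ones(4), np.ones(4), 20000, np.random.default_rng(0))` comes out close to log 4. A sharply informative prior drives the estimate toward 0. This matches the intuition in the abstract that prior information about which action is optimal should reduce regret.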
Model-based Reinforcement Learning and the Eluder Dimension
We consider the problem of learning to optimize an unknown Markov decision
process (MDP). We show that, if the MDP can be parameterized within some known
function class, we can obtain regret bounds that scale with the dimensionality,
rather than cardinality, of the system. We characterize this dependence
explicitly as $\tilde{O}(\sqrt{d_K d_E T})$ where $T$ is time elapsed, $d_K$ is
the Kolmogorov dimension and $d_E$ is the \emph{eluder dimension}. These
represent the first unified regret bounds for model-based reinforcement
learning and provide state-of-the-art guarantees in several important settings.
Moreover, we present a simple and computationally efficient algorithm,
\emph{posterior sampling for reinforcement learning} (PSRL), that satisfies
these bounds.
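The PSRL loop itself is short: sample an MDP from the posterior, solve it, follow its optimal policy for an episode, then update the posterior. Below is a minimal sketch of that loop. It assumes a much narrower setting than the paper's parameterized function classes: a tabular episodic MDP with known rewards, Dirichlet(1) priors on each transition row, a fixed start state 0, and a fixed episode length. The names `psrl` and `value_iteration` are hypothetical, and nothing here computes the eluder or Kolmogorov dimension.

```python
import numpy as np

def value_iteration(P, R, horizon):
    """Finite-horizon planning on a (sampled) MDP.

    P has shape (S, A, S); R has shape (S, A). Returns policy[h, s].
    """
    S, _ = R.shape
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    for h in reversed(range(horizon)):
        Q = R + P @ V                 # (S, A): reward plus expected next value
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

def psrl(true_P, R, horizon, episodes, rng):
    """Tabular PSRL sketch with known rewards and Dirichlet transition priors.

    Returns the total reward collected across all episodes.
    """
    S, A = R.shape
    counts = np.ones((S, A, S))       # Dirichlet(1, ..., 1) prior per (s, a)
    total = 0.0
    for _ in range(episodes):
        # 1. Sample one transition kernel from the current posterior.
        P_sample = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                             for s in range(S)])
        # 2. Solve the sampled MDP exactly.
        policy = value_iteration(P_sample, R, horizon)
        # 3. Follow that policy in the true environment for one episode.
        s = 0
        for h in range(horizon):
            a = policy[h, s]
            total += R[s, a]
            s_next = rng.choice(S, p=true_P[s, a])
            # 4. Update the posterior with the observed transition.
            counts[s, a, s_next] += 1
            s = s_next
    return total
```

Sampling a single MDP per episode, rather than adding optimism bonuses, is what keeps the loop this simple. The planner only ever sees one plausible model at a time, and exploration comes from randomness in the posterior sample.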