Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
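The exploration-exploitation trade-off mentioned in the abstract is often introduced via the epsilon-greedy rule: with probability epsilon play a random arm, otherwise play the arm with the best empirical mean. A minimal sketch (not from the survey itself; function name and parameters are illustrative):

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit; returns (mean estimates, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running empirical mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniformly random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: greedy arm
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return values, counts
```

With a small constant epsilon, the agent keeps sampling every arm forever, so the empirical means converge while most pulls concentrate on the best arm.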
A Sampling-Based Method for Gittins Index Approximation
A sampling-based method is introduced to approximate the Gittins index for a
general family of alternative bandit processes. The approximation consists of a
truncation of the optimization horizon and support for the immediate rewards,
an optimal stopping value approximation, and a stochastic approximation
procedure. Finite-time error bounds are given for the three approximations,
leading to a procedure to construct a confidence interval for the Gittins index
using a finite number of Monte Carlo samples, as well as an epsilon-optimal
policy for the Bayesian multi-armed bandit. Proofs are given for almost sure
convergence and convergence in distribution for the sampling-based Gittins
index approximation. In a numerical study, the approximation quality of the
proposed method is verified for the Bernoulli bandit and Gaussian bandit with
known variance, and the method is shown to significantly outperform Thompson
sampling and the Bayesian Upper Confidence Bound algorithms for a novel random
effects multi-armed bandit.
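For context on the baselines the paper compares against, Thompson sampling for the Bernoulli bandit maintains a Beta posterior per arm, samples a mean from each posterior, and plays the argmax. A minimal sketch of that baseline (not the paper's Gittins-index approximation; names and defaults are illustrative):

```python
import random

def thompson_bernoulli(true_means, steps=5000, seed=1):
    """Thompson sampling on a Bernoulli bandit with Beta(1,1) priors.

    Returns the number of pulls per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta alpha parameters (Beta(1,1) prior)
    failures = [1] * n_arms   # Beta beta parameters
    for _ in range(steps):
        # Sample one posterior draw per arm, play the arm with the largest draw.
        draws = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: draws[a])
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [successes[a] + failures[a] - 2 for a in range(n_arms)]
```

As the posteriors concentrate, the sampling step naturally shifts pulls toward the best arm, which is why Thompson sampling is a standard benchmark for Bayesian bandit methods such as Gittins-index policies.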