18,296 research outputs found
A survey of random processes with reinforcement
The models surveyed include generalized P\'{o}lya urns, reinforced random
walks, interacting urn models, and continuous reinforced processes. Emphasis is
on methods and results, with sketches provided of some proofs. Applications are
discussed in statistics, biology, economics and a number of other areas.Comment: Published at http://dx.doi.org/10.1214/07-PS094 in the Probability
Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Sample Efficient Policy Search for Optimal Stopping Domains
Optimal stopping problems consider the question of deciding when to stop an
observation-generating process in order to maximize a return. We examine the
problem of simultaneously learning and planning in such domains, when data is
collected directly from the environment. We propose GFSE, a simple and flexible
model-free policy search method that reuses data for sample efficiency by
leveraging problem structure. We bound the sample complexity of our approach to
guarantee uniform convergence of policy value estimates, tightening existing
PAC bounds to achieve logarithmic dependence on horizon length for our setting.
We also examine the benefit of our method against prevalent model-based and
model-free approaches on 3 domains taken from diverse fields.Comment: To appear in IJCAI-201
Selecting Near-Optimal Approximate State Representations in Reinforcement Learning
We consider a reinforcement learning setting introduced in (Maillard et al.,
NIPS 2011) where the learner does not have explicit access to the states of the
underlying Markov decision process (MDP). Instead, she has access to several
models that map histories of past interactions to states. Here we improve over
known regret bounds in this setting, and more importantly generalize to the
case where the models given to the learner do not contain a true model
resulting in an MDP representation but only approximations of it. We also give
improved error bounds for state aggregation
- …