Sample Efficient Policy Search for Optimal Stopping Domains
Optimal stopping problems consider the question of deciding when to stop an
observation-generating process in order to maximize a return. We examine the
problem of simultaneously learning and planning in such domains, when data is
collected directly from the environment. We propose GFSE, a simple and flexible
model-free policy search method that reuses data for sample efficiency by
leveraging problem structure. We bound the sample complexity of our approach to
guarantee uniform convergence of policy value estimates, tightening existing
PAC bounds to achieve logarithmic dependence on horizon length for our setting.
We also examine the benefit of our method against prevalent model-based and
model-free approaches on three domains taken from diverse fields.
Comment: To appear in IJCAI-201
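The data-reuse idea in this abstract can be illustrated with a minimal sketch. It is not GFSE itself, whose details the abstract does not give. It only assumes that stopping does not alter the observation process, so one batch of full-length trajectories can score every candidate stopping policy. The i.i.d. uniform "offer" process, the threshold policy class, and all function names below are hypothetical choices made for illustration.

```python
import random


def simulate_trajectory(horizon, rng):
    """Hypothetical process: `horizon` i.i.d. offers drawn uniformly from [0, 1]."""
    return [rng.random() for _ in range(horizon)]


def stopped_return(path, threshold):
    """Return of a threshold policy on one recorded trajectory.

    The policy stops at the first offer >= threshold and collects it;
    if no offer qualifies, it must accept the final offer.
    """
    for offer in path:
        if offer >= threshold:
            return offer
    return path[-1]


def evaluate(paths, threshold):
    """Monte Carlo value estimate of one policy, using the shared trajectories."""
    return sum(stopped_return(p, threshold) for p in paths) / len(paths)


def search(paths, thresholds):
    """Pick the candidate threshold with the highest estimated value.

    Every candidate is scored on the same data, so no new samples are
    drawn from the environment per policy.
    """
    return max(thresholds, key=lambda t: evaluate(paths, t))


rng = random.Random(0)
paths = [simulate_trajectory(20, rng) for _ in range(500)]
thresholds = [k / 20 for k in range(20)]  # candidates 0.00, 0.05, ..., 0.95
best = search(paths, thresholds)
print(best, evaluate(paths, best))
```

Because all candidates share one dataset, the cost in environment samples does not grow with the number of policies considered, which is the kind of structure a uniform convergence bound over the policy class can exploit.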
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration.
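As a rough illustration of Thompson sampling with a closed-form posterior update, here is a minimal sketch on a Bernoulli bandit. This is a deliberate simplification: it uses a Beta-Bernoulli model instead of the paper's context tree model over piecewise-linear Gaussians, and it omits approximate dynamic programming entirely. The function name and the arm probabilities are hypothetical.

```python
import random


def thompson_bandit(true_probs, steps, rng):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors.

    Each round: sample one plausible success rate per arm from its
    posterior, act greedily with respect to the sample, then update the
    chosen arm's posterior in closed form.
    """
    n_arms = len(true_probs)
    alpha = [1.0] * n_arms  # 1 + observed successes per arm
    beta = [1.0] * n_arms   # 1 + observed failures per arm
    pulls = [0] * n_arms
    for _ in range(steps):
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward       # conjugate Beta update
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return alpha, beta, pulls


rng = random.Random(0)
alpha, beta, pulls = thompson_bandit([0.2, 0.8], steps=2000, rng=rng)
print(pulls)
```

Exploration arises from posterior uncertainty: arms with few observations have wide posteriors and are occasionally sampled as best, while well-understood poor arms are chosen less and less often.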
Solving Factored MDPs with Hybrid State and Action Variables
Efficient representations and solutions for large decision problems with
continuous and discrete variables are among the most important challenges faced
by the designers of automated decision support systems. In this paper, we
describe a novel hybrid factored Markov decision process (MDP) model that
allows for a compact representation of these problems, and a new hybrid
approximate linear programming (HALP) framework that permits their efficient
solutions. The central idea of HALP is to approximate the optimal value
function by a linear combination of basis functions and optimize its weights by
linear programming. We analyze both theoretical and computational aspects of
this approach, and demonstrate its scale-up potential on several hybrid
optimization problems.
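The central idea described above can be written out as a generic approximate linear program. This is a sketch of the standard ALP formulation, not a reproduction of the paper's exact notation: the basis functions $f_i$, weights $w_i$, state relevance weights $\psi$, reward $R$, transition model $P$, and discount factor $\gamma$ are symbols assumed here for illustration.

```latex
% Linear value function approximation
V^{w}(\mathbf{x}) = \sum_{i} w_i f_i(\mathbf{x})

% Approximate linear program over the weights w
\begin{aligned}
\min_{w} \quad & \sum_{i} w_i \, \mathbb{E}_{\psi}\!\left[ f_i(\mathbf{x}) \right] \\
\text{s.t.} \quad & \sum_{i} w_i \left( f_i(\mathbf{x})
  - \gamma \, \mathbb{E}_{P(\mathbf{x}' \mid \mathbf{x}, \mathbf{a})}\!\left[ f_i(\mathbf{x}') \right] \right)
  \;\ge\; R(\mathbf{x}, \mathbf{a})
  \qquad \forall\, \mathbf{x}, \mathbf{a}
\end{aligned}
```

In a hybrid model the expectations would be taken as sums over the discrete variables and integrals over the continuous ones, and the constraint set is infinite whenever a continuous state or action variable is present, so in practice it has to be handled approximately.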