Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes
Approximate dynamic programming has been used successfully in a large variety
of domains, but it relies on a small set of provided approximation features to
calculate solutions reliably. Large and rich sets of features can cause
existing algorithms to overfit because of a limited number of samples. We
address this shortcoming using regularization in approximate linear
programming. Because the proposed method can automatically select the
appropriate richness of features, its performance does not degrade with an
increasing number of features. These results rely on new and stronger sampling
bounds for regularized approximate linear programs. We also propose a
computationally efficient homotopy method. The empirical evaluation of the
approach shows that the proposed method performs well on simple MDPs and
standard benchmark problems.
Comment: Technical report corresponding to the ICML 2010 submission of the same
name
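For orientation, an L1-regularized approximate linear program of the kind this abstract describes can be sketched as follows. The notation is an assumption and is not taken from the abstract: \phi(s) is the feature vector of state s, \Phi stacks these vectors as rows, c is a vector of state-relevance weights, r the reward, \gamma the discount factor, and \psi the regularization budget. The paper's exact formulation (for example a weighted norm, or constraints imposed only on sampled transitions) may differ.

```latex
\min_{w} \; c^\top \Phi w
\quad \text{s.t.} \quad
\phi(s)^\top w \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \phi(s')^\top w
\quad \forall (s,a),
\qquad \lVert w \rVert_1 \le \psi
```

Under this reading, shrinking \psi drives more entries of w to zero, which is how an L1 constraint acts as feature selection; a homotopy method would then trace the solution as \psi varies.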
Linear Programming for Large-Scale Markov Decision Problems
We consider the problem of controlling a Markov decision process (MDP) with a
large state space, so as to minimize average cost. Since it is intractable to
compete with the optimal policy for large scale problems, we pursue the more
modest goal of competing with a low-dimensional family of policies. We use the
dual linear programming formulation of the MDP average cost problem, in which
the variable is a stationary distribution over state-action pairs, and we
consider a neighborhood of a low-dimensional subset of the set of stationary
distributions (defined in terms of state-action features) as the comparison
class. We propose two techniques, one based on stochastic convex optimization,
and one based on constraint sampling. In both cases, we give bounds that show
that the performance of our algorithms approaches the best achievable by any
policy in the comparison class. Most importantly, these results depend on the
size of the comparison class, but not on the size of the state space.
Preliminary experiments show the effectiveness of the proposed algorithms in a
queuing application.
Comment: 27 pages, 3 figures
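The dual formulation mentioned in this abstract optimizes over stationary distributions on state-action pairs. As a minimal sketch, assuming a hypothetical two-state, two-action MDP with made-up numbers (none of them from the paper), the Python below builds such a distribution for one fixed policy, checks the flow constraint that dual-feasible points satisfy, and evaluates the linear average-cost objective. It evaluates a single feasible point only; it does not solve the LP or implement either of the paper's algorithms.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
# P[s][a][s2] = probability of moving from state s to state s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
# cost[s][a] = immediate cost of taking action a in state s.
cost = [[1.0, 3.0], [4.0, 2.0]]
# policy[s][a] = probability that the policy chooses action a in state s.
policy = [[0.5, 0.5], [0.25, 0.75]]


def state_action_distribution(P, policy, iters=2000):
    """Return mu[s][a] = d(s) * policy(a|s), with d found by power iteration."""
    n_states, n_actions = len(P), len(P[0])
    d = [1.0 / n_states] * n_states
    for _ in range(iters):
        nxt = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                for s2 in range(n_states):
                    nxt[s2] += d[s] * policy[s][a] * P[s][a][s2]
        d = nxt
    return [[d[s] * policy[s][a] for a in range(n_actions)] for s in range(n_states)]


def average_cost(mu, cost):
    """Linear objective of the dual LP: sum over (s, a) of mu(s, a) * cost(s, a)."""
    return sum(mu[s][a] * cost[s][a] for s in range(len(mu)) for a in range(len(mu[s])))


def flow_residual(mu, P):
    """Largest violation of sum_a mu(s2, a) = sum_{s, a} mu(s, a) * P(s2 | s, a)."""
    n_states, n_actions = len(P), len(P[0])
    worst = 0.0
    for s2 in range(n_states):
        outflow = sum(mu[s2])
        inflow = sum(mu[s][a] * P[s][a][s2] for s in range(n_states) for a in range(n_actions))
        worst = max(worst, abs(outflow - inflow))
    return worst


mu = state_action_distribution(P, policy)
print("average cost:", average_cost(mu, cost))
print("flow residual:", flow_residual(mu, P))
```

Because the objective is linear in mu, restricting mu to a low-dimensional family defined by state-action features (the comparison class in the abstract) keeps the problem a linear program whose size depends on the number of features rather than the number of states.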
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration
- …