43,916 research outputs found
Linear Programming for Large-Scale Markov Decision Problems
We consider the problem of controlling a Markov decision process (MDP) with a
large state space, so as to minimize average cost. Since it is intractable to
compete with the optimal policy for large scale problems, we pursue the more
modest goal of competing with a low-dimensional family of policies. We use the
dual linear programming formulation of the MDP average cost problem, in which
the variable is a stationary distribution over state-action pairs, and we
consider a neighborhood of a low-dimensional subset of the set of stationary
distributions (defined in terms of state-action features) as the comparison
class. We propose two techniques, one based on stochastic convex optimization,
and one based on constraint sampling. In both cases, we give bounds that show
that the performance of our algorithms approaches the best achievable by any
policy in the comparison class. Most importantly, these results depend on the
size of the comparison class, but not on the size of the state space.
Preliminary experiments show the effectiveness of the proposed algorithms in a
queuing application.Comment: 27 pages, 3 figure
Solving Factored MDPs with Hybrid State and Action Variables
Efficient representations and solutions for large decision problems with
continuous and discrete variables are among the most important challenges faced
by the designers of automated decision support systems. In this paper, we
describe a novel hybrid factored Markov decision process (MDP) model that
allows for a compact representation of these problems, and a new hybrid
approximate linear programming (HALP) framework that permits their efficient
solutions. The central idea of HALP is to approximate the optimal value
function by a linear combination of basis functions and optimize its weights by
linear programming. We analyze both theoretical and computational aspects of
this approach, and demonstrate its scale-up potential on several hybrid
optimization problems
Planning in Hybrid Structured Stochastic Domains
Efficient representations and solutions for large structured decision problems with continuous and discrete variables are among the important challenges faced by the designers of automated decision support systems. In this work, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function of an MDP by a linear combination of basis functions and optimize its weights by linear programming. We study both theoretical and practical aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems
A bounded actor-critic algorithm for reinforcement learning
This thesis presents a new actor-critic algorithm from the domain of reinforcement learning to solve Markov and semi-Markov decision processes (or problems) in the field of airline revenue management (ARM). The ARM problem is one of control optimization in which a decision-maker must accept or reject a customer based on a requested fare. This thesis focuses on the so-called single-leg version of the ARM problem, which can be cast as a semi-Markov decision process (SMDP). Large-scale Markov decision processes (MDPs) and SMDPs suffer from the curses of dimensionality and modeling, making it difficult to create the transition probability matrices (TPMs) necessary to solve them using traditional methods such as dynamic and linear programming. This thesis seeks to employ an actor-critic algorithm to overcome the challenges found in developing TPMs for large-scale real-world problems. Unlike traditional actor-critic algorithms, where the values of the so-called actor can either become very large or very small, the algorithm developed in this thesis has an updating mechanism that keeps the values of the actor s iterates bounded in the limit and significantly smaller in magnitude than previous actor-critic algorithms. This allows the algorithm to explore the state space fully and perform better than its traditional counterpart. Numerical experiments conducted show encouraging results with the new algorithm by delivering optimal results on small case MDPs and SMDPs and consistently outperforming an airline industry heuristic, namely EMSR-b, on large-scale ARM problems --Abstract, page iii
- …