Linear Programming for Large-Scale Markov Decision Problems
We consider the problem of controlling a Markov decision process (MDP) with a
large state space, so as to minimize average cost. Since it is intractable to
compete with the optimal policy for large scale problems, we pursue the more
modest goal of competing with a low-dimensional family of policies. We use the
dual linear programming formulation of the MDP average cost problem, in which
the variable is a stationary distribution over state-action pairs, and we
consider a neighborhood of a low-dimensional subset of the set of stationary
distributions (defined in terms of state-action features) as the comparison
class. We propose two techniques, one based on stochastic convex optimization,
and one based on constraint sampling. In both cases, we give bounds that show
that the performance of our algorithms approaches the best achievable by any
policy in the comparison class. Most importantly, these results depend on the
size of the comparison class, but not on the size of the state space.
Preliminary experiments show the effectiveness of the proposed algorithms in a
queuing application.
Comment: 27 pages, 3 figures
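As an illustrative sketch (the notation here is generic, not taken from the
paper), the dual linear-programming formulation referred to above optimizes
over a stationary state-action distribution $\mu$, with per-step cost
$c(s,a)$ and transition kernel $P$:

```latex
\begin{aligned}
\min_{\mu \ge 0} \quad & \sum_{s,a} \mu(s,a)\, c(s,a) \\
\text{s.t.} \quad & \sum_{a'} \mu(s',a') \;=\; \sum_{s,a} \mu(s,a)\, P(s' \mid s,a) \quad \forall s', \\
& \sum_{s,a} \mu(s,a) \;=\; 1 .
\end{aligned}
```

The comparison class in the paper restricts $\mu$ to a neighborhood of a
low-dimensional subset of such distributions, parameterized by state-action
features, which is what keeps the bounds independent of the state-space size.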
Semidefinite Relaxations for Stochastic Optimal Control Policies
Recent results in the study of the Hamilton-Jacobi-Bellman (HJB) equation
have led to the discovery of a formulation of the value function as a linear
Partial Differential Equation (PDE) for stochastic nonlinear systems with a
mild constraint on their disturbances. This has yielded promising directions
for research in the planning and control of nonlinear systems. This work
proposes a new method for obtaining approximate solutions to these linear
stochastic optimal control (SOC) problems. A candidate polynomial with variable
coefficients is proposed as the solution to the SOC problem. A Sum of Squares
(SOS) relaxation is then taken to the partial differential constraints, leading
to a hierarchy of semidefinite relaxations with improving sub-optimality gap.
The resulting approximate solutions are shown to be guaranteed over- and
under-approximations of the optimal value function.
Comment: Preprint. Accepted to American Control Conference (ACC) 2014 in
Portland, Oregon. 7 pages.
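A rough sketch of the relaxation step (symbols assumed for illustration, not
drawn from the paper): if the linear form of the HJB reduces to a linear
operator constraint $\mathcal{L}[\Psi](x) = 0$ on a transformed value
function $\Psi$, one may substitute a polynomial candidate
$\Psi_c(x) = \sum_\alpha c_\alpha x^\alpha$ and relax the equality to a
one-sided inequality certified by sum-of-squares membership:

```latex
\mathcal{L}[\Psi_c](x) \;\ge\; 0 \;\; \forall x
\quad \Longleftarrow \quad
\mathcal{L}[\Psi_c] \in \Sigma[x],
```

where $\Sigma[x]$ denotes the cone of sum-of-squares polynomials. Searching
over the coefficients $c_\alpha$ subject to such a membership constraint is a
semidefinite program, and raising the degree bound on the polynomials yields
the hierarchy of relaxations with an improving sub-optimality gap.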
Rational inattention in control of Markov chains
This thesis poses a general model for optimal control subject to an
information constraint, motivated in part by recent work on
information-constrained decision-making by economic agents.
In the average-cost optimal control framework, the general model introduced
in this thesis reduces to a variant of the linear-programming representation
of the average-cost optimal control problem, subject to an additional
mutual information constraint on the randomized stationary policy. The
resulting infinite-dimensional convex program admits a decomposition based
on the Bellman error, which is the subject of study in approximate dynamic
programming.
Later, we apply the general theory to an information-constrained variant
of the scalar Linear-Quadratic-Gaussian (LQG) control problem. We give
an upper bound on the optimal steady-state value of the quadratic performance
objective and present explicit constructions of controllers that achieve
this bound. We show that the obvious certainty-equivalent control policy is
suboptimal when the information constraints are very severe, and propose
another policy that performs better in this low-information regime. In the
two extreme cases of no information (open-loop) and perfect information,
these two policies coincide with the optimum.
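Schematically (notation assumed here, not quoted from the thesis), the
information-constrained variant of the average-cost LP augments the usual
program over stationary state-action distributions $\mu$ with a
mutual-information budget on the policy:

```latex
\begin{aligned}
\min_{\mu} \quad & \sum_{s,a} \mu(s,a)\, c(s,a) \\
\text{s.t.} \quad & \mu \text{ is a stationary state-action distribution}, \\
& I(S; A) \;\le\; R,
\end{aligned}
```

where $I(S;A)$ is the mutual information between state and action under
$\mu$ and $R$ is the information rate available to the controller. Taking
$R \to \infty$ recovers the unconstrained average-cost LP, while $R = 0$
forces an open-loop policy, matching the two extreme cases noted above.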
Adaptive traffic signal control using approximate dynamic programming
This paper presents a study on an adaptive traffic signal controller for real-time operation. The controller aims for three operational objectives: dynamic allocation of green time, automatic adjustment to control parameters, and fast revision of signal plans. The control algorithm is built on approximate dynamic programming (ADP). This approach substantially reduces the computational burden by using an approximation to the value function of dynamic programming, and reinforcement learning to update that approximation. We investigate temporal-difference learning and perturbation learning as specific learning techniques for the ADP approach. We find in computer simulation that the ADP controllers achieve substantial reductions in vehicle delays in comparison with optimised fixed-time plans. Our results show that substantial benefits can be gained by increasing the frequency at which signal plans are revised, which can be achieved conveniently using the ADP approach.
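The value-function update at the heart of this kind of ADP controller can be
sketched with generic temporal-difference learning over a linear
approximation. The state space, features, costs, and hyperparameters below
are placeholders for illustration, not the traffic model from the paper:

```python
import random

def td0_linear(features, transitions, alpha=0.1, gamma=0.95, episodes=200):
    """TD(0) with a linear value-function approximation V(s) = w . phi(s).

    features    -- dict mapping each state to its feature vector phi(s)
    transitions -- function s -> (next_state, cost), sampling the dynamics
    All names and parameters are illustrative, not from the paper.
    """
    n = len(next(iter(features.values())))
    w = [0.0] * n                     # weight vector of the approximation
    states = list(features)
    for _ in range(episodes):
        s = random.choice(states)     # random starting state per episode
        for _ in range(50):
            s2, cost = transitions(s)
            v = sum(wi * fi for wi, fi in zip(w, features[s]))
            v2 = sum(wi * fi for wi, fi in zip(w, features[s2]))
            delta = cost + gamma * v2 - v          # TD error
            # gradient step on the weights along phi(s)
            w = [wi + alpha * delta * fi for wi, fi in zip(w, features[s])]
            s = s2
    return w
```

In a signal-control setting, the features would encode queue lengths and
signal states, and the learned approximation would be queried when choosing
how to allocate green time; the learning rule itself is the part the paper's
temporal-difference variant corresponds to.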