Action-Constrained Markov Decision Processes With Kullback-Leibler Cost
This paper concerns computation of optimal policies in which the one-step
reward function contains a cost term that models Kullback-Leibler divergence
with respect to nominal dynamics. This technique was introduced by Todorov in
2007, where it was shown under general conditions that the solution to the
average-reward optimality equations reduces to a simple eigenvector problem.
Since then many authors have sought to apply this technique to control problems
and models of bounded rationality in economics.
A crucial assumption is that the input process is essentially unconstrained.
For example, if the nominal dynamics include randomness from nature (e.g., the
impact of wind on a moving vehicle), then the optimal control solution does not
respect the exogenous nature of this disturbance.
This paper introduces a technique to solve a more general class of
action-constrained MDPs. The main idea is to solve an entire parameterized
family of MDPs, in which the parameter is a scalar weighting the one-step
reward function. The approach is new and practical even in the original
unconstrained formulation.
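Todorov's reduction can be sketched numerically: in the linearly solvable formulation, the average-reward optimality equation becomes a principal-eigenvector problem for a reward-twisted version of the nominal kernel. Below is a minimal illustration on a made-up 4-state chain; the problem data (`P`, `r`) and the power-iteration details are assumptions for illustration, not taken from the paper.

```python
import numpy as np

# Toy problem: 4 states, nominal dynamics P (row-stochastic), state reward r.
rng = np.random.default_rng(0)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)   # nominal transition kernel
r = rng.random(4)                    # one-step state reward

# Twisted kernel: P_hat[x, y] = exp(r[x]) * P[x, y]
P_hat = np.exp(r)[:, None] * P

# Power iteration for the principal (Perron-Frobenius) eigenpair.
z = np.ones(4)
for _ in range(500):
    z = P_hat @ z
    z /= np.linalg.norm(z)
lam = z @ (P_hat @ z) / (z @ z)      # principal eigenvalue; log(lam) gives the optimal average reward

# Optimal controlled kernel: tilt the nominal dynamics by the eigenvector.
P_star = P * z[None, :]
P_star /= P_star.sum(axis=1, keepdims=True)
```

Note that `P_star` is again row-stochastic, which is exactly the structure the unconstrained formulation exploits; the paper's contribution is handling the case where some of that freedom is not available to the controller.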
From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming
We consider linear programming (LP) problems in infinite-dimensional spaces
that are in general computationally intractable. Under suitable assumptions, we
develop an approximation bridge from the infinite-dimensional LP to tractable
finite convex programs in which the performance of the approximation is
quantified explicitly. To this end, we draw on recent developments in two
areas, randomized optimization and first-order methods, leading to a priori
as well as a posteriori performance guarantees. We illustrate the generality and
implications of our theoretical results in the special case of the long-run
average cost and discounted cost optimal control problems for Markov decision
processes on Borel spaces. The applicability of the theoretical results is
demonstrated through a constrained linear quadratic optimal control problem and
a fisheries management problem.
Comment: 30 pages, 5 figures
Rate-cost tradeoffs in control
Consider a distributed control problem with a communication channel connecting the observer of a linear stochastic system to the controller. The goal of the controller is to minimize a quadratic cost function. The most basic special case of that cost function is the mean-square deviation of the system state from the desired state. We study the fundamental tradeoff between the communication rate r bits/sec and the limsup of the expected cost b, and show a lower bound on the rate necessary to attain b. The bound applies as long as the system noise has a probability density function. If the target cost b is not too large, that bound can be closely approached by a simple lattice quantization scheme that only quantizes the innovation, that is, the difference between the controller's belief about the current state and the true state.
Rate-Cost Tradeoffs in Control
Consider a control problem with a communication channel connecting the observer of a linear stochastic system to the controller. The goal of the controller is to minimize a quadratic cost function in the state variables and control signal, known as the linear quadratic regulator (LQR). We study the fundamental tradeoff between the communication rate r bits/sec and the expected cost b. We obtain a lower bound on a certain rate-cost function, which quantifies the minimum directed mutual information between the channel input and output that is compatible with a target LQR cost. The rate-cost function has operational significance in multiple scenarios of interest: among others, it allows us to lower-bound the minimum communication rate for fixed- and variable-length quantization, and for control over noisy channels. We derive an explicit lower bound to the rate-cost function, which applies to vector, non-Gaussian, and partially observed systems, thereby extending and generalizing an earlier explicit expression for the scalar Gaussian system, due to Tatikonda et al. [2]. The bound applies as long as the differential entropy of the system noise is not −∞. It can be closely approached by a simple lattice quantization scheme that only quantizes the innovation, that is, the difference between the controller's belief about the current state and the true state. Via a separation principle between control and communication, similar results hold for causal lossy compression of additive noise Markov sources. Apart from standard dynamic programming arguments, our technical approach leverages the Shannon lower bound, develops new estimates for data compression with coding memory, and uses some recent results on high-resolution variable-length vector quantization to prove that the new converse bounds are tight.
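The innovation-quantization idea described in the abstract can be sketched in a few lines: the observer sends only a quantized version of the innovation (the gap between the controller's belief and the true state), and the controller applies certainty-equivalent control to its updated belief. The scalar system, noise level, quantizer step, and gain below are all assumed toy parameters, not values from the paper.

```python
import numpy as np

# Assumed toy setup: scalar system x_{t+1} = a*x_t + u_t + w_t,
# observer transmits a uniformly quantized innovation each step.
a, step, T = 1.2, 0.5, 1000
rng = np.random.default_rng(1)

def quantize(v, step):
    """Uniform (1-D lattice) quantizer with cell width `step`."""
    return step * np.round(v / step)

x, x_hat = 0.0, 0.0                   # true state and controller's belief
cost = 0.0
for _ in range(T):
    innovation = x - x_hat            # the part the controller does not know
    q = quantize(innovation, step)    # finite-rate message over the channel
    x_hat += q                        # controller refines its belief
    u = -a * x_hat                    # certainty-equivalent control
    w = rng.normal(scale=0.1)
    x = a * x + u + w                 # true state evolves
    x_hat = a * x_hat + u             # belief propagates through known dynamics
mse = sum_cost = None  # placeholder removed below
```

A cleaner accounting of the cost: accumulate `x * x` inside the loop and divide by `T` to estimate the mean-square deviation, the basic quadratic cost named above. Because the post-update estimation error is confined to a quantizer cell of width `step`, the closed-loop state stays bounded even though `a > 1` makes the open-loop system unstable.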