Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control
We present two nonparametric approaches to Kullback-Leibler (KL) control, or
linearly-solvable Markov decision problems (LMDPs), based on Gaussian processes
(GPs) and the Nyström approximation. Compared to recently developed parametric
methods, the proposed data-driven frameworks feature accurate function
approximation and efficient on-line operations. Theoretically, we derive the
mathematical connection between KL control based on dynamic programming and
earlier work in control theory that relies on information-theoretic dualities,
for the infinite-time-horizon case. Algorithmically, we give explicit optimal control
policies in nonparametric forms, and propose on-line update schemes with
budgeted computational costs. Numerical results demonstrate the effectiveness
and usefulness of the proposed frameworks.
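The Nyström approximation used in such frameworks replaces a full n-by-n kernel (Gram) matrix with a low-rank factorization, K ≈ K_nm K_mm^{-1} K_nm^T, built from a small set of m landmark points. A minimal plain-Python sketch with an illustrative RBF kernel, data, and landmarks (none of these values are from the paper):

```python
import math

def rbf(x, y, ell=1.0):
    # squared-exponential (RBF) kernel on scalars
    return math.exp(-(x - y) ** 2 / (2 * ell ** 2))

# data points and a small set of m = 2 landmark points (illustrative)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
lm = [0.5, 1.5]

K_nm = [[rbf(x, z) for z in lm] for x in xs]   # data-landmark kernel
K_mm = [[rbf(a, b) for b in lm] for a in lm]   # landmark Gram matrix

# invert the 2x2 landmark Gram matrix directly
det = K_mm[0][0] * K_mm[1][1] - K_mm[0][1] * K_mm[1][0]
K_mm_inv = [[ K_mm[1][1] / det, -K_mm[0][1] / det],
            [-K_mm[1][0] / det,  K_mm[0][0] / det]]

def nystrom(i, j):
    # low-rank approximation of K[i][j] via K_nm K_mm^{-1} K_nm^T
    return sum(K_nm[i][a] * K_mm_inv[a][b] * K_nm[j][b]
               for a in range(2) for b in range(2))
```

Because x = 0.5 is itself a landmark, kernel entries touching it are reproduced exactly; other entries (e.g. the diagonal at x = 1.0) are only approximated.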
Passive Dynamics in Mean Field Control
Mean-field models are a popular tool in a variety of fields. They provide an
understanding of the impact of interactions among a large number of particles
or people or other "self-interested agents", and are an increasingly popular
tool in distributed control.
This paper considers a particular randomized distributed control architecture
introduced in our own recent work. In numerical results it was found that the
associated mean-field model had attractive properties for purposes of control.
In particular, when viewed as an input-output system, its linearization was
found to be minimum phase.
In this paper we take a closer look at the control model. The results are
summarized as follows:
(i) The Markov Decision Process framework of Todorov is extended to
continuous time models, in which the "control cost" is based on relative
entropy. This is the basis of the construction of a family of controlled
Markovian generators.
(ii) A decentralized control architecture is proposed in which each agent
evolves as a controlled Markov process. A central authority broadcasts a common
control signal to each agent. The central authority chooses this signal based
on an aggregate scalar output of the Markovian agents.
(iii) Provided the control-free system is a reversible Markov process, an
identity holds for the linearization in which the right-hand side is the power
spectral density for the output of any one of the individual (control-free)
Markov processes.

Comment: To appear IEEE CDC, 201
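For item (i), a relative-entropy control cost between a controlled and a nominal Markov jump process is commonly written rate-by-rate from the two generators. A minimal sketch under that standard formulation (the 3-state generators are illustrative, and the paper's exact normalization may differ):

```python
import math

def kl_rate(q_u, q_0, x):
    """Relative-entropy rate of the controlled generator q_u against the
    nominal generator q_0, evaluated at state x (off-diagonal rates only).
    Standard KL rate between Markov jump processes; illustrative only."""
    return sum(q_u[x][y] * math.log(q_u[x][y] / q_0[x][y])
               - q_u[x][y] + q_0[x][y]
               for y in range(len(q_u)) if y != x)

# nominal generator for a 3-state chain (rows sum to zero)
Q0 = [[-2.0,  1.0,  1.0],
      [ 1.0, -2.0,  1.0],
      [ 1.0,  1.0, -2.0]]

# a controlled generator obtained by speeding up one transition
Qu = [[-3.0,  2.0,  1.0],
      [ 1.0, -2.0,  1.0],
      [ 1.0,  1.0, -2.0]]
```

The cost is zero exactly at states where the controlled rates agree with the nominal ones, so leaving the passive dynamics untouched is free, as in the KL-control setting.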
Action-Constrained Markov Decision Processes With Kullback-Leibler Cost
This paper concerns computation of optimal policies in which the one-step
reward function contains a cost term that models Kullback-Leibler divergence
with respect to nominal dynamics. This technique was introduced by Todorov in
2007, where it was shown under general conditions that the solution to the
average-reward optimality equations reduces to a simple eigenvector problem.
Since then many authors have sought to apply this technique to control problems
and models of bounded rationality in economics.
A crucial assumption is that the input process is essentially unconstrained.
For example, if the nominal dynamics include randomness from nature (e.g., the
impact of wind on a moving vehicle), then the optimal control solution does not
respect the exogenous nature of this disturbance.
This paper introduces a technique to solve a more general class of
action-constrained MDPs. The main idea is to solve an entire parameterized
family of MDPs, in which the parameter is a scalar weighting the one-step
reward function. The approach is new and practical even in the original
unconstrained formulation.
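The eigenvector reduction mentioned above can be sketched with a power iteration on the reward-twisted kernel G = diag(e^r) P0: its Perron eigenvalue lam gives the optimal average reward as log(lam), and its eigenvector determines the optimal transition law by tilting the nominal dynamics. The chain and rewards below are illustrative, not from the paper:

```python
import math

# nominal (uncontrolled) transition matrix and one-step rewards
# (both illustrative, not taken from the paper)
P0 = [[0.5,  0.5,  0.0],
      [0.25, 0.5,  0.25],
      [0.0,  0.5,  0.5]]
r = [1.0, 0.0, 0.5]

def solve_lmdp(P0, r, iters=500):
    """Power iteration on G = diag(e^r) P0. At the fixed point,
    G h = lam * h with lam the Perron eigenvalue."""
    n = len(P0)
    h = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        g = [math.exp(r[i]) * sum(P0[i][j] * h[j] for j in range(n))
             for i in range(n)]
        lam = max(g)            # normalize by the largest entry
        h = [gi / lam for gi in g]
    return lam, h

lam, h = solve_lmdp(P0, r)

def optimal_policy_row(i):
    # optimal controlled transition law: tilt row i of P0 by h, renormalize
    n = len(P0)
    w = [P0[i][j] * h[j] for j in range(n)]
    s = sum(w)
    return [wi / s for wi in w]
```

Note the optimal law only reweights transitions that P0 already allows, which is exactly why an exogenous disturbance embedded in P0 is not respected without the action constraints this paper introduces.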
Optimal Ensemble Control of Loads in Distribution Grids with Network Constraints
Flexible loads, e.g. thermostatically controlled loads (TCLs), are
technically capable of participating in demand response (DR) programs. On the
other hand, a number of challenges need to be resolved before such
participation can be implemented in practice en masse. First, individual TCLs must be
aggregated and operated in sync to scale DR benefits. Second, the uncertainty
of TCLs needs to be accounted for. Third, exercising the flexibility of TCLs
needs to be coordinated with distribution system operations to avoid
unnecessary power losses and to comply with power flow and voltage limits.
This paper addresses these challenges. We propose a network-constrained,
open-loop, stochastic optimal control formulation. The first part of this
formulation represents ensembles of collocated TCLs modelled by an aggregated
Markov Process (MP), where each MP state is associated with a given power
consumption or production level. The second part extends MPs to a multi-period
distribution power flow optimization. In this optimization, the control of TCL
ensembles is regulated by transition probability matrices and physically
enabled by local active and reactive power controls at TCL locations. The
optimization is solved with a Spatio-Temporal Dual Decomposition (ST-D2)
algorithm. The performance of the proposed formulation and algorithm is
demonstrated on the IEEE 33-bus distribution model.

Comment: 7 pages, 6 figures, accepted PSCC 201
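The aggregated Markov Process view of a TCL ensemble can be sketched directly: a probability distribution over power-level states evolves under a (controllable) transition probability matrix, and the expected aggregate consumption is a linear function of that distribution. All values below are hypothetical, not from the paper:

```python
# power consumed in each MP state, in kW (hypothetical levels)
power = [0.0, 1.5, 3.0]

# a candidate transition probability matrix (the control variable
# in the paper's formulation; values here are illustrative)
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

def step(pi, P):
    # one-step evolution of the state distribution: pi' = pi P
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def aggregate_power(pi, power, n_tcls=1000):
    # expected total consumption of an ensemble of n_tcls identical TCLs
    return n_tcls * sum(p * w for p, w in zip(pi, power))

pi = [1.0, 0.0, 0.0]   # all TCLs start in the zero-consumption state
for _ in range(3):
    pi = step(pi, P)
```

In the paper's optimization, rows of P (and the local active/reactive power set-points) become decision variables coupled to the multi-period power flow constraints.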