Hierarchical Linearly-Solvable Markov Decision Problems
We present a hierarchical reinforcement learning framework that formulates
each task in the hierarchy as a special type of Markov decision process for
which the Bellman equation is linear and has an analytical solution. Problems of
this type, called linearly-solvable MDPs (LMDPs), have interesting properties
that can be exploited in a hierarchical setting, such as efficient learning of
the optimal value function or task compositionality. The proposed hierarchical
approach can also be seen as a novel alternative to solving LMDPs with large
state spaces. We derive a hierarchical version of the so-called Z-learning
algorithm that learns different tasks simultaneously and show empirically that
it significantly outperforms the state-of-the-art learning methods in two
classical hierarchical reinforcement learning domains: the taxi domain and an
autonomous guided vehicle task.
Comment: 11 pages, 6 figures, 26th International Conference on Automated
Planning and Scheduling
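The linear Bellman equation behind this framework admits a compact sampling-based learning rule. As a rough illustration of flat (non-hierarchical) Z-learning on a toy first-exit chain problem (the states, costs, and constants below are invented for illustration and are not the paper's hierarchical algorithm or its domains):

```python
import numpy as np

# Toy first-exit LMDP: a 1-D chain whose rightmost state is terminal.
rng = np.random.default_rng(0)
n = 6                     # states 0..5; state 5 is terminal (absorbing)
q = np.full(n, 0.1)       # state cost at interior states
q[-1] = 0.0               # terminal cost

# Passive dynamics: an uncontrolled random walk to a neighbour.
passive = np.zeros((n, n))
for i in range(n - 1):
    passive[i, max(i - 1, 0)] += 0.5
    passive[i, i + 1] += 0.5
passive[-1, -1] = 1.0

# The exact desirability z solves the *linear* Bellman equation
# z(i) = exp(-q(i)) * (P z)(i) at interior states, z(terminal) = 1.
z_exact = np.ones(n)
for _ in range(1000):
    z_exact[:-1] = np.exp(-q[:-1]) * (passive @ z_exact)[:-1]

# Z-learning: stochastic approximation along passive trajectories,
# no action maximization anywhere.
z_hat = np.ones(n)
alpha = 0.05
for episode in range(5000):
    i = rng.integers(0, n - 1)
    while i != n - 1:
        j = rng.choice(n, p=passive[i])
        z_hat[i] = (1 - alpha) * z_hat[i] + alpha * np.exp(-q[i]) * z_hat[j]
        i = j

print(np.max(np.abs(z_hat - z_exact)))  # small approximation error
```

The optimal value function is then recovered as v = -log z, and the optimal policy reweights the passive dynamics by z at the successor state.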
Optimal Ensemble Control of Loads in Distribution Grids with Network Constraints
Flexible loads, e.g. thermostatically controlled loads (TCLs), are
technically able to participate in demand response (DR) programs. On the
other hand, a number of challenges need to be resolved before such
participation can be implemented in practice en masse. First, individual TCLs
must be aggregated and operated in sync to scale DR benefits. Second, the
uncertainty of TCLs needs to be accounted for. Third, exercising the
flexibility of TCLs needs to be coordinated with distribution system
operations to avoid unnecessary power losses and to ensure compliance with
power flow and voltage limits.
This paper addresses these challenges. We propose a network-constrained,
open-loop, stochastic optimal control formulation. The first part of this
formulation represents ensembles of collocated TCLs modelled by an aggregated
Markov Process (MP), where each MP state is associated with a given power
consumption or production level. The second part extends MPs to a multi-period
distribution power flow optimization. In this optimization, the control of TCL
ensembles is regulated by transition probability matrices and physically
enabled by local active and reactive power controls at TCL locations. The
optimization is solved with a Spatio-Temporal Dual Decomposition (ST-D2)
algorithm. The performance of the proposed formulation and algorithm is
demonstrated on the IEEE 33-bus distribution model.
Comment: 7 pages, 6 figures, accepted PSCC 201
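The aggregated Markov process view of a TCL ensemble can be illustrated with a toy example (the state set, power levels, transition matrices, and DR schedule below are invented; the paper's formulation additionally couples this to power-flow constraints, which are omitted here):

```python
import numpy as np

# An ensemble of collocated TCLs as an aggregated Markov process:
# each MP state carries a power level, and the control variable is
# the transition probability matrix itself.
n_tcl = 1000                         # TCLs in one collocated ensemble
power = np.array([0.0, 4.0, 8.0])    # kW drawn in states {off, low, high}

# Two candidate transition matrices (rows sum to 1).
P_idle = np.array([[0.9, 0.1, 0.0],
                   [0.5, 0.4, 0.1],
                   [0.3, 0.4, 0.3]])
P_boost = np.array([[0.2, 0.5, 0.3],
                    [0.1, 0.4, 0.5],
                    [0.0, 0.3, 0.7]])

rho = np.array([1.0, 0.0, 0.0])      # all TCLs start "off"
demand = []
for t in range(20):
    P = P_boost if 5 <= t < 10 else P_idle   # a toy DR schedule
    rho = P.T @ rho                          # ensemble distribution update
    demand.append(n_tcl * power @ rho)       # expected aggregate kW

print(demand[4], demand[9])  # aggregate demand rises in the boost window
```

In the paper's optimization, matrices like these become decision variables in a multi-period power flow problem rather than a fixed schedule.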
Linearly Solvable Stochastic Control Lyapunov Functions
This paper presents a new method for synthesizing stochastic control Lyapunov
functions for a class of nonlinear stochastic control systems. The technique
relies on a transformation of the classical nonlinear Hamilton-Jacobi-Bellman
partial differential equation to a linear partial differential equation for a
class of problems with a particular constraint on the stochastic forcing. This
linear partial differential equation can then be relaxed to a linear
differential inclusion, allowing for relaxed solutions to be generated using
sum of squares programming. The resulting relaxed solutions are in fact
viscosity super/subsolutions, and by the maximum principle are pointwise upper
and lower bounds to the underlying value function, even for coarse polynomial
approximations. Furthermore, the pointwise upper bound is shown to be a
stochastic control Lyapunov function, yielding a method for generating
nonlinear controllers with pointwise bounded distance from the optimal cost
when using the optimal controller. These approximate solutions may be computed
with non-increasing error via a hierarchy of semidefinite optimization
problems. Finally, this paper develops a priori bounds on trajectory
suboptimality when using these approximate value functions, as well as
demonstrates that these methods, and bounds, can be applied to a more general
class of nonlinear systems not obeying the constraint on stochastic forcing.
Simulated examples illustrate the methodology.
Comment: Published in SIAM Journal on Control and Optimization
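The HJB transformation this line of work relies on can be summarized in generic notation (the symbols and the precise structural constraint below follow the standard linearly-solvable setting and are not necessarily the paper's own notation):

```latex
% Generic setting (an assumption, not necessarily the paper's):
% dynamics dx = (f(x) + G(x)u) dt + B(x) dw, cost rate q(x) + u'Ru/2.
% Minimizing the HJB over u gives
\begin{align*}
0 &= q + (\nabla V)^\top f
     - \tfrac{1}{2}(\nabla V)^\top G R^{-1} G^\top \nabla V
     + \tfrac{1}{2}\operatorname{tr}\!\left(B B^\top \nabla^2 V\right).
\end{align*}
% Under the constraint \lambda G R^{-1} G^\top = B B^\top (the
% "particular constraint on the stochastic forcing"), the transform
% V = -\lambda \log \Psi cancels the quadratic term, leaving a PDE
% that is linear in the desirability \Psi:
\begin{align*}
0 &= -\tfrac{q}{\lambda}\,\Psi + f^\top \nabla \Psi
     + \tfrac{1}{2}\operatorname{tr}\!\left(B B^\top \nabla^2 \Psi\right).
\end{align*}
```

It is this linear object that can then be relaxed to a differential inclusion and attacked with sum-of-squares programming, as the abstract describes.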
Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control
We present two nonparametric approaches to Kullback-Leibler (KL) control, or
the linearly-solvable Markov decision problem (LMDP), based on Gaussian
processes (GP) and the Nyström approximation. Compared to recently developed parametric
methods, the proposed data-driven frameworks feature accurate function
approximation and efficient on-line operations. Theoretically, we derive the
mathematical connection of KL control based on dynamic programming with earlier
work in control theory which relies on information theoretic dualities for the
infinite time horizon case. Algorithmically, we give explicit optimal control
policies in nonparametric forms, and propose on-line update schemes with
budgeted computational costs. Numerical results demonstrate the effectiveness
and usefulness of the proposed frameworks.
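The Nyström approximation underpinning the second approach can be sketched in isolation (the RBF kernel, problem sizes, and uniform landmark selection here are generic assumptions, not the paper's exact setup):

```python
import numpy as np

# Nystrom idea: approximate an n x n kernel matrix from m << n
# landmark columns, enabling kernel methods at budgeted cost.
rng = np.random.default_rng(1)

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel between row-stacked point sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

X = rng.normal(size=(300, 2))            # state samples
m = 40                                   # landmark (inducing) points
idx = rng.choice(len(X), size=m, replace=False)
Xm = X[idx]

K_nm = rbf(X, Xm)
K_mm = rbf(Xm, Xm) + 1e-8 * np.eye(m)    # jitter for numerical stability
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)   # rank-m approximation

K_full = rbf(X, X)
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(rel_err)   # modest rank-m approximation error
```

Storage and per-query cost then scale with m rather than with the number of samples, which is what makes on-line updates with budgeted computation feasible.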
Optimal Navigation Functions for Nonlinear Stochastic Systems
This paper presents a new methodology to craft navigation functions for
nonlinear systems with stochastic uncertainty. The method relies on the
transformation of the Hamilton-Jacobi-Bellman (HJB) equation into a linear
partial differential equation. This approach allows for optimality criteria to
be incorporated into the navigation function, and generalizes several existing
results in navigation functions. It is shown that the HJB and existing
navigation functions in the literature sit at opposite ends of a spectrum of
optimization problems, along which tradeoffs may be made in problem complexity.
In particular, it is shown that under certain criteria the optimal navigation
function is related to Laplace's equation, previously used in the literature,
through an exponential transform. Further, analytical solutions to the HJB are
available in simplified domains, yielding guidance towards optimality for
approximation schemes. Examples are used to illustrate the role that noise and
optimality can play in navigation system design.
Comment: Accepted to IROS 2014. 8 Pages
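The connection to Laplace's equation can be illustrated with a toy grid world (the grid size, boundary values, obstacle, and iteration counts below are all made-up choices, not the paper's examples): solve Laplace's equation for a "desirability" z, then take V = -log z as a candidate navigation function.

```python
import numpy as np

N = 21
goal = (10, 16)
obstacle = np.zeros((N, N), dtype=bool)
obstacle[5:15, 8] = True          # a vertical wall between start and goal

z = np.full((N, N), 1e-3)
for _ in range(5000):             # Jacobi iteration: z becomes harmonic
    z = 0.25 * (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
                np.roll(z, 1, 1) + np.roll(z, -1, 1))
    z[0, :] = z[-1, :] = z[:, 0] = z[:, -1] = 1e-3   # outer walls: low z
    z[obstacle] = 1e-3                               # obstacle: low z
    z[goal] = 1.0                                    # goal: pinned high

V = -np.log(z)                    # exponential transform of harmonic z

# Greedy descent on V from the far side of the wall reaches the goal,
# since the harmonic z has no spurious interior maxima.
pos = (10, 2)
for _ in range(500):
    if pos == goal:
        break
    i, j = pos
    pos = min([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)],
              key=lambda p: V[p])
print(pos)
```

The absence of spurious local minima in V is exactly the navigation-function property; the paper's contribution is relating such constructions to optimality via the HJB.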
Linear Hamilton Jacobi Bellman Equations in High Dimensions
The Hamilton Jacobi Bellman Equation (HJB) provides the globally optimal
solution to large classes of control problems. Unfortunately, this generality
comes at a price: the calculation of such solutions is typically intractable
for systems with more than moderate state space size due to the curse of
dimensionality. This work combines recent results in the structure of the HJB,
and its reduction to a linear Partial Differential Equation (PDE), with methods
based on low-rank tensor representations, known as separated representations,
to address the curse of dimensionality. The result is an algorithm to solve
optimal control problems which scales linearly with the number of states in a
system, and is applicable to systems that are nonlinear with stochastic forcing
in finite-horizon, average cost, and first-exit settings. The method is
demonstrated on inverted pendulum, VTOL aircraft, and quadcopter models, with
system dimensions two, six, and twelve respectively.
Comment: 8 pages. Accepted to CDC 201
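The storage argument behind separated representations can be made concrete with a tiny example (the Gaussian test function is chosen because it separates exactly with rank one; dimensions and grid sizes are illustrative):

```python
import numpy as np

# A separated representation stores a d-dimensional grid function as a
# sum of products of 1-D factors, so memory grows linearly in d
# instead of exponentially.
d, n_grid = 12, 64
x = np.linspace(-3, 3, n_grid)

# f(x_1,...,x_d) = exp(-sum_i x_i^2) = prod_i exp(-x_i^2): rank one.
factors = [np.exp(-x ** 2) for _ in range(d)]

def eval_separated(factors, idx):
    """Evaluate the separated representation at grid multi-index idx."""
    val = 1.0
    for f, i in zip(factors, idx):
        val *= f[i]
    return val

idx = (10,) * d
exact = np.exp(-d * x[10] ** 2)
approx = eval_separated(factors, idx)

storage_sep = d * n_grid        # numbers stored in separated form
storage_full = n_grid ** d      # numbers a full tensor grid would need
print(abs(exact - approx), storage_sep, storage_full)
```

General value functions need more than one rank term, but as long as the rank stays moderate, cost remains linear in the number of states per dimension, which is the scaling the abstract claims.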
Ergodic Control and Polyhedral approaches to PageRank Optimization
We study a general class of PageRank optimization problems which consist in
finding an optimal outlink strategy for a web site subject to design
constraints. We consider both a continuous problem, in which one can choose the
intensity of a link, and a discrete one, in which each page has obligatory
links, facultative links, and forbidden links. We show that the
continuous problem, as well as its discrete variant when there are no
constraints coupling different pages, can both be modeled by constrained Markov
decision processes with ergodic reward, in which the webmaster determines the
transition probabilities of websurfers. Although the number of actions turns
out to be exponential, we show that an associated polytope of transition
measures has a concise representation, from which we deduce that the continuous
problem is solvable in polynomial time, and that the same is true for the
discrete problem when there are no coupling constraints. We also provide
efficient algorithms, adapted to very large networks. Then, we investigate the
qualitative features of optimal outlink strategies, and identify in particular
assumptions under which there exists a "master" page to which all controlled
pages should point. We report numerical results on fragments of the real web
graph.
Comment: 39 pages
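The setting, and in particular the "master page" effect, can be illustrated on a toy graph (the graph, damping factor, and assignment of controlled pages below are invented; the paper's algorithms operate on far larger instances with link constraints):

```python
import numpy as np

# PageRank as the stationary distribution of a websurfer Markov chain;
# the webmaster's decision variable is the outlink rows of its pages.
def pagerank(P, alpha=0.85, iters=200):
    n = len(P)
    pi = np.full(n, 1.0 / n)
    G = alpha * P + (1 - alpha) / n      # teleportation / damping
    for _ in range(iters):
        pi = pi @ G
    return pi

n = 5
# Base chain: every page links uniformly to every other page.
P = (np.ones((n, n)) - np.eye(n)) / (n - 1)

# Strategy A: our pages 0-3 keep uniform outlinks.
rank_a = pagerank(P)[3]

# Strategy B: controlled pages 0-2 all point only at page 3, which
# then acts as the site's "master" page.
P2 = P.copy()
for i in range(3):
    P2[i] = 0.0
    P2[i, 3] = 1.0
rank_b = pagerank(P2)[3]

print(rank_a, rank_b)   # page 3's rank increases under strategy B
```

The paper's contribution is to show that optimizing over such outlink choices, despite the exponential action set, reduces to polynomial-time ergodic control thanks to a concise polytope of transition measures.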