Accelerated Point-wise Maximum Approach to Approximate Dynamic Programming
We describe an approximate dynamic programming approach to compute lower
bounds on the optimal value function for a discrete time, continuous space,
infinite horizon setting. The approach iteratively constructs a family of lower
bounding approximate value functions by using the so-called Bellman inequality.
The novelty of our approach is that, at each iteration, we aim to compute an
approximate value function that maximizes the point-wise maximum taken with the
family of approximate value functions computed thus far. This leads to a
non-convex objective, and we propose a gradient ascent algorithm to find
stationary points by solving a sequence of convex optimization problems. We
provide convergence guarantees for our algorithm and an interpretation for how
the gradient computation relates to the state relevance weighting parameter
appearing in related approximate dynamic programming approaches. We demonstrate
through numerical examples that, when compared to existing approaches, the
algorithm we propose computes tighter sub-optimality bounds with less
computation time.

Comment: 14 pages, 3 figures
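As a toy illustration of the point-wise maximum construction described above (hypothetical coefficients, not the paper's algorithm), the sketch below evaluates a small family of quadratic lower-bounding value functions and takes their point-wise maximum, which remains a valid lower bound and is at least as tight as any single member:

```python
import numpy as np

# Hypothetical family of quadratic lower-bounding value functions,
# V_j(x) = a_j * x**2 + b_j * x + c_j, stored row-wise as (a, b, c).
coeffs = np.array([
    [1.0, 0.0, 0.0],    # x^2
    [0.5, 1.0, 0.2],    # 0.5 x^2 + x + 0.2
    [2.0, -1.0, -0.1],  # 2 x^2 - x - 0.1
])

def v_family(x):
    """Evaluate every member of the family at the points x (vectorized)."""
    x = np.atleast_1d(x)
    return coeffs[:, 0, None] * x**2 + coeffs[:, 1, None] * x + coeffs[:, 2, None]

def v_max(x):
    """Point-wise maximum over the family: still a valid lower bound."""
    return v_family(x).max(axis=0)

xs = np.linspace(-2.0, 2.0, 5)
print(v_max(xs))   # dominates every individual member of the family
```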
Nonlinear Control of Quadcopters via Approximate Dynamic Programming
While Approximate Dynamic Programming has been used successfully in many
applications involving discrete states and inputs, such as playing Tetris or
chess, it has seen far fewer applications with continuous states and inputs. In
this paper, we combine Approximate Dynamic Programming
techniques and apply them to the continuous, non-linear and high dimensional
dynamics of a quadcopter vehicle. We use a polynomial approximation of the
dynamics and sum-of-squares programming techniques to compute a family of
polynomial value function approximations for different tuning parameters. The
resulting approximations to the optimal value function are combined in a
point-wise maximum approach, which is used to compute the online policy. The
success of the method is demonstrated in both simulations and experiments on a
quadcopter. The control performance is compared to a linear time-varying Model
Predictive Controller. The two methods are then combined to keep the
computational benefits of a short horizon MPC and the long term performance
benefits of the Approximate Dynamic Programming value function as the terminal
cost.

Comment: 8 pages, 9 figures
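A minimal sketch of the combination idea in the last sentence, under assumed toy dynamics (a scalar linear system, not a quadcopter) and hypothetical value-function coefficients: a short-horizon input sequence is optimized against stage costs plus a point-wise maximum terminal cost standing in for the ADP value function.

```python
import numpy as np
from itertools import product

# Assumed scalar linear system x+ = a*x + b*u with quadratic stage cost.
a, b = 1.0, 0.5
def stage_cost(x, u):
    return x**2 + 0.1 * u**2

# Hypothetical family of quadratic value approximations V_j(x) = p_j * x^2;
# their point-wise maximum serves as the terminal cost.
terminal_coeffs = [2.0, 3.5, 1.2]
def terminal_cost(x):
    return max(p * x**2 for p in terminal_coeffs)

def short_horizon_cost(x0, inputs):
    """Sum of stage costs over a short horizon plus the ADP terminal cost."""
    x, total = x0, 0.0
    for u in inputs:
        total += stage_cost(x, u)
        x = a * x + b * u
    return total + terminal_cost(x)

# Brute-force search over a coarse input grid (horizon 2) as a stand-in
# for a real MPC solver.
grid = np.linspace(-2, 2, 41)
best = min(product(grid, repeat=2), key=lambda seq: short_horizon_cost(1.0, seq))
print("first input of the optimized sequence:", best[0])
```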
A Moment and Sum-of-Squares Extension of Dual Dynamic Programming with Application to Nonlinear Energy Storage Problems
We present a finite-horizon optimization algorithm that extends the
established concept of Dual Dynamic Programming (DDP) in two ways. First, in
contrast to the linear costs, dynamics, and constraints of standard DDP, we
consider problems in which all of these can be polynomial functions. Second, we
allow the state trajectory to be described by probability distributions rather
than point values, and return approximate value functions fitted to these. The
algorithm is in part an adaptation of sum-of-squares techniques used in the
approximate dynamic programming literature. It alternates between a forward
simulation through the horizon, in which the moments of the state distribution
are propagated through a succession of single-stage problems, and a backward
recursion, in which a new polynomial function is derived for each stage using
the moments of the state as fixed data. The value function approximation
returned for a given stage is the point-wise maximum of all polynomials derived
for that stage. This contrasts with the piecewise affine functions derived in
conventional DDP. We prove key convergence properties of the new algorithm, and
validate it in simulation on two case studies related to the optimal operation
of energy storage devices with nonlinear characteristics. The first is a small
borehole storage problem, for which multiple value function approximations can
be compared. The second is a larger problem, for which conventional discretized
dynamic programming is intractable.

Comment: 33 pages, 9 figures
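The contrast drawn above between conventional DDP cuts and polynomial ones can be sketched as follows (hypothetical cut coefficients): both value approximations are point-wise maxima over a collection of cuts collected across backward passes, differing only in whether each cut is affine or polynomial.

```python
# Conventional DDP: each backward pass adds an affine cut (slope, intercept),
# and the value approximation is the point-wise maximum of all cuts so far.
affine_cuts = [(0.0, 1.0), (-2.0, 3.0)]          # s*x + i

# Polynomial extension (quadratic cuts here, the simplest polynomial case).
quad_cuts = [(1.0, 0.0, 0.5), (0.5, -1.0, 1.0)]  # a*x^2 + b*x + c

def v_affine(x):
    """Piecewise affine value approximation of conventional DDP."""
    return max(s * x + i for s, i in affine_cuts)

def v_quadratic(x):
    """Point-wise maximum of polynomial cuts."""
    return max(a * x**2 + b * x + c for a, b, c in quad_cuts)

for x in (-1.0, 0.0, 2.0):
    print(x, v_affine(x), v_quadratic(x))
```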
Performance guarantees for model-based Approximate Dynamic Programming in continuous spaces
We study both the value function and Q-function formulation of the Linear
Programming approach to Approximate Dynamic Programming. The approach is
model-based and optimizes over a restricted function space to approximate the
value function or Q-function. Working in the discrete time, continuous space
setting, we provide guarantees for the fitting error and online performance of
the policy. In particular, the online performance guarantee is obtained by
analyzing an iterated version of the greedy policy, and the fitting error
guarantee by analyzing an iterated version of the Bellman inequality. These
guarantees complement the existing bounds that appear in the literature. The
Q-function formulation offers benefits, for example in decentralized
controller design; however, it can lead to computationally demanding
optimization problems. To alleviate this drawback, we provide a condition that
simplifies the formulation, resulting in improved computational times.

Comment: 18 pages, 5 figures, journal paper
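For a finite MDP, the Linear Programming approach mentioned above (without the restricted function space) can be written down directly; the toy sketch below, with made-up costs and transitions, maximizes a weighted sum of values subject to the Bellman inequality and cross-checks the result against value iteration:

```python
import numpy as np
from scipy.optimize import linprog

# Tiny made-up 2-state, 2-action MDP: costs c[s, a], transitions P[a, s, s'].
gamma = 0.9
c = np.array([[1.0, 2.0],
              [0.5, 1.5]])
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # action 0
              [[0.3, 0.7], [0.6, 0.4]]])   # action 1

# LP approach: maximize sum_s w(s) V(s) subject to the Bellman inequality
# V(s) <= c(s, a) + gamma * sum_s' P(s'|s, a) V(s') for every (s, a).
w = np.array([0.5, 0.5])   # state relevance weights
A_ub, b_ub = [], []
for s in range(2):
    for a_idx in range(2):
        row = -gamma * P[a_idx, s].copy()
        row[s] += 1.0
        A_ub.append(row)
        b_ub.append(c[s, a_idx])

res = linprog(-w, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 2)
V = res.x
print("LP value function:", V)

# Cross-check against plain value iteration on the same MDP.
V_vi = np.zeros(2)
for _ in range(2000):
    V_vi = np.min(c + gamma * (P @ V_vi).T, axis=1)
print("value iteration: ", V_vi)
```

With positive weights and no function-space restriction, the LP maximizer coincides with the optimal value function, which is why the two printouts agree.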
Structure-Aware Stochastic Control for Transmission Scheduling
In this paper, we consider the problem of real-time transmission scheduling
over time-varying channels. We first formulate the transmission scheduling
problem as a Markov decision process (MDP) and systematically unravel the
structural properties (e.g. concavity in the state-value function and
monotonicity in the optimal scheduling policy) exhibited by the optimal
solutions. We then propose an online learning algorithm which preserves these
structural properties and achieves ε-optimal solutions for an arbitrarily
small ε. The advantages of the proposed online method are that (i) it does not
require a priori knowledge of the traffic arrival and channel statistics and
(ii) it adaptively approximates the state-value functions using piece-wise
linear functions and has low storage and computation complexity. We also extend
the proposed low-complexity online learning solution to the prioritized data
transmission. The simulation results demonstrate that the proposed method
achieves significantly better utility (or delay)-energy trade-offs compared to
existing state-of-the-art online optimization methods.

Comment: 41 pages
An optimal control approach of day-to-day congestion pricing for stochastic transportation networks
Congestion pricing has become an effective instrument for traffic demand
management on road networks. This paper proposes an optimal control approach
to congestion pricing on a day-to-day timescale that incorporates demand
uncertainty and elasticity. Travelers make the decision to travel or not based
on the experienced system travel time in the previous day and traffic managers
take tolling decisions in order to minimize the average system travel time over
a long time horizon. We formulate the problem as a Markov decision process
(MDP) and analyze whether it satisfies the conditions required for a rigorous
solution analysis. Such an analysis of an MDP often depends on the type of
state space and on the boundedness of the travel time functions. We do not
constrain the travel time functions to be bounded and
present an analysis centered around weighted sup-norm contractions that also
holds for unbounded travel time functions. We find that the formulated MDP
satisfies a set of assumptions to ensure Bellman's optimality condition.
Through this result, the existence of the optimal average cost of the MDP is
shown. A method based on approximate dynamic programming is proposed to resolve
the implementation and computational issues of solving the control problem.
Numerical results suggest that the proposed method efficiently solves the
problem and produces accurate solutions.
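The weighted sup-norm at the heart of the contraction analysis above can be illustrated on a toy fixed-policy chain (made-up costs and transitions, not the paper's network model); successive value-iteration iterates contract in this norm at a rate of at most the discount factor:

```python
import numpy as np

def weighted_sup_norm(v, w):
    """||v||_w = max_s |v(s)| / w(s), the norm used in the contraction analysis."""
    return np.max(np.abs(v) / w)

# Toy fixed-policy Markov chain with per-state costs (illustrative only).
gamma = 0.95
c = np.array([1.0, 4.0, 9.0])
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])   # row-stochastic transition matrix
w = np.ones(3)                    # uniform weights for simplicity

v = np.zeros(3)
gaps = []
for _ in range(5):
    v_next = c + gamma * P @ v
    gaps.append(weighted_sup_norm(v_next - v, w))
    v = v_next
print(gaps)   # successive gaps shrink geometrically, at rate <= gamma
```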
Analysis of extremum value theorems for function spaces in optimal control under numerical uncertainty
The extremum value theorem for function spaces plays a central role in
optimal control. It is known that the computation of optimal control actions
and policies is often prone to numerical errors, which may be related to
computability issues. The current work addresses a version of the extremum
value theorem for function spaces under explicit consideration of numerical
uncertainties. It is shown that certain function spaces are bounded in a
suitable sense, i.e., they admit finite approximations up to an arbitrary
precision. The proof of this fact is constructive in the sense that it
explicitly builds the approximating functions. Consequently, existence of
approximate extremal functions is shown. Applicability of the theorem is
investigated for finite-horizon optimal control, dynamic programming, and
adaptive dynamic programming. Some possible computability issues of the
extremum value theorem in optimal control are illustrated with counterexamples.

Comment: 28 pages
Optimized and Trusted Collision Avoidance for Unmanned Aerial Vehicles using Approximate Dynamic Programming (Technical Report)
Safely integrating unmanned aerial vehicles into civil airspace is contingent
upon development of a trustworthy collision avoidance system. This paper
proposes an approach whereby a parameterized resolution logic that is
considered trusted for a given range of its parameters is adaptively tuned
online. Specifically, to address the potential conservatism of the resolution
logic with static parameters, we present a dynamic programming approach for
adapting the parameters dynamically based on the encounter state. We compute
the adaptation policy offline using a simulation-based approximate dynamic
programming method that accommodates the high dimensionality of the problem.
Numerical experiments show that this approach improves safety and operational
performance compared to the baseline resolution logic, while retaining
trustworthiness.

Comment: An abbreviated version was submitted to ICRA 201
Vector Autoregressive POMDP Model Learning and Planning for Human-Robot Collaboration
Human-robot collaboration (HRC) has emerged as a hot research area at the
intersection of control, robotics, and psychology in recent years. It is
critically important in HRC to obtain a model of the human that is expressive
yet tractable. In this paper, we propose a model called Vector
Autoregressive POMDP (VAR-POMDP) model which is an extension of the traditional
POMDP model by considering the correlation among observations. The VAR-POMDP
model is more powerful in the expressiveness of features than the traditional
continuous observation POMDP since the traditional one is a special case of the
VAR-POMDP model. Meanwhile, the proposed VAR-POMDP model is also tractable, as
we show that it can be effectively learned from data and we can extend
point-based value iteration (PBVI) to VAR-POMDP planning. Particularly, in this
paper, we propose to use the Bayesian non-parametric learning to decide
potential human states and learn a VAR-POMDP model using data collected from
human demonstrations. Then, we consider planning with respect to PCTL
specifications, which are widely used to express safety and reachability
requirements in robotics. Finally, the advantage of using the proposed model
for HRC is validated by experimental results using data collected from a
driver-assistance test-bed.
Feedback control of parametrized PDEs via model order reduction and dynamic programming principle
In this paper we investigate infinite horizon optimal control problems for
parametrized partial differential equations. We are interested in feedback
control via dynamic programming equations, an approach well known to suffer
from the curse of dimensionality. Thus, we apply parametric model order reduction
techniques to construct low-dimensional subspaces with suitable information on
the control problem, where the dynamic programming equations can be
approximated. To guarantee a low number of basis functions, we combine recent
basis generation methods and parameter partitioning techniques. Furthermore, we
present a novel technique to construct nonuniform grids in the reduced domain,
which is based on statistical information. Finally, we discuss numerical
examples to illustrate the effectiveness of the proposed methods for PDEs in
two space dimensions.
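A sketch of the snapshot-based reduction step (toy 1-D heat equation and assumed parameter values, not the paper's basis generation method): state snapshots for several parameter values are collected, and a low-dimensional POD basis is extracted from the leading left singular vectors of the snapshot matrix.

```python
import numpy as np

# Toy setup: 1-D heat equation u_t = mu * u_xx on [0, 1], discretized with
# a finite-difference Laplacian and explicit Euler time stepping.
n, dt, steps = 50, 1e-4, 200
dx = 1.0 / (n - 1)
lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / dx**2

x = np.linspace(0.0, 1.0, n)
snapshots = []
for mu in (0.5, 1.0, 2.0):            # assumed diffusion parameter values
    u = np.sin(np.pi * x)             # shared initial condition
    for _ in range(steps):
        u = u + dt * mu * lap @ u
        snapshots.append(u.copy())

S = np.array(snapshots).T             # columns are state snapshots
U, svals, _ = np.linalg.svd(S, full_matrices=False)
r = 5
basis = U[:, :r]                      # low-dimensional subspace for the DP equations
energy = np.sum(svals[:r] ** 2) / np.sum(svals ** 2)
print(f"captured snapshot energy with r={r}: {energy:.6f}")
```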