Accelerated Point-wise Maximum Approach to Approximate Dynamic Programming
We describe an approximate dynamic programming approach to compute lower
bounds on the optimal value function for a discrete time, continuous space,
infinite horizon setting. The approach iteratively constructs a family of lower
bounding approximate value functions by using the so-called Bellman inequality.
The novelty of our approach is that, at each iteration, we aim to compute an
approximate value function that maximizes the point-wise maximum taken with the
family of approximate value functions computed thus far. This leads to a
non-convex objective, and we propose a gradient ascent algorithm to find
stationary points by solving a sequence of convex optimization problems. We
provide convergence guarantees for our algorithm and an interpretation for how
the gradient computation relates to the state relevance weighting parameter
appearing in related approximate dynamic programming approaches. We demonstrate
through numerical examples that, when compared to existing approaches, the
algorithm we propose computes tighter sub-optimality bounds with less
computation time.

Comment: 14 pages, 3 figures
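As a toy illustration of the point-wise maximum construction described above (hypothetical coefficients, not the paper's algorithm), the sketch below evaluates a small family of quadratic lower-bounding value functions and takes their point-wise maximum, which remains a valid lower bound and is at least as tight as any single member:

```python
import numpy as np

# Hypothetical family of quadratic lower-bounding value functions,
# V_j(x) = a_j * x**2 + b_j * x + c_j, stored row-wise as (a, b, c).
coeffs = np.array([
    [1.0, 0.0, 0.0],    # x^2
    [0.5, 1.0, 0.2],    # 0.5 x^2 + x + 0.2
    [2.0, -1.0, -0.1],  # 2 x^2 - x - 0.1
])

def v_family(x):
    """Evaluate every member of the family at the points x (vectorized)."""
    x = np.atleast_1d(x)
    return coeffs[:, 0, None] * x**2 + coeffs[:, 1, None] * x + coeffs[:, 2, None]

def v_max(x):
    """Point-wise maximum over the family: still a valid lower bound."""
    return v_family(x).max(axis=0)

xs = np.linspace(-2.0, 2.0, 5)
print(v_max(xs))   # dominates every individual member of the family
```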
Nonlinear Control of Quadcopters via Approximate Dynamic Programming
While Approximate Dynamic Programming has been used successfully in many
applications involving discrete states and inputs, such as playing Tetris or
chess, it has seen far fewer applications with continuous states and inputs. In
this paper, we combine Approximate Dynamic Programming
techniques and apply them to the continuous, non-linear and high dimensional
dynamics of a quadcopter vehicle. We use a polynomial approximation of the
dynamics and sum-of-squares programming techniques to compute a family of
polynomial value function approximations for different tuning parameters. The
resulting approximations to the optimal value function are combined in a
point-wise maximum approach, which is used to compute the online policy. The
success of the method is demonstrated in both simulations and experiments on a
quadcopter. The control performance is compared to a linear time-varying Model
Predictive Controller. The two methods are then combined to keep the
computational benefits of a short horizon MPC and the long term performance
benefits of the Approximate Dynamic Programming value function as the terminal
cost.

Comment: 8 pages, 9 figures
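A minimal sketch of the combination idea in the last sentence, under assumed toy dynamics (a scalar linear system, not a quadcopter) and hypothetical value-function coefficients: a short-horizon input sequence is optimized against stage costs plus a point-wise maximum terminal cost standing in for the ADP value function.

```python
import numpy as np
from itertools import product

# Assumed scalar linear system x+ = a*x + b*u with quadratic stage cost.
a, b = 1.0, 0.5
def stage_cost(x, u):
    return x**2 + 0.1 * u**2

# Hypothetical family of quadratic value approximations V_j(x) = p_j * x^2;
# their point-wise maximum serves as the terminal cost.
terminal_coeffs = [2.0, 3.5, 1.2]
def terminal_cost(x):
    return max(p * x**2 for p in terminal_coeffs)

def short_horizon_cost(x0, inputs):
    """Sum of stage costs over a short horizon plus the ADP terminal cost."""
    x, total = x0, 0.0
    for u in inputs:
        total += stage_cost(x, u)
        x = a * x + b * u
    return total + terminal_cost(x)

# Brute-force search over a coarse input grid (horizon 2) as a stand-in
# for a real MPC solver.
grid = np.linspace(-2, 2, 41)
best = min(product(grid, repeat=2), key=lambda seq: short_horizon_cost(1.0, seq))
print("first input of the optimized sequence:", best[0])
```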
A Moment and Sum-of-Squares Extension of Dual Dynamic Programming with Application to Nonlinear Energy Storage Problems
We present a finite-horizon optimization algorithm that extends the
established concept of Dual Dynamic Programming (DDP) in two ways. First, in
contrast to the linear costs, dynamics, and constraints of standard DDP, we
consider problems in which all of these can be polynomial functions. Second, we
allow the state trajectory to be described by probability distributions rather
than point values, and return approximate value functions fitted to these. The
algorithm is in part an adaptation of sum-of-squares techniques used in the
approximate dynamic programming literature. It alternates between a forward
simulation through the horizon, in which the moments of the state distribution
are propagated through a succession of single-stage problems, and a backward
recursion, in which a new polynomial function is derived for each stage using
the moments of the state as fixed data. The value function approximation
returned for a given stage is the point-wise maximum of all polynomials derived
for that stage. This contrasts with the piecewise affine functions derived in
conventional DDP. We prove key convergence properties of the new algorithm, and
validate it in simulation on two case studies related to the optimal operation
of energy storage devices with nonlinear characteristics. The first is a small
borehole storage problem, for which multiple value function approximations can
be compared. The second is a larger problem, for which conventional discretized
dynamic programming is intractable.

Comment: 33 pages, 9 figures
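The contrast drawn above between conventional DDP cuts and polynomial ones can be sketched as follows (hypothetical cut coefficients): both value approximations are point-wise maxima over a collection of cuts collected across backward passes, differing only in whether each cut is affine or polynomial.

```python
# Conventional DDP: each backward pass adds an affine cut (slope, intercept),
# and the value approximation is the point-wise maximum of all cuts so far.
affine_cuts = [(0.0, 1.0), (-2.0, 3.0)]          # s*x + i

# Polynomial extension (quadratic cuts here, the simplest polynomial case).
quad_cuts = [(1.0, 0.0, 0.5), (0.5, -1.0, 1.0)]  # a*x^2 + b*x + c

def v_affine(x):
    """Piecewise affine value approximation of conventional DDP."""
    return max(s * x + i for s, i in affine_cuts)

def v_quadratic(x):
    """Point-wise maximum of polynomial cuts."""
    return max(a * x**2 + b * x + c for a, b, c in quad_cuts)

for x in (-1.0, 0.0, 2.0):
    print(x, v_affine(x), v_quadratic(x))
```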
Performance guarantees for model-based Approximate Dynamic Programming in continuous spaces
We study both the value function and Q-function formulation of the Linear
Programming approach to Approximate Dynamic Programming. The approach is
model-based and optimizes over a restricted function space to approximate the
value function or Q-function. Working in the discrete time, continuous space
setting, we provide guarantees for the fitting error and online performance of
the policy. In particular, the online performance guarantee is obtained by
analyzing an iterated version of the greedy policy, and the fitting error
guarantee by analyzing an iterated version of the Bellman inequality. These
guarantees complement the existing bounds that appear in the literature. The
Q-function formulation offers benefits, for example in decentralized
controller design; however, it can lead to computationally demanding
optimization problems. To alleviate this drawback, we provide a condition that
simplifies the formulation, resulting in improved computational times.

Comment: 18 pages, 5 figures, journal paper
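For a finite MDP, the Linear Programming approach mentioned above (without the restricted function space) can be written down directly; the toy sketch below, with made-up costs and transitions, maximizes a weighted sum of values subject to the Bellman inequality and cross-checks the result against value iteration:

```python
import numpy as np
from scipy.optimize import linprog

# Tiny made-up 2-state, 2-action MDP: costs c[s, a], transitions P[a, s, s'].
gamma = 0.9
c = np.array([[1.0, 2.0],
              [0.5, 1.5]])
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # action 0
              [[0.3, 0.7], [0.6, 0.4]]])   # action 1

# LP approach: maximize sum_s w(s) V(s) subject to the Bellman inequality
# V(s) <= c(s, a) + gamma * sum_s' P(s'|s, a) V(s') for every (s, a).
w = np.array([0.5, 0.5])   # state relevance weights
A_ub, b_ub = [], []
for s in range(2):
    for a_idx in range(2):
        row = -gamma * P[a_idx, s].copy()
        row[s] += 1.0
        A_ub.append(row)
        b_ub.append(c[s, a_idx])

res = linprog(-w, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 2)
V = res.x
print("LP value function:", V)

# Cross-check against plain value iteration on the same MDP.
V_vi = np.zeros(2)
for _ in range(2000):
    V_vi = np.min(c + gamma * (P @ V_vi).T, axis=1)
print("value iteration: ", V_vi)
```

With positive weights and no function-space restriction, the LP maximizer coincides with the optimal value function, which is why the two printouts agree.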
Structure-Aware Stochastic Control for Transmission Scheduling
In this paper, we consider the problem of real-time transmission scheduling
over time-varying channels. We first formulate the transmission scheduling
problem as a Markov decision process (MDP) and systematically unravel the
structural properties (e.g. concavity in the state-value function and
monotonicity in the optimal scheduling policy) exhibited by the optimal
solutions. We then propose an online learning algorithm which preserves these
structural properties and achieves ε-optimal solutions for an arbitrarily
small ε. The advantages of the proposed online method are that (i) it does not
require a priori knowledge of the traffic arrival and channel statistics and
(ii) it adaptively approximates the state-value functions using piece-wise
linear functions and has low storage and computation complexity. We also extend
the proposed low-complexity online learning solution to the prioritized data
transmission. The simulation results demonstrate that the proposed method
achieves significantly better utility (or delay)-energy trade-offs compared to
existing state-of-the-art online optimization methods.

Comment: 41 pages
An optimal control approach of day-to-day congestion pricing for stochastic transportation networks
Congestion pricing has become an effective instrument for traffic demand
management on road networks. This paper proposes an optimal control approach
to congestion pricing on a day-to-day timescale that incorporates demand
uncertainty and elasticity. Travelers make the decision to travel or not based
on the experienced system travel time in the previous day and traffic managers
take tolling decisions in order to minimize the average system travel time over
a long time horizon. We formulate the problem as a Markov decision process
(MDP) and analyze whether it satisfies the conditions required for a rigorous
solution analysis. Such an analysis of an MDP often depends on the type of
state space and on the boundedness of the travel time functions. We do not
constrain the travel time functions to be bounded and
present an analysis centered around weighted sup-norm contractions that also
holds for unbounded travel time functions. We find that the formulated MDP
satisfies a set of assumptions to ensure Bellman's optimality condition.
Through this result, the existence of the optimal average cost of the MDP is
shown. A method based on approximate dynamic programming is proposed to resolve
the implementation and computational issues of solving the control problem.
Numerical results suggest that the proposed method efficiently solves the
problem and produces accurate solutions.
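The weighted sup-norm at the heart of the contraction analysis above can be illustrated on a toy fixed-policy chain (made-up costs and transitions, not the paper's network model); successive value-iteration iterates contract in this norm at a rate of at most the discount factor:

```python
import numpy as np

def weighted_sup_norm(v, w):
    """||v||_w = max_s |v(s)| / w(s), the norm used in the contraction analysis."""
    return np.max(np.abs(v) / w)

# Toy fixed-policy Markov chain with per-state costs (illustrative only).
gamma = 0.95
c = np.array([1.0, 4.0, 9.0])
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])   # row-stochastic transition matrix
w = np.ones(3)                    # uniform weights for simplicity

v = np.zeros(3)
gaps = []
for _ in range(5):
    v_next = c + gamma * P @ v
    gaps.append(weighted_sup_norm(v_next - v, w))
    v = v_next
print(gaps)   # successive gaps shrink geometrically, at rate <= gamma
```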
Analysis of extremum value theorems for function spaces in optimal control under numerical uncertainty
The extremum value theorem for function spaces plays a central role in
optimal control. It is known that the computation of optimal control actions
and policies is often prone to numerical errors, which may be related to
computability issues. The current work addresses a version of the extremum
value theorem for function spaces under explicit consideration of numerical
uncertainties. It is shown that certain function spaces are bounded in a
suitable sense, i.e., they admit finite approximations up to an arbitrary
precision. The proof of this fact is constructive in the sense that it
explicitly builds the approximating functions. Consequently, existence of
approximate extremal functions is shown. Applicability of the theorem is
investigated for finite-horizon optimal control, dynamic programming, and
adaptive dynamic programming. Some possible computability issues of the
extremum value theorem in optimal control are illustrated with counterexamples.

Comment: 28 pages
Optimized and Trusted Collision Avoidance for Unmanned Aerial Vehicles using Approximate Dynamic Programming (Technical Report)
Safely integrating unmanned aerial vehicles into civil airspace is contingent
upon development of a trustworthy collision avoidance system. This paper
proposes an approach whereby a parameterized resolution logic that is
considered trusted for a given range of its parameters is adaptively tuned
online. Specifically, to address the potential conservatism of the resolution
logic with static parameters, we present a dynamic programming approach for
adapting the parameters dynamically based on the encounter state. We compute
the adaptation policy offline using a simulation-based approximate dynamic
programming method that accommodates the high dimensionality of the problem.
Numerical experiments show that this approach improves safety and operational
performance compared to the baseline resolution logic, while retaining
trustworthiness.

Comment: An abbreviated version was submitted to ICRA 201
Vector Autoregressive POMDP Model Learning and Planning for Human-Robot Collaboration
Human-robot collaboration (HRC) has emerged as a hot research area at the
intersection of control, robotics, and psychology in recent years. It is
critically important in HRC to obtain a model of the human that is expressive
yet tractable. In this paper, we propose a model called Vector
Autoregressive POMDP (VAR-POMDP) model which is an extension of the traditional
POMDP model by considering the correlation among observations. The VAR-POMDP
model is more powerful in the expressiveness of features than the traditional
continuous observation POMDP since the traditional one is a special case of the
VAR-POMDP model. Meanwhile, the proposed VAR-POMDP model is also tractable, as
we show that it can be effectively learned from data and we can extend
point-based value iteration (PBVI) to VAR-POMDP planning. Particularly, in this
paper, we propose to use the Bayesian non-parametric learning to decide
potential human states and learn a VAR-POMDP model using data collected from
human demonstrations. Then, we consider planning with respect to PCTL
specifications, which are widely used to express safety and reachability
requirements in robotics. Finally, the advantage of using the proposed model
for HRC is validated by experimental results using data collected from a
driver-assistance test-bed.
Feedback control of parametrized PDEs via model order reduction and dynamic programming principle
In this paper we investigate infinite horizon optimal control problems for
parametrized partial differential equations. We are interested in feedback
control via dynamic programming equations, an approach well known to suffer
from the curse of dimensionality. Thus, we apply parametric model order reduction
techniques to construct low-dimensional subspaces with suitable information on
the control problem, where the dynamic programming equations can be
approximated. To guarantee a low number of basis functions, we combine recent
basis generation methods and parameter partitioning techniques. Furthermore, we
present a novel technique to construct nonuniform grids in the reduced domain,
which is based on statistical information. Finally, we discuss numerical
examples to illustrate the effectiveness of the proposed methods for PDEs in
two space dimensions.
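A sketch of the snapshot-based reduction step (toy 1-D heat equation and assumed parameter values, not the paper's basis generation method): state snapshots for several parameter values are collected, and a low-dimensional POD basis is extracted from the leading left singular vectors of the snapshot matrix.

```python
import numpy as np

# Toy setup: 1-D heat equation u_t = mu * u_xx on [0, 1], discretized with
# a finite-difference Laplacian and explicit Euler time stepping.
n, dt, steps = 50, 1e-4, 200
dx = 1.0 / (n - 1)
lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / dx**2

x = np.linspace(0.0, 1.0, n)
snapshots = []
for mu in (0.5, 1.0, 2.0):            # assumed diffusion parameter values
    u = np.sin(np.pi * x)             # shared initial condition
    for _ in range(steps):
        u = u + dt * mu * lap @ u
        snapshots.append(u.copy())

S = np.array(snapshots).T             # columns are state snapshots
U, svals, _ = np.linalg.svd(S, full_matrices=False)
r = 5
basis = U[:, :r]                      # low-dimensional subspace for the DP equations
energy = np.sum(svals[:r] ** 2) / np.sum(svals ** 2)
print(f"captured snapshot energy with r={r}: {energy:.6f}")
```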