Approximate Dynamic Programming via Sum of Squares Programming
We describe an approximate dynamic programming method for stochastic control
problems on infinite state and input spaces. The optimal value function is
approximated by a linear combination of basis functions with coefficients as
decision variables. By relaxing the Bellman equation to an inequality, one
obtains a linear program in the basis coefficients with an infinite set of
constraints. We show that a recently introduced method, which obtains convex
quadratic value function approximations, can be extended to higher order
polynomial approximations via sum of squares programming techniques. An
approximate value function can then be computed offline by solving a
semidefinite program, without having to sample the infinite constraint. The
policy is evaluated online by solving a polynomial optimization problem, which
also turns out to be convex in some cases. We experimentally validate the
method on an autonomous helicopter testbed using a 10-dimensional helicopter
model. Comment: 7 pages, 5 figures. Submitted to the 2013 European Control
Conference, Zurich, Switzerland.
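The relaxation described in this abstract can be written out as a short sketch. The notation below is an assumption for illustration and is not taken from the paper: a discounted cost-minimization setting with stage cost \ell, dynamics f, disturbance w, discount factor \gamma, basis functions \phi_i with coefficients \alpha_i, and a state-relevance weighting c.

```latex
% Bellman equation for the optimal value function (assumed discounted-cost form)
V^*(x) = \min_{u} \Big\{ \ell(x,u) + \gamma \, \mathbb{E}_w\big[ V^*(f(x,u,w)) \big] \Big\}

% Approximation by a linear combination of basis functions
\hat{V}(x) = \sum_{i=1}^{K} \alpha_i \, \phi_i(x)

% Relaxing the equation to an inequality gives a linear program in \alpha
% with one constraint for every state-input pair (x,u)
\max_{\alpha} \; \int \hat{V}(x) \, c(\mathrm{d}x)
\quad \text{s.t.} \quad
\hat{V}(x) \le \ell(x,u) + \gamma \, \mathbb{E}_w\big[ \hat{V}(f(x,u,w)) \big]
\quad \forall (x,u)
```

Because the constraint must hold for all (x,u), the program has infinitely many constraints; per the abstract, the sum of squares approach replaces this infinite family with a semidefinite program rather than sampling it.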
Approximate Dynamic Programming with Gaussian Processes
In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and strongly depends on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate globally optimal closed-loop policy. In GPDP, value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal state feedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes. A binary classifier selects one Gaussian process to predict the optimal control signal. We show that GPDP is able to yield an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing up, a complex nonlinear control problem.
Adaptive signal control using approximate dynamic programming
This paper presents a concise summary of a study on an adaptive traffic signal controller for real-time operation. The adaptive controller is designed to achieve three operational objectives: first, the controller adopts a dual control principle to balance the influence of immediate cost and long-term cost in operation; second, the controller switches signals without referring to a preset plan and is acyclic; third, the controller adjusts its parameters online to adapt to new environments. Not all of these features are available in existing operational controllers. Although dynamic programming (DP) is the only exact solution for achieving the operational objectives, it is usually impractical for real-time operation because of its demands on computation and information. To circumvent these difficulties, we use approximate dynamic programming (ADP) in conjunction with online learning techniques. This approach can substantially reduce the computational burden by replacing the exact value function of DP with a continuous linear approximation function, which is then updated progressively by online learning techniques. Two online learning techniques, reinforcement learning and monotonicity approximation, are investigated. We find in computer simulation that the ADP controller leads to substantial savings in vehicle delays in comparison with optimised fixed-time plans. The implications of this study for traffic control are: the ADP controller meets all three operational objectives with competitive results, and can be readily implemented for operations at both isolated intersections and traffic networks; the ADP algorithm is computationally efficient, and the ADP controller is an evolving system that requires minimal human intervention; the ADP technique offers a flexible theoretical framework in which a range of functional forms and learning techniques can be further studied.
Adaptive traffic signal control using approximate dynamic programming
This paper presents a study on an adaptive traffic signal controller for real-time operation. The controller aims for three operational objectives: dynamic allocation of green time, automatic adjustment of control parameters, and fast revision of signal plans. The control algorithm is built on approximate dynamic programming (ADP). This approach substantially reduces the computational burden by using an approximation to the value function of the dynamic programming and reinforcement learning to update the approximation. We investigate temporal-difference learning and perturbation learning as specific learning techniques for the ADP approach. We find in computer simulation that the ADP controllers achieve substantial reduction in vehicle delays in comparison with optimised fixed-time plans. Our results show that substantial benefits can be gained by increasing the frequency at which the signal plans are revised, which can be achieved conveniently using the ADP approach.
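As a rough illustration of updating a linear value function approximation with temporal-difference learning, here is a minimal TD(0) sketch. It is not the controller from the paper: the two-state toy dynamics, the cost, the feature vector, and all names and parameter values are hypothetical assumptions chosen so the result can be checked by hand.

```python
def features(queue_length):
    """Hypothetical feature vector: a bias term plus the queue length."""
    return [1.0, float(queue_length)]


def value(weights, queue_length):
    """Linear value approximation: inner product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(queue_length)))


def td0(num_steps=2000, discount=0.5, stepsize=0.1):
    """Run TD(0) on a toy chain that alternates between queue lengths 0 and 1."""
    weights = [0.0, 0.0]
    state = 0
    for _ in range(num_steps):
        next_state = 1 - state  # assumed toy dynamics: deterministic alternation
        cost = float(state)     # assumed cost: one unit of delay per queued vehicle
        # Temporal-difference error between the one-step target and the estimate.
        td_error = cost + discount * value(weights, next_state) - value(weights, state)
        # Move the weights along the features of the visited state.
        weights = [w + stepsize * td_error * f
                   for w, f in zip(weights, features(state))]
        state = next_state
    return weights


weights = td0()
print(value(weights, 0), value(weights, 1))
```

For this toy chain the exact discounted costs are 2/3 for an empty queue and 4/3 for a queue of one, and the learned approximation approaches those values. A real signal controller would need a much richer state, stochastic arrivals, and a decision step that compares candidate signal actions.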
A New Optimal Stepsize For Approximate Dynamic Programming
Approximate dynamic programming (ADP) has proven itself in a wide range of
applications spanning large-scale transportation problems, health care, revenue
management, and energy systems. The design of effective ADP algorithms has many
dimensions, but one crucial factor is the stepsize rule used to update a value
function approximation. Many operations research applications are
computationally intensive, and it is important to obtain good results quickly.
Furthermore, the most popular stepsize formulas use tunable parameters and can
produce very poor results if tuned improperly. We derive a new stepsize rule
that optimizes the prediction error in order to improve the short-term
performance of an ADP algorithm. With only one, relatively insensitive tunable
parameter, the new rule adapts to the level of noise in the problem and
produces faster convergence in numerical experiments. Comment: Matlab files are included with the paper source
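To make the role of a stepsize rule concrete, the sketch below smooths noisy observations of a value with two common textbook rules. It does not implement the new rule derived in the paper; the function names, the tunable parameter a, and the sample observations are assumptions for illustration only.

```python
def one_over_n(n):
    """Stepsize 1/n: the smoothed estimate equals the running sample mean."""
    return 1.0 / n


def harmonic(n, a=10.0):
    """Generalized harmonic stepsize a / (a + n - 1); a is a tunable parameter."""
    return a / (a + n - 1)


def smooth(observations, stepsize_rule):
    """Apply v_n = (1 - alpha_n) * v_{n-1} + alpha_n * observation_n."""
    estimate = 0.0
    for n, observed in enumerate(observations, start=1):
        alpha = stepsize_rule(n)
        estimate = (1.0 - alpha) * estimate + alpha * observed
    return estimate


observations = [4.0, 8.0, 6.0, 2.0]  # hypothetical noisy value observations
mean_estimate = smooth(observations, one_over_n)
recent_estimate = smooth(observations, harmonic)
print(mean_estimate, recent_estimate)
```

With a = 10 the harmonic rule declines slowly and so weights recent observations more heavily than the sample mean does. That sensitivity to a is the kind of tuning burden the abstract refers to.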