Approximate Dynamic Programming via a Smoothed Linear Program
We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural “projection” of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program—the “smoothed approximate linear program”—is distinct from such approaches and relaxes the restriction to lower bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. These bounds are, in general, no worse than those available for extant LP approaches, and for specific problem instances can be shown to be arbitrarily stronger. Second, experiments with our approach on a pair of challenging problems (the game of Tetris and a queueing network control problem) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by a substantial margin.
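The relaxation described above can be sketched on a toy problem: keep the standard ALP constraints (the value approximation must lie below the Bellman operator applied to itself) but add nonnegative slacks whose weighted total violation is capped by a budget. Everything below — the random MDP data, the two-function basis, the budget `theta`, and the uniform weighting distributions — is made up for illustration; the paper's actual formulation and guarantees are more refined.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny random 3-state, 2-action MDP (illustrative numbers only).
n_s, n_a, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))  # P[a, s, :] = transition probs
c = rng.uniform(0.0, 1.0, size=(n_a, n_s))        # stage costs c(s, a)

Phi = np.column_stack([np.ones(n_s), np.arange(n_s, dtype=float)])  # basis (n_s, k)
k = Phi.shape[1]
mu = np.full(n_s, 1.0 / n_s)   # state-relevance weights
pi = np.full(n_s, 1.0 / n_s)   # distribution weighting constraint violations
theta = 0.05                   # violation budget (assumed, problem-dependent)

# Variables x = [r, s]: maximize mu^T Phi r subject to
#   (I - gamma P_a) Phi r - s <= c_a  for every action a,
#   pi^T s <= theta,  s >= 0.
obj = np.concatenate([-(Phi.T @ mu), np.zeros(n_s)])  # linprog minimizes
A_ub = [np.hstack([(np.eye(n_s) - gamma * P[a]) @ Phi, -np.eye(n_s)])
        for a in range(n_a)]
A_ub.append(np.hstack([np.zeros((1, k)), pi[None, :]]))  # budget row
b_ub = np.concatenate([c[a] for a in range(n_a)] + [np.array([theta])])
res = linprog(obj, A_ub=np.vstack(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * k + [(0, None)] * n_s)
r_smooth = res.x[:k]
print("basis weights:", r_smooth)
print("approximate cost-to-go:", Phi @ r_smooth)
```

Setting `theta = 0` recovers the standard, purely lower-bounding ALP.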
A linear programming methodology for approximate dynamic programming
The linear programming (LP) approach to solving the Bellman equation in dynamic programming is a well-known option for obtaining an exact solution over finite state and input spaces. However, with function approximation or continuous state spaces, refinements are necessary. This paper presents a methodology to make approximate dynamic programming via LP work in practical control applications with continuous state and input spaces. Guidelines are given on the data and regressor choices needed to obtain meaningful and well-conditioned value function estimates. The work discusses the introduction of terminal ingredients and the computation of lower and upper bounds on the value function. An experimental inverted-pendulum application illustrates the proposal and supports a comparative analysis with alternative options in the literature.

The authors are grateful for the financial support of the Spanish Ministry of Economy and the European Union, grant DPI2016-81002-R (AEI/FEDER, UE), and the PhD grant from the Government of Ecuador (SENESCYT).

Diaz, H.; Sala, A.; Armesto Ángel, L. (2020). A linear programming methodology for approximate dynamic programming. International Journal of Applied Mathematics and Computer Science (Online), 30(2), 363-375. https://doi.org/10.34768/amcs-2020-0028
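The exact-LP fact mentioned in this abstract can be demonstrated directly for a finite MDP: the optimal cost-to-go is the solution of an LP whose constraints relax the Bellman equation to an inequality. A minimal sketch with made-up data, cross-checked against value iteration:

```python
import numpy as np
from scipy.optimize import linprog

# Finite MDP with hypothetical data. The Bellman equation is solved exactly
# by the LP:  maximize sum_s V(s)
#             s.t. V(s) <= c(s,a) + gamma * sum_s' P(s'|s,a) V(s')  for all s, a.
n_s, n_a, gamma = 4, 2, 0.95
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))  # P[a, s, :] transition probs
c = rng.uniform(0.0, 1.0, size=(n_a, n_s))        # stage costs c(s, a)

A_ub = np.vstack([np.eye(n_s) - gamma * P[a] for a in range(n_a)])
b_ub = np.concatenate([c[a] for a in range(n_a)])
res = linprog(-np.ones(n_s), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n_s)
V_lp = res.x

# Cross-check with value iteration on the same model.
V = np.zeros(n_s)
for _ in range(2000):
    V = np.min(c + gamma * P @ V, axis=0)
print("max |V_lp - V_vi| =", np.max(np.abs(V_lp - V)))
```

With continuous state spaces this LP has infinitely many variables and constraints, which is exactly the gap the paper's methodology addresses.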
Approximate Dynamic Programming via Sum of Squares Programming
We describe an approximate dynamic programming method for stochastic control
problems on infinite state and input spaces. The optimal value function is
approximated by a linear combination of basis functions with coefficients as
decision variables. By relaxing the Bellman equation to an inequality, one
obtains a linear program in the basis coefficients with an infinite set of
constraints. We show that a recently introduced method, which obtains convex
quadratic value function approximations, can be extended to higher order
polynomial approximations via sum of squares programming techniques. An
approximate value function can then be computed offline by solving a
semidefinite program, without having to sample the infinite constraint. The
policy is evaluated online by solving a polynomial optimization problem, which
also turns out to be convex in some cases. We experimentally validate the
method on an autonomous helicopter testbed using a 10-dimensional helicopter
model.

Comment: 7 pages, 5 figures. Submitted to the 2013 European Control Conference, Zurich, Switzerland.
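The reduction this method relies on — replacing the infinite constraint "polynomial ≥ 0 for all x" by a finite semidefinite condition — can be illustrated without an SDP solver: exhibiting a positive semidefinite Gram matrix G with p(x) = z(x)ᵀ G z(x) certifies that p is a sum of squares, hence globally nonnegative. The one-dimensional example below is ours, not from the paper.

```python
import numpy as np

# p(x) = x^4 + 2x^2 + 1 written as z(x)^T G z(x) with monomial vector
# z(x) = [1, x, x^2].
G = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])

# G is positive semidefinite, so p is a sum of squares (here p = (x^2 + 1)^2):
# one finite matrix condition certifies an infinite family of constraints.
eigs = np.linalg.eigvalsh(G)
print("eigenvalues of G:", eigs)

# Sanity check: the Gram representation reproduces p on sample points.
for x in np.linspace(-2.0, 2.0, 9):
    z = np.array([1.0, x, x * x])
    assert abs(z @ G @ z - (x**4 + 2 * x**2 + 1)) < 1e-9
```

In the ADP setting, the relaxed Bellman inequality becomes the requirement that a certain polynomial in the state be SOS, which translates into exactly this kind of semidefinite constraint on a Gram matrix.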
The dynamic lot-sizing problem with convex economic production costs and setups
In this work the uncapacitated dynamic lot-sizing problem is considered. Demands are deterministic and production costs consist of convex costs that arise from economic production functions plus setup costs. We formulate the problem as a mixed-integer nonlinear programming problem and obtain structural results which are used to construct a forward dynamic-programming algorithm that obtains the optimal solution in polynomial time. For positive setup costs, the generic approaches are found to be prohibitively time-consuming; therefore we focus on approximate solution methods. The forward DP algorithm is modified via the conjunctive use of three rules for solution generation. Additionally, we propose six heuristics. Two of these are single-step Silver–Meal and EOQ heuristics for the classical lot-sizing problem. The third is a variant of the Wagner–Whitin algorithm. The remaining three heuristics are two-step hybrids that improve on the initial solutions of the first three by exploiting the structural properties of optimal production subplans. The proposed algorithms are evaluated by an extensive numerical study. The two-step Wagner–Whitin algorithm turns out to be the best heuristic.
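As a point of reference for the Wagner–Whitin variant mentioned above, here is the classic forward DP for the uncapacitated problem with linear production cost, setup cost K, and unit holding cost h; the demand numbers are hypothetical. The paper's model adds convex economic production costs on top of this recursion.

```python
# Classic Wagner-Whitin forward DP for uncapacitated lot-sizing:
# F[t] = min over j <= t of  F[j-1] + K + (holding cost of producing
# demands for periods j..t in period j).
def wagner_whitin(demand, K, h):
    T = len(demand)
    F = [0.0] * (T + 1)       # F[t] = min cost to cover periods 1..t
    choice = [0] * (T + 1)    # period of the last setup in an optimal plan
    for t in range(1, T + 1):
        best, arg = float("inf"), t
        for j in range(1, t + 1):  # last setup in period j covers j..t
            hold = sum(h * (i - j) * demand[i - 1] for i in range(j, t + 1))
            cand = F[j - 1] + K + hold
            if cand < best:
                best, arg = cand, j
        F[t], choice[t] = best, arg
    # Backtrack to recover the production plan.
    plan, t = [0] * T, T
    while t > 0:
        j = choice[t]
        plan[j - 1] = sum(demand[j - 1:t])
        t = j - 1
    return F[T], plan

cost, plan = wagner_whitin([20, 50, 10, 50, 50], K=100, h=1)
print(cost, plan)  # -> 320.0 [80, 0, 0, 100, 0]
```

The zero-inventory property exploited here (produce only when inventory hits zero) is an instance of the structural results on optimal subplans that the paper's heuristics build on.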
Zap Q-Learning for Optimal Stopping Time Problems
The objective in this paper is to obtain fast converging reinforcement
learning algorithms to approximate solutions to the problem of discounted cost
optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on
a compact subset of Euclidean space. We build on the dynamic programming
approach taken by Tsitsiklis and Van Roy, wherein they propose a Q-learning
algorithm to estimate the optimal state-action value function, which then
defines an optimal stopping rule. We provide insights as to why the convergence
rate of this algorithm can be slow, and propose a fast-converging alternative,
the "Zap-Q-learning" algorithm, designed to achieve optimal rate of
convergence. For the first time, we prove the convergence of the Zap-Q-learning
algorithm in the linear function approximation setting. We use
ODE analysis for the proof, and the optimal asymptotic variance property of the
algorithm is reflected in fast convergence in a finance example.
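For intuition, the Tsitsiklis–Van Roy style update can be sketched in its simplest tabular form on a made-up chain: Q(s) estimates the value of continuing from s, and the rule stops as soon as the immediate stopping reward beats it. This sketch omits both the linear function approximation and the matrix-gain (Zap) step that the paper is about; the scalar step-size rule below is precisely the kind of update whose convergence can be slow.

```python
import numpy as np

rng = np.random.default_rng(2)
n, gamma = 5, 0.9
P = rng.dirichlet(np.ones(n), size=n)   # irreducible chain (all entries positive)
r = rng.uniform(0.0, 1.0, size=n)       # reward for stopping in state s

# Q(s) = estimated value of continuing from s; stop when r(s) >= Q(s).
Q = np.zeros(n)
counts = np.zeros(n)
s = 0
for _ in range(200_000):
    s_next = rng.choice(n, p=P[s])
    counts[s] += 1
    target = gamma * max(r[s_next], Q[s_next])    # discounted stop-or-continue value
    Q[s] += counts[s] ** -0.85 * (target - Q[s])  # diminishing polynomial step size
    s = s_next

# Fixed point Q*(s) = gamma * sum_s' P(s,s') max(r(s'), Q*(s')) for comparison.
Qstar = np.zeros(n)
for _ in range(1000):
    Qstar = gamma * P @ np.maximum(r, Qstar)
print("max error vs fixed point:", np.max(np.abs(Q - Qstar)))
```

Zap Q-learning replaces the scalar step size with an estimated matrix gain (a stochastic Newton–Raphson step), which is what yields the optimal asymptotic variance the abstract refers to.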
Dynamic Linear Discriminant Analysis in High Dimensional Space
High-dimensional data that evolve dynamically are a prominent feature of the
modern data era. Partly in response, recent years have seen increasing
emphasis on addressing the dimensionality challenge. However, the
non-static nature of these datasets is largely ignored. This paper addresses
both challenges by proposing a novel yet simple dynamic linear programming
discriminant (DLPD) rule for binary classification. Different from the usual
static linear discriminant analysis, the new method is able to capture the
changing distributions of the underlying populations by modeling their means
and covariances as smooth functions of covariates of interest. Under an
approximate sparse condition, we show that the conditional misclassification
rate of the DLPD rule converges to the Bayes risk in probability uniformly over
the range of the variables used for modeling the dynamics, when the
dimensionality is allowed to grow exponentially with the sample size. The
minimax lower bound of the estimation of the Bayes risk is also established,
implying that the misclassification rate of our proposed rule is minimax-rate
optimal. The promising performance of the DLPD rule is illustrated via
extensive simulation studies and the analysis of a breast cancer dataset.

Comment: 34 pages; 3 figures.
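A stripped-down version of the idea (not the paper's DLPD estimator, which is built on linear programming and handles dimensionality growing with sample size) can be sketched by letting the class means vary with a covariate u, estimating them by kernel smoothing, and plugging the local means into the usual LDA direction. All data, the drift pattern, and the bandwidth below are synthetic choices of ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 5
u = rng.uniform(0, 1, size=n)            # dynamic covariate, e.g. time
y = rng.integers(0, 2, size=n)           # binary class labels
shift = np.zeros(p); shift[0] = 3.0      # class-1 mean drifts with u
X = rng.normal(size=(n, p)) + y[:, None] * shift[None, :] * (0.5 + u[:, None])

# Rough pooled covariance from globally centered residuals (the mean drift
# inflates it slightly; good enough for this sketch).
resid = X - np.where(y[:, None] == 1, X[y == 1].mean(0), X[y == 0].mean(0))
Sigma_inv = np.linalg.inv(np.cov(resid, rowvar=False) + 1e-3 * np.eye(p))

def local_means(u_new, h=0.2):
    """Nadaraya-Watson estimates of the two class means at covariate u_new."""
    w = np.exp(-0.5 * ((u - u_new) / h) ** 2)   # Gaussian kernel weights
    m0 = (w * (y == 0)) @ X / (w * (y == 0)).sum()
    m1 = (w * (y == 1)) @ X / (w * (y == 1)).sum()
    return m0, m1

def predict(x_new, u_new):
    m0, m1 = local_means(u_new)
    beta = Sigma_inv @ (m1 - m0)                 # dynamic LDA direction
    return int((x_new - 0.5 * (m0 + m1)) @ beta > 0)

acc = np.mean([predict(X[i], u[i]) == y[i] for i in range(n)])
print("in-sample accuracy:", acc)
```

A static LDA fit on the same data would use a single pair of means and thus misplace the decision boundary at both ends of the u-range, which is the failure mode the dynamic rule is designed to avoid.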