49,897 research outputs found

    Approximate Dynamic Programming via a Smoothed Linear Program

    Get PDF
    We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP have typically relied on a natural “projection” of a well-studied linear program for exact dynamic programming. Such programs restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program—the “smoothed approximate linear program”—is distinct from such approaches and relaxes the restriction to lower bounding approximations in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. These bounds are, in general, no worse than those available for extant LP approaches and for specific problem instances can be shown to be arbitrarily stronger. Second, experiments with our approach on a pair of challenging problems (the game of Tetris and a queueing network control problem) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by a substantial margin

    Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes

    Full text link
    Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions reliably. Large and rich sets of features can cause existing algorithms to overfit because of a limited number of samples. We address this shortcoming using L1L_1 regularization in approximate linear programming. Because the proposed method can automatically select the appropriate richness of features, its performance does not degrade with an increasing number of features. These results rely on new and stronger sampling bounds for regularized approximate linear programs. We also propose a computationally efficient homotopy method. The empirical evaluation of the approach shows that the proposed method performs well on simple MDPs and standard benchmark problems.Comment: Technical report corresponding to the ICML2010 submission of the same nam

    Global Optimization for Value Function Approximation

    Full text link
    Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze both optimal and approximate algorithms for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems

    From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming

    Full text link
    We consider linear programming (LP) problems in infinite dimensional spaces that are in general computationally intractable. Under suitable assumptions, we develop an approximation bridge from the infinite-dimensional LP to tractable finite convex programs in which the performance of the approximation is quantified explicitly. To this end, we adopt the recent developments in two areas of randomized optimization and first order methods, leading to a priori as well as a posterior performance guarantees. We illustrate the generality and implications of our theoretical results in the special case of the long-run average cost and discounted cost optimal control problems for Markov decision processes on Borel spaces. The applicability of the theoretical results is demonstrated through a constrained linear quadratic optimal control problem and a fisheries management problem.Comment: 30 pages, 5 figure

    Constraint-based reachability

    Get PDF
    Iterative imperative programs can be considered as infinite-state systems computing over possibly unbounded domains. Studying reachability in these systems is challenging as it requires to deal with an infinite number of states with standard backward or forward exploration strategies. An approach that we call Constraint-based reachability, is proposed to address reachability problems by exploring program states using a constraint model of the whole program. The keypoint of the approach is to interpret imperative constructions such as conditionals, loops, array and memory manipulations with the fundamental notion of constraint over a computational domain. By combining constraint filtering and abstraction techniques, Constraint-based reachability is able to solve reachability problems which are usually outside the scope of backward or forward exploration strategies. This paper proposes an interpretation of classical filtering consistencies used in Constraint Programming as abstract domain computations, and shows how this approach can be used to produce a constraint solver that efficiently generates solutions for reachability problems that are unsolvable by other approaches.Comment: In Proceedings Infinity 2012, arXiv:1302.310
    • …
    corecore