Global Optimization for Value Function Approximation

Authors: Marek Petrik, Shlomo Zilberstein
Publication date: 14/06/2010

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bilinear programming formulation of value function approximation, which employs global optimization. The formulation provides strong a priori guarantees on both robust and expected policy loss by minimizing specific norms of the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We describe and analyze both optimal and approximate algorithms for solving bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. We also briefly analyze the behavior of bilinear programming algorithms under incomplete samples. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on simple benchmark problems.
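
To make the quantity named in the abstract concrete, the sketch below evaluates the max-norm Bellman residual of a linear value function on a tiny MDP. It is only an illustration of the residual itself, not the bilinear programming formulation or any algorithm from the paper. The two-state MDP, the function names, and the single constant feature are all assumptions made up for this example.

```python
from typing import List

Matrix = List[List[float]]


def linear_value(features: Matrix, weights: List[float]) -> List[float]:
    """Value of each state under a linear approximation v = Phi w."""
    return [sum(f * w for f, w in zip(row, weights)) for row in features]


def bellman_backup(P: List[Matrix], r: Matrix, gamma: float,
                   v: List[float]) -> List[float]:
    """Apply the Bellman optimality operator to v.

    P[a][s][t] is the probability of moving from s to t under action a,
    r[a][s] is the reward for taking action a in state s.
    """
    n = len(v)
    return [
        max(
            r[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n))
            for a in range(len(P))
        )
        for s in range(n)
    ]


def bellman_residual_linf(P: List[Matrix], r: Matrix, gamma: float,
                          v: List[float]) -> float:
    """Max-norm Bellman residual: the largest |v(s) - (Lv)(s)| over states."""
    backed_up = bellman_backup(P, r, gamma, v)
    return max(abs(x - y) for x, y in zip(v, backed_up))


# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
r = [[0.0, 1.0], [0.0, 1.0]]   # reward depends only on the current state
gamma = 0.5

# One constant feature, so the approximation assigns every state the same value.
features = [[1.0], [1.0]]
v_approx = linear_value(features, [1.5])

residual_approx = bellman_residual_linf(P, r, gamma, v_approx)
residual_optimal = bellman_residual_linf(P, r, gamma, [1.0, 2.0])
```

Here the exact optimal value function [1, 2] has zero residual, while the constant approximation cannot represent it and leaves a positive residual. Choosing weights to shrink that residual is the kind of objective the abstract refers to.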

arXiv.org e-Print Archive

ScholarWorks@UMass Amherst

Robust Estimation and Filtering in the Presence of Unknown but Bounded Noise

Author: Roberto Tempo
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1986

In this paper, optimal algorithms for robust estimation and filtering are constructed. No statistical assumptions are available or used; the noise is treated as a deterministic variable that is unknown but bounded, belonging to a set described by a norm. Previous results obtained for complete (one-to-one) and approximate information [1] are now extended to partial and approximate information. This information seems useful in dealing with dynamic systems that are not completely identifiable and/or that have two different sources of noise, for example process and measurement noise. For different norms characterizing the noise, optimal algorithms (in a min-max sense) are shown. In particular, for Hilbert norms a linear optimal algorithm is the well-known minimum variance estimator. For ℓ∞ and ℓ1 norms, optimal algorithms computable by linear programming are presented. Applications to time series prediction and parameter estimation of nonidentifiable dynamic systems are shown. State estimation is formalized in the context of the general theory. Assuming an exponential smoothing of the bounds of the noise, it is proved that, for stable systems, the uncertainty of the state is asymptotically bounded. The results of the previous sections then provide computable algorithms for this problem. Two application examples are shown: Leontief models and Markov chains.
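
The min-max idea can be illustrated in the simplest possible setting: a single scalar parameter observed directly, with noise bounded in absolute value. This sketch is not the paper's algorithm and does not cover the linear programming or partial-information cases. The measurement model, the function name, and the sample data are assumptions chosen for illustration.

```python
from typing import List, Tuple


def central_estimate(measurements: List[float], eps: float) -> Tuple[float, float]:
    """Min-max estimate of theta from y_i = theta + e_i with |e_i| <= eps.

    Each measurement restricts theta to [y_i - eps, y_i + eps]. The set of
    values consistent with all measurements is the intersection of these
    intervals. Its midpoint minimizes the worst-case estimation error, and
    its half-width is that worst-case error.
    """
    lower = max(measurements) - eps
    upper = min(measurements) + eps
    if lower > upper:
        raise ValueError("measurements are inconsistent with the noise bound")
    return (lower + upper) / 2.0, (upper - lower) / 2.0


# Hypothetical data: three noisy readings with an assumed noise bound of 1.0.
estimate, worst_case_error = central_estimate([1.0, 1.5, 0.5], eps=1.0)
```

No probability distribution is assumed for the noise; the only information used is the bound itself, which matches the "unknown but bounded" setting described in the abstract.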

Columbia University Academic Commons