Global Optimization for Value Function Approximation
Existing value function approximation methods have been successfully used in
many applications, but they often lack useful a priori error bounds. We propose
a new approximate bilinear programming formulation of value function
approximation, which employs global optimization. The formulation provides
strong a priori guarantees on both robust and expected policy loss by
minimizing specific norms of the Bellman residual. Solving a bilinear program
optimally is NP-hard, but this is unavoidable because the Bellman-residual
minimization itself is NP-hard. We describe and analyze both optimal and
approximate algorithms for solving bilinear programs. The analysis shows that
this algorithm offers a convergent generalization of approximate policy
iteration. We also briefly analyze the behavior of bilinear programming
algorithms under incomplete samples. Finally, we demonstrate that the proposed
approach can consistently minimize the Bellman residual on simple benchmark
problems.
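To make the quantity being minimized concrete, here is a minimal sketch of the Bellman residual of a linear value function approximation on a tiny MDP. It is not the bilinear programming formulation from the abstract; it only evaluates the residual and two of its norms for a given weight vector. The MDP, the features, and all names (`P`, `R`, `PHI`, `GAMMA`, `residual_norms`) are hypothetical and chosen for illustration.

```python
# Hypothetical 2-state, 2-action MDP; every number here is illustrative only.
GAMMA = 0.9  # discount factor

# P[a][s][t]: probability of moving from state s to state t under action a
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.6, 0.4]],
]

# R[a][s]: expected immediate reward for taking action a in state s
R = [
    [1.0, 0.0],
    [0.0, 2.0],
]

# PHI[s]: feature vector of state s (a constant feature plus one indicator)
PHI = [[1.0, 0.0], [1.0, 1.0]]


def value(weights):
    """Approximate value of every state: v(s) = PHI[s] . weights."""
    return [sum(w * f for w, f in zip(weights, phi)) for phi in PHI]


def bellman_backup(v):
    """Apply the Bellman optimality operator: (Lv)(s) = max_a [r + gamma * P v]."""
    states = range(len(v))
    actions = range(len(P))
    return [
        max(
            R[a][s] + GAMMA * sum(P[a][s][t] * v[t] for t in states)
            for a in actions
        )
        for s in states
    ]


def bellman_residual(weights):
    """Per-state Bellman residual v - Lv of the approximate value function."""
    v = value(weights)
    backed_up = bellman_backup(v)
    return [vi - li for vi, li in zip(v, backed_up)]


def residual_norms(weights):
    """Return the (L-infinity, L1) norms of the Bellman residual."""
    res = bellman_residual(weights)
    return max(abs(r) for r in res), sum(abs(r) for r in res)


if __name__ == "__main__":
    print(residual_norms([0.0, 0.0]))
```

A global optimizer of the kind the abstract describes would search over `weights` to minimize one of these norms; the sketch only shows what is being measured.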
Robust Estimation and Filtering in the Presence of Unknown but Bounded Noise
In this paper optimal algorithms for robust estimation and filtering are constructed.
No statistical assumption is available or used: the noise is treated as a deterministic variable that is unknown but bounded, belonging to a set described by a norm. Previous results obtained for complete (one-to-one) and approximate information [1] are now extended to partial and approximate information. This information seems useful in dealing with dynamic systems that are not completely identifiable and/or have two different sources of noise, for example process and measurement noise. For different norms characterizing the noise, algorithms that are optimal in a min-max sense are shown. In particular, for Hilbert norms a linear optimal algorithm is the well-known minimum variance estimator. For ℓ∞ and ℓ1 norms, optimal algorithms computable by linear programming are presented. Applications to time series prediction and parameter estimation of nonidentifiable dynamic systems are shown. State estimation is formalized in the context of the general theory. Assuming an exponential smoothing of the bounds of the noise, it is proved that, for stable systems, the uncertainty of the state is asymptotically bounded. The results of the previous sections then provide computable algorithms for this problem. Two application examples are shown: Leontief models and Markov chains.
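As a rough illustration of the unknown-but-bounded noise setting, the sketch below treats the simplest case: a single scalar parameter theta observed through y_i = a_i * theta + e_i with |e_i| <= noise_bound. Each measurement confines theta to an interval, and the midpoint of the intersection minimizes the worst-case estimation error. This scalar case is an assumption made for brevity, not the paper's general algorithm, which relies on linear programming; the function names and data are hypothetical.

```python
def feasible_interval(regressors, measurements, noise_bound):
    """Intersect the intervals of theta consistent with each bounded-noise measurement."""
    lo, hi = float("-inf"), float("inf")
    for a, y in zip(regressors, measurements):
        if a == 0:
            # A zero regressor carries no information about theta,
            # but the measurement must still be explainable by noise alone.
            if abs(y) > noise_bound:
                raise ValueError("measurement is inconsistent with the noise bound")
            continue
        # |y - a * theta| <= noise_bound gives two bounds on theta;
        # their order depends on the sign of a.
        b1 = (y - noise_bound) / a
        b2 = (y + noise_bound) / a
        lo = max(lo, min(b1, b2))
        hi = min(hi, max(b1, b2))
    if lo > hi:
        raise ValueError("measurements are inconsistent with the noise bound")
    if lo == float("-inf"):
        raise ValueError("no informative measurement: theta is unbounded")
    return lo, hi


def central_estimate(regressors, measurements, noise_bound):
    """Return (estimate, worst-case error): the midpoint and radius of the feasible set."""
    lo, hi = feasible_interval(regressors, measurements, noise_bound)
    return (lo + hi) / 2, (hi - lo) / 2


if __name__ == "__main__":
    print(central_estimate([1.0, 2.0, 4.0], [2.25, 4.5, 8.0], 0.5))
```

The second return value is the guaranteed error bound: any theta consistent with the data lies within that distance of the estimate.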