    An efficient policy iteration algorithm for dynamic programming equations

    We present an accelerated algorithm for the solution of static Hamilton–Jacobi–Bellman equations related to optimal control problems. Our scheme is based on a classic policy iteration procedure, which is known to have superlinear convergence in many relevant cases provided the initial guess is sufficiently close to the solution. Without such a guess, the iteration often degenerates into behavior similar to that of a value iteration method, with increased computation time. The new scheme circumvents this problem by combining the advantages of both algorithms through an efficient coupling. The method starts with a coarse-mesh value iteration phase and then switches to a fine-mesh policy iteration procedure when a certain error threshold is reached. A delicate point is to determine this threshold so as to avoid unnecessary value iteration computations while still ensuring that the policy iteration method converges to the optimal solution. We analyze the method and its coupling in a number of examples in different dimensions, illustrating its properties.
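
    To make the coupling concrete, here is a minimal sketch in Python/NumPy for a finite-state, finite-action discounted problem standing in for the discretized Hamilton–Jacobi–Bellman system; the toy problem data and the switching threshold switch_tol are illustrative assumptions, not the authors' setup. Value iteration runs until the sup-norm residual drops below the threshold, and the resulting value function seeds policy iteration.

    import numpy as np

    def vi_then_pi(P, c, gamma, switch_tol=1e-2):
        """Value iteration until the Bellman residual falls below switch_tol,
        then policy iteration started from the resulting guess.
        P: (nA, nS, nS) transition matrices; c: (nS, nA) stage costs."""
        nA, nS, _ = P.shape
        V = np.zeros(nS)
        # Phase 1: value iteration (cheap iterations, but slow near the solution).
        while True:
            Q = c + gamma * np.einsum('aij,j->ia', P, V)      # Q[s, a]
            V_new = Q.min(axis=1)
            if np.max(np.abs(V_new - V)) < switch_tol:
                V = V_new
                break
            V = V_new
        # Phase 2: policy iteration (costlier iterations, fast local convergence).
        policy = (c + gamma * np.einsum('aij,j->ia', P, V)).argmin(axis=1)
        while True:
            P_mu = P[policy, np.arange(nS), :]                # transitions under policy
            c_mu = c[np.arange(nS), policy]
            V = np.linalg.solve(np.eye(nS) - gamma * P_mu, c_mu)   # exact evaluation
            new_policy = (c + gamma * np.einsum('aij,j->ia', P, V)).argmin(axis=1)
            if np.array_equal(new_policy, policy):
                return V, policy                              # no further improvement
            policy = new_policy

    # Toy data: a random controlled Markov chain with 50 states and 4 actions.
    rng = np.random.default_rng(0)
    nS, nA = 50, 4
    P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)
    c = rng.random((nS, nA))
    V, policy = vi_then_pi(P, c, gamma=0.95)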

    Adaptive Channel Recommendation For Opportunistic Spectrum Access

    We propose a dynamic spectrum access scheme where secondary users recommend "good" channels to each other and access the spectrum accordingly. We formulate the problem as an average-reward Markov decision process. We show the existence of the optimal stationary spectrum access policy and explore its structural properties in two asymptotic cases. Since the action space of the Markov decision process is continuous, it is difficult to find the optimal policy by simply discretizing the action space and using policy iteration, value iteration, or Q-learning. Instead, we propose a new algorithm based on the Model Reference Adaptive Search method and prove its convergence to the optimal policy. Numerical results show that the proposed algorithm achieves up to 18% and 100% performance improvement over the static channel recommendation scheme in homogeneous and heterogeneous channel environments, respectively, and is more robust to channel dynamics.
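
    Because the action space is continuous, the authors rely on sampling-based search (Model Reference Adaptive Search) rather than discretization. The sketch below only conveys the flavor of such model-based sampling methods with a simplified, cross-entropy-style loop over a scalar decision variable in [0, 1]; the surrogate objective and all parameters are made-up stand-ins, not the paper's average-reward spectrum-access objective or the exact MRAS updates.

    import numpy as np

    def surrogate_reward(theta):
        # Stand-in objective; the paper instead evaluates the average reward of
        # the spectrum-access policy induced by the continuous parameter.
        return -(theta - 0.37) ** 2

    def sampling_search(reward, n_iter=50, n_samples=200, elite_frac=0.1, seed=0):
        """Gaussian sampling search over a scalar decision in [0, 1]: sample
        candidates, keep the best ones, refit the sampling distribution."""
        rng = np.random.default_rng(seed)
        mu, sigma = 0.5, 0.3
        n_elite = max(1, int(elite_frac * n_samples))
        for _ in range(n_iter):
            thetas = np.clip(rng.normal(mu, sigma, n_samples), 0.0, 1.0)
            rewards = np.array([reward(t) for t in thetas])
            elites = thetas[np.argsort(rewards)[-n_elite:]]    # best candidates
            mu, sigma = elites.mean(), elites.std() + 1e-6     # updated sampling model
        return mu

    print(sampling_search(surrogate_reward))   # close to 0.37 for this toy objective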

    Dynamic Programming for Positive Linear Systems with Linear Costs

    Recent work by Rantzer [Ran22] formulated a class of optimal control problems involving positive linear systems, linear stage costs, and linear constraints. It was shown that the associated Bellman equation can be characterized by a finite-dimensional nonlinear equation, which can be solved by linear programming. In this work, we report complementary theoretical results for the same class of problems. In particular, we provide conditions under which the solution is unique, investigate properties of the optimal policy, study the convergence of value iteration, policy iteration, and optimistic policy iteration applied to such problems, and analyze the boundedness of the solution to the associated linear program. Apart from a form of the Frobenius-Perron theorem, the majority of our results are built upon generic dynamic programming theory applicable to problems involving nonnegative stage costs.
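
    As a hedged illustration of the linear-programming characterization mentioned above, the sketch below solves a finite-dimensional monotone Bellman-type equation p = min_u (c_u + A_u p), with nonnegative matrices A_u scaled to have spectral radius below one, by maximizing 1^T p subject to p <= c_u + A_u p, and cross-checks the result against value iteration. The problem data are random stand-ins and the equation is a generic analogue, not the specific equation derived in [Ran22].

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    n, m, gamma = 30, 3, 0.9          # dimension, number of modes, scaling factor

    # Nonnegative matrices with spectral radius < 1 and nonnegative cost vectors.
    A = rng.random((m, n, n)); A /= A.sum(axis=2, keepdims=True); A *= gamma
    c = rng.random((m, n))

    # LP characterization: maximize 1^T p subject to (I - A_u) p <= c_u for every mode u.
    A_ub = np.vstack([np.eye(n) - A[u] for u in range(m)])
    b_ub = np.concatenate([c[u] for u in range(m)])
    res = linprog(-np.ones(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n, method="highs")
    p_lp = res.x

    # Cross-check: value iteration p <- min_u (c_u + A_u p) reaches the same point.
    p = np.zeros(n)
    for _ in range(500):
        p = np.min(c + A @ p, axis=0)
    print(np.max(np.abs(p - p_lp)))    # should be near zero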

    Nested Pseudo-likelihood Estimation and Bootstrap-based Inference for Structural Discrete Markov Decision Models

    This paper analyzes the higher-order properties of nested pseudo-likelihood (NPL) estimators and their practical implementation for parametric discrete Markov decision models in which the probability distribution is defined as a fixed point. We propose a new NPL estimator that can achieve quadratic convergence without fully solving the fixed point problem in every iteration. We then extend the NPL estimators to develop one-step NPL bootstrap procedures for discrete Markov decision models and provide some Monte Carlo evidence based on a machine replacement model of Rust (1987). The proposed one-step bootstrap test statistics and confidence intervals improve upon the first-order asymptotics even with a relatively small number of iterations. Improvements are particularly noticeable when analyzing the dynamic impacts of counterfactual policies.
    Keywords: Edgeworth expansion, k-step bootstrap, maximum pseudo-likelihood estimators, nested fixed point algorithm, Newton-Raphson method, policy iteration
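
    For readers unfamiliar with NPL, the sketch below implements the basic iteration (pseudo-likelihood maximization followed by one application of the policy-iteration mapping Psi) on a stylized machine-replacement model in the spirit of Rust (1987). The state space, utilities, transition rule, parameter values, and the use of Nelder-Mead are illustrative assumptions, and the higher-order refinements and bootstrap procedures discussed in the abstract are not implemented.

    import numpy as np
    from scipy.optimize import minimize

    EULER = 0.5772156649
    K, beta = 5, 0.95                        # mileage states, discount factor

    # Deterministic transitions f(x'|x,a): keeping ages the machine, replacing resets it.
    F = np.zeros((2, K, K))
    for x in range(K):
        F[0, x, min(x + 1, K - 1)] = 1.0     # a = 0: keep
        F[1, x, 0] = 1.0                     # a = 1: replace

    def flow_utility(theta):
        # u[x, a] with maintenance cost theta[0] and replacement cost theta[1].
        u = np.zeros((K, 2))
        u[:, 0] = -theta[0] * np.arange(K)
        u[:, 1] = -theta[1]
        return u

    def choice_values(u, V):
        # Choice-specific values v(x, a) = u(x, a) + beta * E[V(x') | x, a].
        return u + beta * np.stack([F[0] @ V, F[1] @ V], axis=1)

    def psi(theta, P):
        """NPL mapping: given CCPs P, return the best-response CCPs Psi(theta, P)."""
        u = flow_utility(theta)
        e_P = (P * (u + EULER - np.log(P))).sum(axis=1)        # expected flow payoff
        F_P = P[:, 0, None] * F[0] + P[:, 1, None] * F[1]      # transitions under P
        V = np.linalg.solve(np.eye(K) - beta * F_P, e_P)       # value implied by P
        v = choice_values(u, V)
        v -= v.max(axis=1, keepdims=True)
        return np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)

    def solve_ccp(theta, iters=1000):
        # Solve the logit dynamic program by smoothed value iteration (a contraction).
        u, V = flow_utility(theta), np.zeros(K)
        for _ in range(iters):
            v = choice_values(u, V)
            m = v.max(axis=1)
            V = EULER + m + np.log(np.exp(v - m[:, None]).sum(axis=1))
        v = choice_values(u, V)
        v -= v.max(axis=1, keepdims=True)
        return np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)

    # Synthetic data drawn from the model at the true parameters.
    theta_true = np.array([0.5, 2.0])
    P_star = solve_ccp(theta_true)
    rng = np.random.default_rng(0)
    xs = rng.integers(0, K, 5000)
    acts = (rng.random(5000) < P_star[xs, 1]).astype(int)

    # NPL iterations: maximize the pseudo-likelihood, then update the CCPs once.
    P, theta = np.full((K, 2), 0.5), np.array([1.0, 1.0])
    for k in range(5):
        nll = lambda th, P=P: -np.log(psi(th, P)[xs, acts]).sum()
        theta = minimize(nll, theta, method="Nelder-Mead").x
        P = psi(theta, P)
        print(k, theta)                      # parameter estimates at each NPL step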

    A note on the policy iteration algorithm for discounted Markov decision processes for a class of semicontinuous models

    The standard version of the policy iteration (PI) algorithm fails for semicontinuous models, that is, for models with lower semicontinuous one-step costs and a weakly continuous transition law. This is due to the lack of continuity properties of the discounted cost for stationary policies, which gives rise to a measurability problem in the improvement step. The present work proposes an alternative version of the PI algorithm that performs a smoothing step to avoid the measurability problem. Assuming that the model satisfies a Lyapunov growth condition and some standard continuity-compactness properties, linear convergence of the policy iteration functions to the optimal value function is shown. Under strengthened continuity conditions, a second result shows that among the improvement policies there is one achieving the best possible improvement whose cost function is continuous.

    Deep Reinforcement Learning for Approximate Policy Iteration: Convergence Analysis and a Post-Earthquake Disaster Response Case Study

    Approximate Policy Iteration (API) is a class of Reinforcement Learning (RL) algorithms that seek to solve the long-run discounted reward Markov Decision Process (MDP), via the policy iteration paradigm, without learning the transition model in the underlying Bellman equation. Unfortunately, these algorithms suffer from a defect known as chattering, in which the solution (policy) delivered in each iteration of the algorithm oscillates between improved and worsened policies, leading to sub-optimal behavior. Two causes for this that have been traced to the crucial policy improvement step are: (i) the inaccuracies in the policy improvement function and (ii) the exploration/exploitation tradeoff integral to this step, which generates variability in performance. Both of these defects are amplified by simulation noise. Deep RL belongs to a newer class of algorithms in which the resolution of the learning process is refined via mechanisms such as experience replay and/or deep neural networks for improved performance. In this paper, a new deep learning approach is developed for API which employs a more accurate policy improvement function, via an enhanced resolution Bellman equation, thereby reducing chattering and eliminating the need for exploration in the policy improvement step. Versions of the new algorithm for both the long-run discounted MDP and semi-MDP are presented. Convergence properties of the new algorithm are studied mathematically, and a post-earthquake disaster response case study is employed to demonstrate numerically the algorithm's efficacy.
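
    To illustrate where chattering enters a generic approximate-policy-iteration loop, the sketch below evaluates each policy by Monte-Carlo rollouts on a small random MDP and then improves greedily; with few rollouts the noisy Q estimates can make the greedy policy oscillate between iterations, while more rollouts tend to stabilize it. This is a plain API loop on made-up data, not the paper's deep-RL algorithm or its enhanced-resolution Bellman equation.

    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA, gamma = 6, 2, 0.9

    # Toy MDP with random transitions and rewards (reward maximization).
    P = rng.random((nS, nA, nS)); P /= P.sum(axis=2, keepdims=True)
    R = rng.random((nS, nA))

    def rollout_return(s, a, policy, horizon=40):
        # One simulated discounted return starting from (s, a), then following policy.
        g, disc = 0.0, 1.0
        for _ in range(horizon):
            g += disc * R[s, a]
            disc *= gamma
            s = rng.choice(nS, p=P[s, a])
            a = policy[s]
        return g

    def api(n_rollouts, n_iters=10):
        """Approximate policy iteration with Monte-Carlo policy evaluation.
        Few rollouts -> noisy Q estimates -> the greedy policy may chatter."""
        policy = np.zeros(nS, dtype=int)
        for _ in range(n_iters):
            Q = np.zeros((nS, nA))
            for s in range(nS):
                for a in range(nA):
                    Q[s, a] = np.mean([rollout_return(s, a, policy)
                                       for _ in range(n_rollouts)])
            policy = Q.argmax(axis=1)          # greedy policy improvement step
            print(policy)
        return policy

    api(n_rollouts=5)     # noisy evaluation: the policy may oscillate across iterations
    api(n_rollouts=100)   # more accurate evaluation: the policy typically settles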