Convergence of Policy Improvement for Entropy-Regularized Stochastic Control Problems

Huang, Yu-Jui; Wang, Zhenhua; Zhou, Zhou

Publication date: 3 July 2023

Abstract

For a general entropy-regularized stochastic control problem on an infinite horizon, we prove that a policy improvement algorithm (PIA) converges to an optimal relaxed control. In contrast to the standard stochastic control literature, classical Hölder estimates of value functions do not ensure the convergence of the PIA, because of the added entropy-regularizing term. To circumvent this, we carry out a delicate estimation by moving back and forth between appropriate Hölder and Sobolev spaces. This requires new Sobolev estimates designed specifically for policy improvement and a nontrivial technique to contain the entropy growth. Ultimately, we obtain a uniform Hölder bound for the sequence of value functions generated by the PIA, thereby achieving the desired convergence result. A characterization of the optimal value function as the unique solution to an exploratory Hamilton-Jacobi-Bellman equation comes as a by-product.
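For intuition only, the evaluate-then-improve loop behind a PIA can be sketched on a toy problem. This is a minimal sketch under strong assumptions, not the paper's method: it uses a discrete-time, finite-state, finite-action analogue instead of the setting the abstract describes, and the transition table `P`, rewards `R`, discount `GAMMA`, entropy weight `LAM`, and all function names are hypothetical choices made for illustration. In this simplified analogue, the improved policy is a Gibbs (softmax) distribution over the current action values.

```python
import math

# Hypothetical toy problem (not from the paper): 2 states, 2 actions.
P = [  # P[s][a][s2]: probability of moving from s to s2 under action a
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
R = [[1.0, 0.0], [0.0, 2.0]]  # R[s][a]: one-step reward
GAMMA = 0.9  # discount factor (assumed)
LAM = 0.5    # weight of the entropy term (assumed)


def q_values(V):
    """Q[s][a] = R[s][a] + GAMMA * expected value of the next state."""
    return [[R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(2)] for s in range(2)]


def evaluate(pi, sweeps=500):
    """Entropy-regularized value of a fixed policy, by fixed-point iteration."""
    V = [0.0, 0.0]
    for _ in range(sweeps):
        Q = q_values(V)
        # The "- LAM * log pi" term is the entropy bonus of the policy.
        V = [sum(pi[s][a] * (Q[s][a] - LAM * math.log(pi[s][a]))
                 for a in range(2)) for s in range(2)]
    return V


def improve(V):
    """Gibbs update: pi(a|s) proportional to exp(Q[s][a] / LAM)."""
    Q = q_values(V)
    pi = []
    for s in range(2):
        m = max(Q[s])  # subtract the max for numerical stability
        w = [math.exp((q - m) / LAM) for q in Q[s]]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi


def policy_improvement(iterations=100):
    """Alternate evaluation and improvement, starting from the uniform policy."""
    pi = [[0.5, 0.5], [0.5, 0.5]]
    values = []
    for _ in range(iterations):
        V = evaluate(pi)
        values.append(V)
        pi = improve(V)
    return pi, values


if __name__ == "__main__":
    pi, values = policy_improvement()
    print("final values:", values[-1])
    print("final policy:", pi)
```

In this finite analogue the value sequence is nondecreasing and convergence follows from a standard contraction argument. The difficulty the abstract points to is that no comparably easy argument is available in its setting, which is why uniform Hölder bounds on the value functions are needed there.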

arXiv.org e-Print Archive

oai:arXiv.org:2209.07059

Last time updated on 30/11/2022