A Policy Optimization Method Towards Optimal-time Stability

Cao, Yuxue; Gao, Yang; Lan, Fengbo; Oseni, Oluwatosin; Wang, Shengjie; Xu, Haotian; Zhang, Tao; Zheng, Xiang

Authors: Yuxue Cao, Yang Gao, Fengbo Lan, Oluwatosin Oseni, Shengjie Wang, Haotian Xu, Tao Zhang, Xiang Zheng
Publication date: 12 October 2023

Abstract

In current model-free reinforcement learning (RL) algorithms, stability criteria based on sampling methods are commonly utilized to guide policy optimization. However, these criteria only guarantee the infinite-time convergence of the system's state to an equilibrium point, which leads to sub-optimality of the policy. In this paper, we propose a policy optimization technique incorporating sampling-based Lyapunov stability. Our approach enables the system's state to reach an equilibrium point within an optimal time and maintain stability thereafter, referred to as "optimal-time stability". To achieve this, we integrate the optimization method into the Actor-Critic framework, resulting in the development of the Adaptive Lyapunov-based Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic tasks, our approach outperforms previous studies significantly, effectively guiding the system to generate stable patterns.

Comment: 27 pages, 11 figures. 7th Annual Conference on Robot Learning. 202

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2301.00521

Last time updated on 26/01/2023