In current model-free reinforcement learning (RL) algorithms, stability
criteria based on sampling methods are commonly used to guide policy
optimization. However, these criteria guarantee only that the system's state
converges to an equilibrium point in infinite time, which leads to a
sub-optimal policy. In this paper, we propose a policy optimization
technique incorporating sampling-based Lyapunov stability. Our approach drives
the system's state to an equilibrium point in optimal time and maintains
stability thereafter, a property we refer to as "optimal-time stability". To
achieve this, we integrate the optimization method into the Actor-Critic
framework, resulting in the development of the Adaptive Lyapunov-based
Actor-Critic (ALAC) algorithm. Through evaluations conducted on ten robotic
tasks, our approach outperforms previous studies significantly, effectively
guiding the system to generate stable patterns.Comment: 27 pages, 11 figues. 7th Annual Conference on Robot Learning. 202