8,757 research outputs found
Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control
We consider the problem of finding a control policy for a Markov Decision
Process (MDP) that maximizes the probability of reaching a desired set of
states while avoiding another, undesired set. This problem is motivated by applications in
robotics, where such problems naturally arise when probabilistic models of
robot motion are required to satisfy temporal logic task specifications. We
transform this problem into a Stochastic Shortest Path (SSP) problem and
develop a new approximate dynamic programming algorithm to solve it. This
algorithm is of the actor-critic type and uses a least-squares temporal
difference learning method. It operates on sample paths of the system and
optimizes the policy within a pre-specified class parameterized by a
parsimonious set of parameters. We show its convergence to a policy
corresponding to a stationary point in the parameter space. Simulation
results confirm the effectiveness of the proposed solution.
Comment: Technical report accompanying an accepted paper to CDC 2011
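For a concrete sense of the method described above, here is a minimal Python sketch of an LSTD actor-critic loop in the same spirit: a linear least-squares critic fit along sample paths and a policy-gradient actor over a small parameterized (softmax) policy class. The chain MDP, the features, the costs, and the step sizes are illustrative stand-ins, not the paper's actual setup.

import numpy as np

rng = np.random.default_rng(0)

N_STATES = 6
GOAL, TRAP = 5, 0                        # absorbing goal and trap states
MOVES = (-1, +1)                         # action 0: step left, action 1: step right
D = N_STATES * 2                         # feature dimension (state-action pairs)

def features(s, a):
    # One-hot state-action features for the linear critic.
    phi = np.zeros(D)
    phi[2 * s + a] = 1.0
    return phi

def policy(theta, s):
    # Boltzmann (softmax) policy within the parameterized class.
    prefs = np.array([theta @ features(s, a) for a in (0, 1)])
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

theta = np.zeros(D)                      # actor parameters
for episode in range(1500):
    # Critic: accumulate LSTD statistics A w = b along one sample path.
    A, b = 1e-2 * np.eye(D), np.zeros(D)
    s, done, path = 3, False, []
    while not done:
        a = rng.choice(2, p=policy(theta, s))
        move = MOVES[a] if rng.random() < 0.8 else -MOVES[a]
        s2 = int(np.clip(s + move, 0, N_STATES - 1))
        done = s2 in (GOAL, TRAP)
        cost = 1.0 + (50.0 if s2 == TRAP else 0.0)   # SSP-style step cost
        phi = features(s, a)
        a2 = rng.choice(2, p=policy(theta, s2))
        phi2 = np.zeros(D) if done else features(s2, a2)
        A += np.outer(phi, phi - phi2)   # undiscounted TD structure
        b += cost * phi
        path.append((s, a))
        s = s2
    w = np.linalg.solve(A, b)            # least-squares critic fit
    # Actor: one policy-gradient step using the critic's Q estimates.
    for s, a in path:
        p = policy(theta, s)
        grad_logp = features(s, a) - p[0] * features(s, 0) - p[1] * features(s, 1)
        theta -= 0.01 * grad_logp * (w @ features(s, a))   # descend expected cost

In the SSP transformation the abstract refers to, undesired states are penalized and each step incurs cost, so minimizing expected total cost pushes probability mass toward the goal; the sketch mirrors that structure only loosely.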
Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation
Deep reinforcement learning (DRL) has attracted much attention as an approach
to solving optimal control problems without explicit mathematical models of
the systems. In practice, however, optimal control problems are often subject
to constraints. In this study, we consider optimal control problems whose
constraints encode temporal control tasks. We describe the constraints using
signal temporal logic (STL), which is well suited to time-sensitive control
tasks since it can constrain continuous signals over bounded time intervals. To
deal with the STL constraints, we introduce an extended constrained Markov
decision process (CMDP) called a τ-CMDP. We formulate the
STL-constrained optimal control problem as a τ-CMDP and propose a
two-phase constrained DRL algorithm using the Lagrangian relaxation method.
Through simulations, we also demonstrate the learning performance of the
proposed algorithm.
Comment: 16 pages, 20 figures, accepted for IEEE Access
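As a rough illustration of the Lagrangian relaxation mechanics (only the primal-dual loop, not the paper's τ-CMDP construction or any STL robustness computation), the Python sketch below alternates a stochastic policy-gradient step on the Lagrangian with a projected dual-ascent step on the multiplier, for a hypothetical two-action constrained problem; all names and numbers are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Toy constrained problem: action 0 earns high reward but always incurs
# constraint cost; action 1 is safe but earns less. Require E[cost] <= BUDGET.
REWARD = np.array([1.0, 0.6])
COST = np.array([1.0, 0.0])
BUDGET = 0.1

theta = np.zeros(2)            # softmax policy parameters (primal variable)
lam = 0.0                      # Lagrange multiplier (dual variable)
lr_theta, lr_lam = 0.1, 0.02

for it in range(5000):
    p = np.exp(theta - theta.max())
    p /= p.sum()
    a = rng.choice(2, p=p)
    r = REWARD[a] + 0.1 * rng.standard_normal()   # noisy observed reward
    c = COST[a]
    # Primal step: stochastic ascent on L = E[r] - lam * (E[c] - BUDGET).
    grad_logp = -p.copy()
    grad_logp[a] += 1.0
    theta += lr_theta * grad_logp * (r - lam * c)
    # Dual step: raise lam while the constraint is violated; project to lam >= 0.
    lam = max(0.0, lam + lr_lam * (c - BUDGET))

print(f"P(safe action) = {p[1]:.2f}, lambda = {lam:.2f}")

The dual variable grows whenever the sampled constraint cost exceeds the budget, which in turn penalizes the constraint-violating action in the primal step; this saddle-point dynamic is the general idea that Lagrangian-relaxation methods such as the paper's two-phase algorithm build on.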