8,757 research outputs found
Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control
We consider the problem of finding a control policy for a Markov Decision
Process (MDP) that maximizes the probability of reaching a desired set of
states while avoiding another, undesired set. This problem is motivated by applications in
robotics, where such problems naturally arise when probabilistic models of
robot motion are required to satisfy temporal logic task specifications. We
transform this problem into a Stochastic Shortest Path (SSP) problem and
develop a new approximate dynamic programming algorithm to solve it. This
algorithm is of the actor-critic type and uses a least-squares temporal
difference learning method. It operates on sample paths of the system and
optimizes the policy within a pre-specified class parameterized by a
parsimonious set of parameters. We show its convergence to a policy
corresponding to a stationary point in the parameter space. Simulation
results confirm the effectiveness of the proposed solution.
Comment: Technical report accompanying an accepted paper to CDC 2011
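For a concrete sense of the method described above, here is a minimal Python sketch of an LSTD actor-critic loop in the same spirit: a linear least-squares critic fit along sample paths and a policy-gradient actor over a small parameterized (softmax) policy class. The chain MDP, the features, the costs, and the step sizes are illustrative stand-ins, not the paper's actual setup.

import numpy as np

rng = np.random.default_rng(0)

N_STATES = 6
GOAL, TRAP = 5, 0                        # absorbing goal and trap states
MOVES = (-1, +1)                         # action 0: step left, action 1: step right
D = N_STATES * 2                         # feature dimension (state-action pairs)

def features(s, a):
    # One-hot state-action features for the linear critic.
    phi = np.zeros(D)
    phi[2 * s + a] = 1.0
    return phi

def policy(theta, s):
    # Boltzmann (softmax) policy within the parameterized class.
    prefs = np.array([theta @ features(s, a) for a in (0, 1)])
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

theta = np.zeros(D)                      # actor parameters
for episode in range(1500):
    # Critic: accumulate LSTD statistics A w = b along one sample path.
    A, b = 1e-2 * np.eye(D), np.zeros(D)
    s, done, path = 3, False, []
    while not done:
        a = rng.choice(2, p=policy(theta, s))
        move = MOVES[a] if rng.random() < 0.8 else -MOVES[a]
        s2 = int(np.clip(s + move, 0, N_STATES - 1))
        done = s2 in (GOAL, TRAP)
        cost = 1.0 + (50.0 if s2 == TRAP else 0.0)   # SSP-style step cost
        phi = features(s, a)
        a2 = rng.choice(2, p=policy(theta, s2))
        phi2 = np.zeros(D) if done else features(s2, a2)
        A += np.outer(phi, phi - phi2)   # undiscounted TD structure
        b += cost * phi
        path.append((s, a))
        s = s2
    w = np.linalg.solve(A, b)            # least-squares critic fit
    # Actor: one policy-gradient step using the critic's Q estimates.
    for s, a in path:
        p = policy(theta, s)
        grad_logp = features(s, a) - p[0] * features(s, 0) - p[1] * features(s, 1)
        theta -= 0.01 * grad_logp * (w @ features(s, a))   # descend expected cost

In the SSP transformation the abstract refers to, undesired states are penalized and each step incurs cost, so minimizing expected total cost pushes probability mass toward the goal; the sketch mirrors that structure only loosely.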
Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation
Deep reinforcement learning (DRL) has attracted much attention as an approach
to solving optimal control problems without explicit mathematical models of
the systems. In practice, however, optimal control problems are often subject
to constraints. In this study, we consider optimal control problems whose
constraints encode temporal control tasks. We describe the constraints using
signal temporal logic (STL), which is well suited to time-sensitive control
tasks since it can constrain continuous signals over bounded time intervals. To
deal with the STL constraints, we introduce an extended constrained Markov
decision process (CMDP) called a τ-CMDP. We formulate the
STL-constrained optimal control problem as a τ-CMDP and propose a
two-phase constrained DRL algorithm using the Lagrangian relaxation method.
Through simulations, we also demonstrate the learning performance of the
proposed algorithm.
Comment: 16 pages, 20 figures, accepted for IEEE Access
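As a rough illustration of the Lagrangian relaxation mechanics (only the primal-dual loop, not the paper's τ-CMDP construction or any STL robustness computation), the Python sketch below alternates a stochastic policy-gradient step on the Lagrangian with a projected dual-ascent step on the multiplier, for a hypothetical two-action constrained problem; all names and numbers are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Toy constrained problem: action 0 earns high reward but always incurs
# constraint cost; action 1 is safe but earns less. Require E[cost] <= BUDGET.
REWARD = np.array([1.0, 0.6])
COST = np.array([1.0, 0.0])
BUDGET = 0.1

theta = np.zeros(2)            # softmax policy parameters (primal variable)
lam = 0.0                      # Lagrange multiplier (dual variable)
lr_theta, lr_lam = 0.1, 0.02

for it in range(5000):
    p = np.exp(theta - theta.max())
    p /= p.sum()
    a = rng.choice(2, p=p)
    r = REWARD[a] + 0.1 * rng.standard_normal()   # noisy observed reward
    c = COST[a]
    # Primal step: stochastic ascent on L = E[r] - lam * (E[c] - BUDGET).
    grad_logp = -p.copy()
    grad_logp[a] += 1.0
    theta += lr_theta * grad_logp * (r - lam * c)
    # Dual step: raise lam while the constraint is violated; project to lam >= 0.
    lam = max(0.0, lam + lr_lam * (c - BUDGET))

print(f"P(safe action) = {p[1]:.2f}, lambda = {lam:.2f}")

The dual variable grows whenever the sampled constraint cost exceeds the budget, which in turn penalizes the constraint-violating action in the primal step; this saddle-point dynamic is the general idea that Lagrangian-relaxation methods such as the paper's two-phase algorithm build on.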