8,757 research outputs found

    Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control

    Get PDF
    We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic models of robot motion are required to satisfy temporal logic task specifications. We transform this problem into a Stochastic Shortest Path (SSP) problem and develop a new approximate dynamic programming algorithm to solve it. This algorithm is of the actor-critic type and uses a least-square temporal difference learning method. It operates on sample paths of the system and optimizes the policy within a pre-specified class parameterized by a parsimonious set of parameters. We show its convergence to a policy corresponding to a stationary point in the parameters' space. Simulation results confirm the effectiveness of the proposed solution.Comment: Technical report accompanying an accepted paper to CDC 201

    Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation

    Full text link
    Deep reinforcement learning (DRL) has attracted much attention as an approach to solve optimal control problems without mathematical models of systems. On the other hand, in general, constraints may be imposed on optimal control problems. In this study, we consider the optimal control problems with constraints to complete temporal control tasks. We describe the constraints using signal temporal logic (STL), which is useful for time sensitive control tasks since it can specify continuous signals within bounded time intervals. To deal with the STL constraints, we introduce an extended constrained Markov decision process (CMDP), which is called a Ï„\tau-CMDP. We formulate the STL-constrained optimal control problem as the Ï„\tau-CMDP and propose a two-phase constrained DRL algorithm using the Lagrangian relaxation method. Through simulations, we also demonstrate the learning performance of the proposed algorithm.Comment: 16 pages, 20 figures, accepted for IEEE Acces
    • …
    corecore