
    Least Squares Temporal Difference Actor-Critic Methods with Applications to Robot Motion Control

    We consider the problem of finding a control policy for a Markov Decision Process (MDP) that maximizes the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where it arises naturally when probabilistic models of robot motion must satisfy temporal logic task specifications. We transform this problem into a Stochastic Shortest Path (SSP) problem and develop a new approximate dynamic programming algorithm to solve it. The algorithm is of the actor-critic type and uses a least-squares temporal difference learning method. It operates on sample paths of the system and optimizes the policy within a pre-specified class parameterized by a parsimonious set of parameters. We show that it converges to a policy corresponding to a stationary point in the parameter space. Simulation results confirm the effectiveness of the proposed solution.
    Comment: Technical report accompanying an accepted paper to CDC 201
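
    The critic in the abstract above can be illustrated concretely: LSTD fits a linear value-function approximation by accumulating a least-squares system from sampled transitions and solving it directly, rather than by stochastic gradient steps. Below is a minimal Python sketch of an LSTD(0) update; the feature map phi, the transition format, and the cost convention are assumptions for illustration, not the paper's exact construction (an SSP formulation would use gamma = 1 with zero features at absorbing goal states).

        import numpy as np

        def lstd_weights(transitions, phi, gamma=0.99, reg=1e-6):
            """Solve A w = b from sampled (state, cost, next_state) tuples,
            so that V(s) ~= phi(s) @ w for the policy that generated the data."""
            k = phi(transitions[0][0]).shape[0]
            A = reg * np.eye(k)          # small ridge term keeps A invertible
            b = np.zeros(k)
            for s, cost, s_next in transitions:
                f, f_next = phi(s), phi(s_next)
                A += np.outer(f, f - gamma * f_next)
                b += cost * f
            return np.linalg.solve(A, b)

    In an actor-critic loop, the resulting weight vector would feed the actor's policy-parameter update at each iteration.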

    Robust Satisfaction of Temporal Logic Specifications via Reinforcement Learning

    We consider the problem of steering a system with unknown, stochastic dynamics to satisfy a rich, temporally layered task given as a signal temporal logic formula. We represent the system as a Markov decision process in which the states are built from a partition of the state space and the transition probabilities are unknown. We present provably convergent reinforcement learning algorithms to maximize the probability of satisfying a given formula and to maximize the average expected robustness, i.e., a measure of how strongly the formula is satisfied. We demonstrate via a pair of robot navigation simulation case studies that reinforcement learning with robustness maximization performs better than probability maximization in terms of both probability of satisfaction and expected robustness.
    Comment: 8 pages, 4 figures
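
    The robustness referred to here is the quantitative semantics of signal temporal logic: evaluating a formula on a trace yields a real number whose sign indicates satisfaction and whose magnitude indicates the margin. Below is a minimal sketch for two basic STL operators over a sampled scalar signal; the predicate signal > c and the thresholds are illustrative, not taken from the paper.

        import numpy as np

        def rho_always_gt(signal, c):
            """Robustness of G (signal > c): the worst margin over time."""
            return float(np.min(np.asarray(signal) - c))

        def rho_eventually_gt(signal, c):
            """Robustness of F (signal > c): the best margin over time."""
            return float(np.max(np.asarray(signal) - c))

        # Positive robustness means the formula holds, with that much margin;
        # negative robustness measures how badly the trace misses it.
        trace = [0.2, 0.5, 1.3, 0.9]
        print(rho_eventually_gt(trace, 1.0))   # ~0.3  -> satisfied with margin
        print(rho_always_gt(trace, 1.0))       # -0.8  -> violated by 0.8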

    Robust satisfaction of temporal logic specifications via reinforcement learning

    We consider the problem of steering a system with unknown, stochastic dynamics to satisfy a rich, temporally layered task given as a signal temporal logic formula. We represent the system as a finite-memory Markov decision process whose states are built from a partition of the state space and whose transition probabilities are unknown. We present provably convergent reinforcement learning algorithms to maximize the probability of satisfying a given specification and to maximize the average expected robustness, i.e., a measure of how strongly the formula is satisfied. Robustness allows us to quantify progress towards satisfying a given specification. We demonstrate via a pair of robot navigation simulation case studies that, due to this quantification of progress, reinforcement learning with robustness maximization performs better than probability maximization in terms of both probability of satisfaction and expected robustness with a low number of training examples.
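
    Using robustness as the learning signal can be sketched with a simple Monte-Carlo control loop: roll out an episode, score the entire trace with an STL robustness function (such as rho_eventually_gt above), and credit that scalar to every state-action pair visited. The environment interface (env.reset, env.step returning an observed signal sample) and the tabular sizes below are hypothetical placeholders; the paper's algorithms come with convergence guarantees that this toy sketch does not.

        import numpy as np

        def mc_robustness_control(env, rho, n_states, n_actions,
                                  episodes=5000, alpha=0.05, eps=0.1):
            """Every-visit Monte-Carlo control with the trace's STL robustness
            rho(trace) used as the (terminal-only, undiscounted) return."""
            Q = np.zeros((n_states, n_actions))
            for _ in range(episodes):
                s, done, visited, trace = env.reset(), False, [], []
                while not done:
                    if np.random.rand() < eps:           # epsilon-greedy choice
                        a = np.random.randint(n_actions)
                    else:
                        a = int(np.argmax(Q[s]))
                    visited.append((s, a))
                    s, x, done = env.step(a)             # x: observed signal sample
                    trace.append(x)
                g = rho(trace)                           # robustness of whole episode
                for (si, ai) in visited:
                    Q[si, ai] += alpha * (g - Q[si, ai])
            return Q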