Reinforcement Learning With Temporal Logic Rewards

Belta, Calin; Li, Xiao; Vasile, Cristian-Ioan

Authors: Calin Belta, Xiao Li, Cristian-Ioan Vasile
Publication date: 1 January 2017

Abstract

Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the desired behavior and constraints of a robot. Usually, these are handcrafted by an expert designer and represent heuristics for relatively simple tasks. Real-world applications typically involve more complex tasks with rich temporal and logical structure. In this paper, we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow and to incorporate domain knowledge into learning. We propose Truncated Linear Temporal Logic (TLTL) as a specification language that is arguably well suited to robotics applications, together with its quantitative semantics, i.e., the robustness degree. We propose an RL approach to learn tasks expressed as TLTL formulae that uses their associated robustness degree as the reward function, instead of manually crafted heuristics that try to capture the same specifications. We show in simulated trials that learning is faster and that policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot.
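
As a rough illustration of what "robustness degree as reward" can mean, the sketch below scores a 2-D trajectory against the task "eventually reach the goal and always stay clear of the obstacle". It is not taken from the paper: it assumes the max/min quantitative semantics commonly used for temporal logics (eventually = max over time, always = min over time, conjunction = min), and the goal, obstacle, radii, and function names are all hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]

# Hypothetical task geometry, chosen only for this illustration.
GOAL: State = (4.0, 0.0)
GOAL_RADIUS = 0.5
OBSTACLE: State = (2.0, 2.0)
OBSTACLE_RADIUS = 0.5


def dist(a: State, b: State) -> float:
    """Euclidean distance between two planar points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def in_goal(s: State) -> float:
    """Signed margin: positive when s is inside the goal region."""
    return GOAL_RADIUS - dist(s, GOAL)


def clear_of_obstacle(s: State) -> float:
    """Signed margin: positive when s is outside the obstacle."""
    return dist(s, OBSTACLE) - OBSTACLE_RADIUS


def eventually(pred: Callable[[State], float], traj: Sequence[State]) -> float:
    """Assumed semantics: best margin reached at any time step."""
    return max(pred(s) for s in traj)


def always(pred: Callable[[State], float], traj: Sequence[State]) -> float:
    """Assumed semantics: worst margin over all time steps."""
    return min(pred(s) for s in traj)


def robustness(traj: Sequence[State]) -> float:
    """Score for 'eventually in_goal AND always clear_of_obstacle'.

    Positive means the trajectory satisfies the task; the magnitude
    says by how much. An RL agent could receive this as episode reward.
    """
    return min(eventually(in_goal, traj), always(clear_of_obstacle, traj))
```

Under these assumptions, a trajectory that reaches the goal while avoiding the obstacle gets a positive score, and one that passes through the obstacle gets a negative score, so a single number ranks how well each rollout satisfies the specification.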

Available Versions

Crossref

info:doi/10.1109%2Firos.2017.8...

Last time updated on 17/02/2019

Boston University Institutional Repository (OpenBU)

oai:open.bu.edu:2144/29608

Last time updated on 09/07/2019