17,911 research outputs found

    A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

    Full text link
    Reward engineering is an important aspect of reinforcement learning. Whether or not the user's intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. TL formula can be translated to a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. A set of simulated experiments are conducted to evaluate the proposed approach

    Reinforcement Learning With Temporal Logic Rewards

    Full text link
    Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the de- sired behavior and constraints of a robot. Usually, these are handcrafted by a expert designer and represent heuristics for relatively simple tasks. Real world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow, and incorporate domain knowledge into learning. We propose Truncated Linear Temporal Logic (TLTL) as specifications language, that is arguably well suited for the robotics applications, together with quantitative semantics, i.e., robustness degree. We propose a RL approach to learn tasks expressed as TLTL formulae that uses their associated robustness degree as reward functions, instead of the manually crafted heuristics trying to capture the same specifications. We show in simulated trials that learning is faster and policies obtained using the proposed approach outperform the ones learned using heuristic rewards in terms of the robustness degree, i.e., how well the tasks are satisfied. Furthermore, we demonstrate the proposed RL approach in a toast-placing task learned by a Baxter robot

    A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks

    Full text link
    Reinforcement learning has been applied to many interesting problems such as the famous TD-gammon and the inverted helicopter flight. However, little effort has been put into developing methods to learn policies for complex persistent tasks and tasks that are time-sensitive. In this paper, we take a step towards solving this problem by using signal temporal logic (STL) as task specification, and taking advantage of the temporal abstraction feature that the options framework provide. We show via simulation that a relatively easy to implement algorithm that combines STL and options can learn a satisfactory policy with a small number of training case

    A hierarchical reinforcement learning method for persistent time-sensitive tasks

    Full text link
    Reinforcement learning has been applied to many interesting problems such as the famous TD-gammon and the inverted helicopter flight. However, little effort has been put into developing methods to learn policies for complex persistent tasks and tasks that are time-sensitive. In this paper, we take a step towards solving this problem by using signal temporal logic (STL) as task specification, and taking advantage of the temporal abstraction feature that the options framework provide. We show via simulation that a relatively easy to implement algorithm that combines STL and options can learn a satisfactory policy with a small number of training cases

    Automata guided hierarchical reinforcement learning for zero-shot skill composition

    Full text link
    An obstacle that prevents the wide adoption of (deep) reinforcement learning (RL) in control systems is its need for a large amount of interactions with the environment in order to master a skill. The learned skill usually generalizes poorly across domains and re-training is often necessary when presented with a new task. We present a framework that combines methods in formal methods with hierarchical reinforcement learning (HRL). The set of techniques we provide allows for convenient specification of tasks with complex logic, learn hierarchical policies (meta-controller and low-level controllers) with well-defined intrinsic rewards using any RL methods and is able to construct new skills from existing ones without additional learning. We evaluate the proposed methods in a simple grid world simulation as well as simulation on a Baxter robot

    Prescribed Performance Control Guided Policy Improvement for Satisfying Signal Temporal Logic Tasks

    Full text link
    Signal temporal logic (STL) provides a user-friendly interface for defining complex tasks for robotic systems. Recent efforts aim at designing control laws or using reinforcement learning methods to find policies which guarantee satisfaction of these tasks. While the former suffer from the trade-off between task specification and computational complexity, the latter encounter difficulties in exploration as the tasks become more complex and challenging to satisfy. This paper proposes to combine the benefits of the two approaches and use an efficient prescribed performance control (PPC) base law to guide exploration within the reinforcement learning algorithm. The potential of the method is demonstrated in a simulated environment through two sample navigational tasks.Comment: This is the extended version of the paper accepted to the 2019 American Control Conference (ACC), Philadelphia (to be published

    Few-Shot Bayesian Imitation Learning with Logical Program Policies

    Full text link
    Humans can learn many novel tasks from a very small number (1--5) of demonstrations, in stark contrast to the data requirements of nearly tabula rasa deep learning methods. We propose an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples. We represent policies as logical combinations of programs drawn from a domain-specific language (DSL), define a prior over policies with a probabilistic grammar, and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study five strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. Our policy learning is 20--1,000x more data efficient than convolutional and fully convolutional policy learning and many orders of magnitude more computationally efficient than vanilla program induction. We argue that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances.Comment: AAAI 202
    • …
    corecore