2,149 research outputs found

    Robust satisfaction of temporal logic specifications via reinforcement learning

    Full text link
    We consider the problem of steering a system with unknown, stochastic dynamics to satisfy a rich, temporally layered task given as a signal temporal logic formula. We represent the system as a finite-memory Markov decision process whose transition probabilities are unknown and whose states are built from a partition of the state space. We present provably convergent reinforcement learning algorithms to maximize the probability of satisfying a given specification and to maximize the average expected robustness, i.e., a measure of how strongly the formula is satisfied. Robustness allows us to quantify progress towards satisfying a given specification. We demonstrate via a pair of robot navigation simulation case studies that, because robustness quantifies progress towards satisfaction, reinforcement learning with robustness maximization performs better than probability maximization in terms of both probability of satisfaction and expected robustness, given a low number of training examples.
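
    The robustness referred to here is the standard quantitative semantics of signal temporal logic: atomic predicates map each state to a signed distance, conjunction and "always" take minima, and "eventually" takes a maximum over time. Below is a minimal Python sketch of this computation for a reach-avoid formula over a discrete trajectory; the regions, radii, and helper names are illustrative assumptions, not details from the paper.

    import numpy as np

    def dist_to_goal(state, goal=np.array([4.0, 4.0]), radius=0.5):
        # Atomic predicate "inside goal region": positive inside, negative outside.
        return radius - np.linalg.norm(state - goal)

    def dist_to_obstacle(state, obs=np.array([2.0, 2.0]), radius=1.0):
        # Atomic predicate "clear of obstacle": positive when outside it.
        return np.linalg.norm(state - obs) - radius

    def robustness(traj):
        # G(avoid) AND F(reach): "always"/conjunction -> min, "eventually" -> max.
        always_avoid = min(dist_to_obstacle(s) for s in traj)
        eventually_reach = max(dist_to_goal(s) for s in traj)
        return min(always_avoid, eventually_reach)

    # A straight line from (0,0) to (4,4) clips the obstacle centered at (2,2):
    traj = [np.array([x, x]) for x in np.linspace(0.0, 4.0, 20)]
    print(robustness(traj))  # negative: the formula is violated, and by how much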

    Comparative evaluation of approaches in T.4.1-4.3 and working definition of adaptive module

    Get PDF
    The goal of this deliverable is two-fold: (1) to present and compare different approaches towards learning and encoding movements using dynamical systems that have been developed by the AMARSi partners (in the past and during the first 6 months of the project), and (2) to analyze their suitability to be used as adaptive modules, i.e. as building blocks for the complete architecture that will be developed in the project. The document presents a total of eight approaches, in two groups: modules for discrete movements (i.e. with a clear goal where the movement stops) and for rhythmic movements (i.e. which exhibit periodicity). The basic formulation of each approach is presented together with some illustrative simulation results. Key characteristics, such as the type of dynamical behavior, learning algorithm, generalization properties, and stability analysis, are then discussed for each approach. We then compare the approaches along these characteristics and discuss their suitability for the AMARSi project.
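
    For concreteness, the two module families can be illustrated with the simplest possible dynamical systems: a point attractor for discrete movements and a phase oscillator for rhythmic ones. The sketch below is a generic illustration of that distinction, not one of the eight AMARSi approaches; the gains, rates, and step sizes are arbitrary choices.

    import numpy as np

    def discrete_step(x, v, goal, k=25.0, d=10.0, dt=0.01):
        # Critically damped spring-damper (d = 2*sqrt(k)): settles at `goal`, then stops.
        a = k * (goal - x) - d * v
        return x + dt * v, v + dt * a

    def rhythmic_step(phase, omega=2.0 * np.pi, dt=0.01):
        # Constant-rate phase oscillator: its output repeats with period 2*pi/omega.
        phase = (phase + dt * omega) % (2.0 * np.pi)
        return phase, np.sin(phase)

    x, v, phase = 0.0, 0.0, 0.0
    for _ in range(500):
        x, v = discrete_step(x, v, goal=1.0)
        phase, out = rhythmic_step(phase)
    print(round(x, 3), round(out, 3))  # ~1.0 (settled at goal), ~0.0 (five full periods)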

    Asymptotically Optimal Sampling-Based Motion Planning Methods

    Full text link
    Motion planning is a fundamental problem in autonomous robotics that requires finding a path to a specified goal that avoids obstacles and takes into account a robot's limitations and constraints. It is often desirable for this path to also optimize a cost function, such as path length. Formal path-quality guarantees for continuously valued search spaces are an active area of research interest. Recent results have proven that some sampling-based planning methods probabilistically converge toward the optimal solution as computational effort approaches infinity. This survey summarizes the assumptions behind these popular asymptotically optimal techniques and provides an introduction to the significant ongoing research on this topic. Comment: Posted with permission from the Annual Review of Control, Robotics, and Autonomous Systems, Volume 4. Copyright 2021 by Annual Reviews, https://www.annualreviews.org/. 25 pages, 2 figures.
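
    The central mechanism behind the best-known asymptotically optimal planner, RRT*, is local rewiring: each new sample is connected through the nearby node that minimizes its cost-to-come, and nearby nodes are re-parented through the new sample whenever that shortens their paths. The sketch below shows this for a 2D point robot with no obstacles; it uses a fixed neighbor radius instead of the shrinking (log n / n)^(1/d) radius from the formal analysis, and for brevity it does not propagate cost changes to descendants after rewiring.

    import math, random

    def rrt_star(start, goal, n_samples=500, radius=0.5):
        dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
        nodes, parent, cost = [start], {0: None}, {0: 0.0}
        for _ in range(n_samples):
            q = (random.random(), random.random())  # sample the unit square
            near = [i for i, p in enumerate(nodes) if dist(p, q) < radius]
            if not near:
                continue
            # Choose the parent that minimizes the new node's cost-to-come.
            best = min(near, key=lambda i: cost[i] + dist(nodes[i], q))
            new = len(nodes)
            nodes.append(q)
            parent[new], cost[new] = best, cost[best] + dist(nodes[best], q)
            # Rewire: re-parent neighbors through the new node when cheaper.
            for i in near:
                c = cost[new] + dist(q, nodes[i])
                if c < cost[i]:
                    parent[i], cost[i] = new, c
        best = min(range(len(nodes)), key=lambda i: cost[i] + dist(nodes[i], goal))
        return cost[best] + dist(nodes[best], goal)

    random.seed(0)
    print(rrt_star((0.0, 0.0), (1.0, 1.0)))  # shrinks toward sqrt(2) as samples grow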

    Active Sensing as Bayes-Optimal Sequential Decision Making

    Full text link
    Sensory inference under conditions of uncertainty is a major problem in both machine learning and computational neuroscience. An important but poorly understood aspect of sensory processing is the role of active sensing. Here, we present a Bayes-optimal inference and control framework for active sensing, C-DAC (Context-Dependent Active Controller). Unlike previously proposed algorithms that optimize abstract statistical objectives such as information maximization (Infomax) [Butko & Movellan, 2010] or one-step look-ahead accuracy [Najemnik & Geisler, 2005], our active sensing model directly minimizes a combination of behavioral costs, such as temporal delay, response error, and effort. We simulate these algorithms on a simple visual search task to illustrate scenarios in which context-sensitivity is particularly beneficial and optimization with respect to generic statistical objectives is particularly inadequate. Motivated by the geometric properties of the C-DAC policy, we present both parametric and non-parametric approximations, which retain context-sensitivity while significantly reducing computational complexity. These approximations enable us to investigate the more complex problem involving peripheral vision, and we find that the difference between C-DAC and the statistical policies becomes even more evident in this scenario. Comment: Scheduled to appear in UAI 201
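
    The sequential core of such a model is a Bayesian belief update over target locations, combined with a rule that trades the cost of further sampling against the expected cost of a response error. The sketch below is an illustrative myopic approximation with made-up likelihoods and costs; it is not the actual C-DAC policy, which solves a Bellman equation over the belief state.

    import numpy as np

    rng = np.random.default_rng(0)
    k, true_loc = 3, 1                 # three candidate locations, target at index 1
    belief = np.full(k, 1.0 / k)       # uniform prior over target locations
    c_time, p_hit = 0.05, 0.7          # cost per extra sample; P(detect | fixating target)

    steps = 0
    while True:
        steps += 1
        fix = int(np.argmax(belief))   # greedily fixate the most probable location
        obs = rng.random() < (p_hit if fix == true_loc else 1.0 - p_hit)
        # Bayes update: likelihood of the observation under each location hypothesis.
        lik = np.where(np.arange(k) == fix,
                       p_hit if obs else 1.0 - p_hit,
                       (1.0 - p_hit) if obs else p_hit)
        belief = belief * lik
        belief /= belief.sum()
        # Stop once the expected error cost drops below the cost of sampling again.
        if 1.0 - belief.max() < c_time:
            break

    print(steps, int(np.argmax(belief)))  # samples taken, inferred target location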

    Reinforcement Learning and Planning for Preference Balancing Tasks

    Get PDF
    Robots are often highly non-linear dynamical systems with many degrees of freedom, making it computationally challenging to solve their motion problems. One solution has been reinforcement learning (RL), which learns through experimentation to automatically perform the near-optimal motions that complete a task. However, high-dimensional problems and task formulation often prove challenging for RL. We address these problems with PrEference Appraisal Reinforcement Learning (PEARL), which solves Preference Balancing Tasks (PBTs). PBTs define a problem as a set of preferences that the system must balance to achieve a goal. The method is appropriate for acceleration-controlled systems with continuous state spaces and either discrete or continuous action spaces with unknown system dynamics. We show that PEARL learns a sub-optimal policy on a subset of states and actions and transfers the policy to the expanded domain to produce a more refined plan on a class of robotic problems. We establish convergence to task goal conditions and, even when preconditions are not verifiable, show that this is a valuable method to use before other, more expensive approaches. Evaluation is done on several robotic problems, such as Aerial Cargo Delivery, Multi-Agent Pursuit, Rendezvous, and Inverted Flying Pendulum, both in simulation and experimentally. Additionally, PEARL is leveraged outside of robotics as an array-sorting agent. The results demonstrate high accuracy and fast learning times on a large set of practical applications.
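
    The preference-balancing idea can be sketched as a value function formed from a weighted sum of preference features, with a greedy one-step policy over a discrete set of candidate accelerations. Everything below (features, weights, dynamics) is a hypothetical stand-in for illustration; PEARL learns such weights from experience rather than hand-picking them.

    import numpy as np

    def features(pos, vel, goal):
        # Quadratic preferences, "higher is better": be at the goal, move slowly.
        return np.array([-np.sum((goal - pos) ** 2), -np.sum(vel ** 2)])

    def greedy_action(pos, vel, goal, weights, dt=0.1):
        # One-step lookahead: pick the acceleration whose successor state
        # maximizes the weighted preference value.
        candidates = [np.array([ax, ay], float) for ax in (-1, 0, 1) for ay in (-1, 0, 1)]
        def value(a):
            v2 = vel + dt * a
            return weights @ features(pos + dt * v2, v2, goal)
        return max(candidates, key=value)

    pos, vel, goal = np.zeros(2), np.zeros(2), np.array([1.0, 1.0])
    weights = np.array([1.0, 0.05])  # balance "reach the goal" against "stay slow"
    for _ in range(200):
        a = greedy_action(pos, vel, goal, weights)
        vel += 0.1 * a
        pos += 0.1 * vel
    print(np.round(pos, 2))  # parks near [1, 1]: the preferences are balanced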