23 research outputs found

    Safe Model-Free Reinforcement Learning using Disturbance-Observer-Based Control Barrier Functions

    Full text link
    Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received a lot of attention. Safety filters, e.g., based on control barrier functions (CBFs), provide a promising way for safe RL via modifying the unsafe actions of an RL agent on the fly. Existing safety filter-based approaches typically involve learning of uncertain dynamics and quantifying the learned model error, which leads to conservative filters before a large amount of data is collected to learn a good model, thereby preventing efficient exploration. This paper presents a method for safe and efficient model-free RL using disturbance observers (DOBs) and control barrier functions (CBFs). Unlike most existing safe RL methods that deal with hard state constraints, our method does not involve model learning, and leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions. The DOB-based CBF can be used as a safety filter with any model-free RL algorithms by minimally modifying the actions of an RL agent whenever necessary to ensure safety throughout the learning process. Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian processes-based model learning, in terms of safety violation rate, and sample and computational efficiency

    End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

    Get PDF
    Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable polices. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.Comment: Published in AAAI 201

    Failing with Grace: Learning Neural Network Controllers that are Boundedly Unsafe

    Full text link
    In this work, we consider the problem of learning a feed-forward neural network (NN) controller to safely steer an arbitrarily shaped planar robot in a compact and obstacle-occluded workspace. Unlike existing methods that depend strongly on the density of data points close to the boundary of the safe state space to train NN controllers with closed-loop safety guarantees, we propose an approach that lifts such assumptions on the data that are hard to satisfy in practice and instead allows for graceful safety violations, i.e., of a bounded magnitude that can be spatially controlled. To do so, we employ reachability analysis methods to encapsulate safety constraints in the training process. Specifically, to obtain a computationally efficient over-approximation of the forward reachable set of the closed-loop system, we partition the robot's state space into cells and adaptively subdivide the cells that contain states which may escape the safe set under the trained control law. To do so, we first design appropriate under- and over-approximations of the robot's footprint to adaptively subdivide the configuration space into cells. Then, using the overlap between each cell's forward reachable set and the set of infeasible robot configurations as a measure for safety violations, we introduce penalty terms into the loss function that penalize this overlap in the training process. As a result, our method can learn a safe vector field for the closed-loop system and, at the same time, provide numerical worst-case bounds on safety violation over the whole configuration space, defined by the overlap between the over-approximation of the forward reachable set of the closed-loop system and the set of unsafe states. Moreover, it can control the tradeoff between computational complexity and tightness of these bounds. Finally, we provide a simulation study that verifies the efficacy of the proposed scheme

    A Control Barrier Perspective on Episodic Learning via Projection-to-State Safety

    Get PDF
    In this paper we seek to quantify the ability of learning to improve safety guarantees endowed by Control Barrier Functions (CBFs). In particular, we investigate how model uncertainty in the time derivative of a CBF can be reduced via learning, and how this leads to stronger statements on the safe behavior of a system. To this end, we build upon the idea of Input-to-State Safety (ISSf) to define Projection-to-State Safety (PSSf), which characterizes degradation in safety in terms of a projected disturbance. This enables the direct quantification of both how learning can improve safety guarantees, and how bounds on learning error translate to bounds on degradation in safety. We demonstrate that a practical episodic learning approach can use PSSf to reduce uncertainty and improve safety guarantees in simulation and experimentally.Comment: 6 pages, 2 figures, submitted to L-CSS + CDC 202

    A Control Barrier Perspective on Episodic Learning via Projection-to-State Safety

    Get PDF
    In this letter we seek to quantify the ability of learning to improve safety guarantees endowed by Control Barrier Functions (CBFs). In particular, we investigate how model uncertainty in the time derivative of a CBF can be reduced via learning, and how this leads to stronger statements on the safe behavior of a system. To this end, we build upon the idea of Input-to-State Safety (ISSf) to define Projection-to-State Safety (PSSf), which characterizes degradation in safety in terms of a projected disturbance. This enables the direct quantification of both how learning can improve safety guarantees, and how bounds on learning error translate to bounds on degradation in safety. We demonstrate that a practical episodic learning approach can use PSSf to reduce uncertainty and improve safety guarantees in simulation and experimentally
    corecore