Safe Model-Free Reinforcement Learning using Disturbance-Observer-Based Control Barrier Functions
Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received significant attention. Safety filters, e.g., those based on control barrier functions (CBFs), provide a promising way to achieve safe RL by modifying the unsafe actions of an RL agent on the fly. Existing safety-filter-based approaches typically involve learning the uncertain dynamics and quantifying the learned model error, which leads to conservative filters until enough data has been collected to learn a good model, thereby hindering efficient exploration. This paper presents a method for safe
and efficient model-free RL using disturbance observers (DOBs) and control
barrier functions (CBFs). Unlike most existing safe RL methods that deal with
hard state constraints, our method does not involve model learning, and
leverages DOBs to accurately estimate the pointwise value of the uncertainty,
which is then incorporated into a robust CBF condition to generate safe
actions. The DOB-based CBF can be used as a safety filter with any model-free
RL algorithms by minimally modifying the actions of an RL agent whenever
necessary to ensure safety throughout the learning process. Simulation results
on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm that uses CBFs and Gaussian process-based model learning in terms of safety violation rate, sample efficiency, and computational efficiency.
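For concreteness, the sketch below (not taken from the paper) shows the generic structure of a CBF safety filter that tightens its constraint with a pointwise disturbance estimate such as one produced by a DOB. The affine-in-control form, the linear class-K term `alpha * h`, and the closed-form solution of the single-constraint QP are assumptions made for illustration only.

```python
import numpy as np

def robust_cbf_filter(u_rl, h, Lfh, Lgh, d_hat, alpha=1.0):
    """Minimally modify an RL action so that a robust CBF condition holds.

    Enforces  Lfh + Lgh @ u + d_hat >= -alpha * h,  where d_hat is a
    pointwise estimate of the uncertainty's effect on h_dot (e.g. from a
    disturbance observer, possibly already tightened by an error bound).
    Solves  min ||u - u_rl||^2  subject to the single affine constraint
    in closed form by projecting onto the feasible half-space.
    Assumes Lgh is nonzero.
    """
    a = np.asarray(Lgh, dtype=float)       # constraint coefficient on u
    u_rl = np.asarray(u_rl, dtype=float)
    slack = float(a @ u_rl + Lfh + d_hat + alpha * h)
    if slack >= 0.0:                        # RL action is already safe: pass through
        return u_rl
    return u_rl - (slack / float(a @ a)) * a  # smallest correction restoring safety
```

Any model-free RL algorithm can then act through such a filter at every step, so exploration proceeds with only the minimal correction needed to keep the barrier condition satisfied.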
End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks
Reinforcement Learning (RL) algorithms have found limited success beyond
simulated applications, and one main reason is the absence of safety guarantees
during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we
propose a controller architecture that combines (1) a model-free RL-based
controller with (2) model-based controllers utilizing control barrier functions
(CBFs) and (3) on-line learning of the unknown system dynamics, in order to
ensure safety during learning. Our general framework leverages the success of
RL algorithms to learn high-performance controllers, while the CBF-based
controllers both guarantee safety and guide the learning process by
constraining the set of explorable policies. We utilize Gaussian Processes (GPs)
to model the system dynamics and its uncertainties.
Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high
probability during the learning process, regardless of the RL algorithm used,
and demonstrates greater policy exploration efficiency. We test our algorithm
on (1) control of an inverted pendulum and (2) autonomous car-following with
wireless vehicle-to-vehicle communication, and show that our algorithm attains
much greater sample efficiency in learning than other state-of-the-art
algorithms and maintains safety during the entire learning process.
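As an illustration of the GP-based ingredient (a hedged sketch, not the paper's implementation), one can fit a GP to the residual between the measured and nominal barrier derivative and tighten the CBF constraint by a confidence margin; the feature choice, the kernel, and the constants `alpha` and `kappa` below are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_hdot_residual_gp(states, residuals):
    """Fit a GP to the unknown part of the barrier derivative.

    `states` is an (N, n) array of visited states; `residuals` holds the
    measured h_dot minus the nominal-model h_dot at those states.
    """
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(states, residuals)
    return gp

def tightened_drift_bound(gp, x, h, Lfh, alpha=1.0, kappa=2.0):
    """High-probability lower bound on the uncontrolled part of h_dot at x.

    A safety filter would then require  bound + Lgh @ u >= 0,  i.e. the RL
    action is corrected only when the GP's pessimistic estimate indicates
    the CBF condition may be violated.
    """
    mu, sigma = gp.predict(np.atleast_2d(x), return_std=True)
    return Lfh + float(mu[0]) - kappa * float(sigma[0]) + alpha * h
```

As more transitions are observed, the posterior standard deviation shrinks, so the filter becomes less conservative while the high-probability safety guarantee is retained.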
Failing with Grace: Learning Neural Network Controllers that are Boundedly Unsafe
In this work, we consider the problem of learning a feed-forward neural
network (NN) controller to safely steer an arbitrarily shaped planar robot in a
compact and obstacle-occluded workspace. Unlike existing methods, which depend strongly on a high density of data points close to the boundary of the safe state space to train NN controllers with closed-loop safety guarantees, we propose an approach that lifts such data assumptions, which are hard to satisfy in practice, and instead allows for graceful safety violations, i.e., violations of bounded magnitude that can be spatially controlled. To do so, we employ reachability
analysis methods to encapsulate safety constraints in the training process.
Specifically, to obtain a computationally efficient over-approximation of the
forward reachable set of the closed-loop system, we partition the robot's state
space into cells and adaptively subdivide the cells that contain states that may escape the safe set under the trained control law. To this end, we first
design appropriate under- and over-approximations of the robot's footprint to
adaptively subdivide the configuration space into cells. Then, using the
overlap between each cell's forward reachable set and the set of infeasible
robot configurations as a measure for safety violations, we introduce penalty
terms into the loss function that penalize this overlap in the training
process. As a result, our method can learn a safe vector field for the
closed-loop system and, at the same time, provide numerical worst-case bounds
on safety violation over the whole configuration space, defined by the overlap
between the over-approximation of the forward reachable set of the closed-loop
system and the set of unsafe states. Moreover, it can control the tradeoff
between computational complexity and tightness of these bounds. Finally, we
provide a simulation study that verifies the efficacy of the proposed scheme.
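The loss-shaping idea can be sketched roughly as follows (a hypothetical outline, not the authors' code): for every cell of the partition, an over-approximation of the forward reachable set under the current controller is intersected with the unsafe set, and a differentiable surrogate of that overlap is added to the training loss. The helpers `reach_fn` and `overlap_fn`, and the penalty weight, are placeholders.

```python
import torch

def reachability_penalty(controller, cells, reach_fn, overlap_fn, weight=10.0):
    """Accumulate a penalty for overlap between the closed-loop reachable set
    and the unsafe states.

    cells      : list of state-space cells (e.g. interval boxes)
    reach_fn   : over-approximates each cell's forward reachable set under
                 `controller` (e.g. interval or zonotope propagation)
    overlap_fn : differentiable surrogate for the measure of intersection
                 between a reachable set and the unsafe set
    """
    penalty = torch.zeros(())
    for cell in cells:
        reach = reach_fn(controller, cell)       # over-approximated reachable set
        penalty = penalty + overlap_fn(reach)    # spatially resolved violation
    return weight * penalty

# During training (sketch):
#   loss = task_loss(controller, batch) + reachability_penalty(controller, cells, reach_fn, overlap_fn)
#   loss.backward(); optimizer.step()
```

Because the penalty is computed from an over-approximation, the residual overlap after training directly yields the worst-case bound on safety violation described in the abstract, and refining the cell partition trades computational cost for tighter bounds.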
A Control Barrier Perspective on Episodic Learning via Projection-to-State Safety
In this paper we seek to quantify the ability of learning to improve safety
guarantees endowed by Control Barrier Functions (CBFs). In particular, we
investigate how model uncertainty in the time derivative of a CBF can be
reduced via learning, and how this leads to stronger statements on the safe
behavior of a system. To this end, we build upon the idea of Input-to-State
Safety (ISSf) to define Projection-to-State Safety (PSSf), which characterizes
degradation in safety in terms of a projected disturbance. This enables the
direct quantification of both how learning can improve safety guarantees, and
how bounds on learning error translate to bounds on degradation in safety. We
demonstrate that a practical episodic learning approach can use PSSf to reduce
uncertainty and improve safety guarantees in simulation and experimentally.
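A minimal sketch of the episodic-learning loop the abstract alludes to might look as follows (the regression model, the feature map, and the interfaces `run_episode` and `nominal_hdot` are illustrative assumptions, not the paper's construction): each episode collects measurements of the barrier derivative, a model of the residual disturbance is refit, and the tighter residual bound translates, via the PSSf result, into a smaller degradation of the safety guarantee.

```python
import numpy as np
from sklearn.linear_model import Ridge

def episodic_hdot_learning(run_episode, nominal_hdot, num_episodes=5):
    """Episodically learn the residual between measured and nominal h_dot.

    run_episode(model) rolls the system out with a CBF-based controller that
    uses `model` (or None on the first episode) to correct its estimate of
    h_dot; it returns features Z (e.g. state-action pairs) and the measured
    barrier derivatives along the trajectory.
    """
    model = None
    Z_hist, r_hist = [], []
    for _ in range(num_episodes):
        Z, hdot_meas = run_episode(model)
        residual = hdot_meas - nominal_hdot(Z)     # proxy for the projected disturbance
        Z_hist.append(Z)
        r_hist.append(residual)
        model = Ridge(alpha=1e-3).fit(np.vstack(Z_hist), np.concatenate(r_hist))
    return model
```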