Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming
Control barrier functions (CBFs) have attracted extensive attention for designing safe controllers to be deployed in real-world safety-critical systems. However, the perception of the surrounding environment is often subject to stochasticity and to further distributional shift from the nominal distribution. In this paper, we present a distributionally robust CBF (DR-CBF) that achieves resilience under distributional shift while retaining the advantages of CBFs, such as computational efficiency and forward invariance.
To achieve this goal, we first propose a single-level convex reformulation for estimating the conditional value-at-risk (CVaR) of the safety constraints under distributional shift measured by a Wasserstein metric, a problem that is by nature a tri-level program. Moreover, to construct a control barrier condition that enforces the forward invariance of the CVaR, differentiable convex programming is applied to enable differentiation through the optimization layer of the CVaR estimation. We also provide an approximate variant of DR-CBF for higher-order systems. Simulation results validate the chance-constrained safety guarantee under distributional shift in both first- and second-order systems.
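The paper's Wasserstein ambiguity set and differentiable optimization layer are not reproduced here; as a minimal sketch of only the sample-based CVaR-constrained safety-filter idea, assuming single-integrator dynamics, a circular obstacle, and the standard Rockafellar-Uryasev reformulation of CVaR, the following Python snippet (using cvxpy; the function name cvar_cbf_qp and all parameter values are illustrative) minimally modifies a reference input so that the empirical CVaR of the control-barrier-condition violation over sampled perception states is non-positive.

# Illustrative sketch only, not the DR-CBF of the abstract above: a
# sample-based CVaR-constrained CBF quadratic program for a single
# integrator x_dot = u, via the Rockafellar-Uryasev reformulation.
import numpy as np
import cvxpy as cp

def cvar_cbf_qp(x_samples, u_ref, obstacle, radius, alpha=0.9, gamma=1.0):
    n = x_samples.shape[0]
    u = cp.Variable(2)   # filtered control input
    t = cp.Variable()    # CVaR auxiliary variable
    s = cp.Variable(n)   # per-sample slack for (loss - t)_+

    # Barrier h(x) = ||x - obstacle||^2 - radius^2; for x_dot = u its
    # derivative along the dynamics is h_dot = 2 (x - obstacle)^T u.
    h = np.sum((x_samples - obstacle) ** 2, axis=1) - radius ** 2
    grad_h = 2.0 * (x_samples - obstacle)

    # Per-sample violation of the barrier condition h_dot + gamma * h >= 0,
    # with CVaR_alpha of the violation constrained to be non-positive.
    loss = -(grad_h @ u + gamma * h)
    constraints = [s >= 0,
                   s >= loss - t,
                   t + cp.sum(s) / ((1.0 - alpha) * n) <= 0]

    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_ref)), constraints)
    prob.solve()
    return u.value

# Example: 50 position samples drawn around a nominal perception estimate.
x_samples = np.random.multivariate_normal([0.0, 0.0], 0.01 * np.eye(2), size=50)
u_safe = cvar_cbf_qp(x_samples, u_ref=np.array([1.0, 0.0]),
                     obstacle=np.array([1.5, 0.0]), radius=0.5)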
Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems
Learning-based control algorithms require data collection with abundant
supervision for training. Safe exploration algorithms ensure the safety of this
data collection process even when only partial knowledge is available. We
present a new approach for optimal motion planning with safe exploration that
integrates chance-constrained stochastic optimal control with dynamics learning
and feedback control. We derive an iterative convex optimization algorithm that
solves an Information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC). The optimization objective encodes both optimal performance and exploration for learning, and safety is incorporated as distributionally
robust chance constraints. The dynamics are predicted from a robust regression
model that is learned from data. The Info-SNOC algorithm is used to compute a
sub-optimal pool of safe motion plans that aid in exploration for learning
unknown residual dynamics under safety constraints. A stable feedback
controller is used to execute the motion plan and collect data for model
learning. We prove the safety of the rollouts generated by our exploration method and the reduction in uncertainty over epochs, thereby guaranteeing the consistency of our learning method. We validate the effectiveness of Info-SNOC by designing
and implementing a pool of safe trajectories for a planar robot. We demonstrate
that our approach has a higher success rate in ensuring safety when compared to a deterministic trajectory optimization approach.
Comment: Submitted to RA-L 2020
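Info-SNOC itself couples an information cost, learned dynamics, and iterative convexification, none of which are reproduced here; as a minimal sketch of only the distributionally robust chance-constraint ingredient, assuming a double-integrator model, a known position covariance, and a single half-space keep-out constraint, the Python snippet below (cvxpy; all names and numbers are illustrative) enforces Pr(a^T p >= b) >= 1 - eps for every distribution with that covariance via the standard moment-based tightening a^T p_mean >= b + sqrt((1 - eps) / eps) * ||Sigma^{1/2} a||.

# Illustrative sketch only, not the Info-SNOC algorithm: a distributionally
# robust chance-constrained trajectory optimization with linear dynamics.
import numpy as np
import cvxpy as cp

dt, T, eps = 0.1, 30, 0.05
# Double-integrator mean dynamics for the state [px, py, vx, vy].
A = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * dt ** 2 * np.eye(2), dt * np.eye(2)])

# Keep-out half-space a^T p >= b, to hold with probability >= 1 - eps for
# every disturbance distribution sharing the assumed position covariance.
Sigma_p = 0.02 * np.eye(2)
a, b = np.array([1.0, 0.0]), 1.0
tighten = np.sqrt((1 - eps) / eps) * np.linalg.norm(np.linalg.cholesky(Sigma_p).T @ a)

x = cp.Variable((4, T + 1))   # mean state trajectory
u = cp.Variable((2, T))       # control inputs
cons = [x[:, 0] == np.array([2.0, 0.0, 0.0, 0.0]),   # initial state
        x[:2, T] == np.array([3.0, 2.0])]            # goal position
for k in range(T):
    cons += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
             a @ x[:2, k + 1] >= b + tighten]        # tightened chance constraint
prob = cp.Problem(cp.Minimize(cp.sum_squares(u)), cons)
prob.solve()

In Info-SNOC the analogous constraints are imposed on trajectories of a learned, uncertain model and refined over iterations; here the covariance is simply fixed for illustration.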
Learning Predictive Safety Filter via Decomposition of Robust Invariant Set
Ensuring safety of nonlinear systems under model uncertainty and external
disturbances is crucial, especially for real-world control tasks. Predictive
methods such as robust model predictive control (RMPC) require solving
nonconvex optimization problems online, which leads to high computational
burden and poor scalability. Reinforcement learning (RL) works well with complex systems, but at the price of losing rigorous safety guarantees. This
paper presents a theoretical framework that bridges the advantages of both RMPC
and RL to synthesize safety filters for nonlinear systems with state- and
action-dependent uncertainty. We decompose the robust invariant set (RIS) into
two parts: a target set that aligns with the terminal region design of RMPC, and a reach-avoid set that accounts for the rest of the RIS. We propose a policy
iteration approach for robust reach-avoid problems and establish its monotone
convergence. This method sets the stage for an adversarial actor-critic deep RL
algorithm, which simultaneously synthesizes a reach-avoid policy network, a
disturbance policy network, and a reach-avoid value network. The learned
reach-avoid policy network is utilized to generate nominal trajectories for
online verification, which filters potentially unsafe actions that may drive
the system into unsafe regions when worst-case disturbances are applied. We
formulate a second-order cone programming (SOCP) approach for online
verification using system level synthesis, which optimizes the worst-case reach-avoid value over all possible trajectories. The proposed safety filter requires much lower computational complexity than RMPC and still enjoys a persistent robust safety guarantee. The effectiveness of our method is illustrated through a numerical example.
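The adversarial actor-critic synthesis and the SOCP-based online verification are beyond a short snippet; as a minimal sketch of only the robust reach-avoid fixed point whose monotone convergence the abstract refers to, the Python grid world below (all sets, dynamics, and capabilities are illustrative assumptions) iterates the backup V(s) = max over actions of min over disturbances of V(next state), clamped to 1 on the target set and 0 on the unsafe set, so that at convergence V(s) = 1 exactly on the states from which the target can be reached while avoiding the wall for every admissible disturbance sequence.

# Illustrative sketch only, not the paper's method: robust reach-avoid
# value iteration on a grid world with an adversarial disturbance.
import itertools

N = 10
target = {(x, y) for x in range(8, N) for y in range(8, N)}   # goal corner
unsafe = {(4, y) for y in range(7)}                           # wall to avoid
actions = [(ax, ay) for ax in range(-2, 3) for ay in range(-2, 3)]   # agent: up to 2 cells per axis
disturbs = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]                # adversary: up to 1 cell

def step(s, a, d):
    # Apply action and disturbance, clipped to the grid.
    return (min(max(s[0] + a[0] + d[0], 0), N - 1),
            min(max(s[1] + a[1] + d[1], 0), N - 1))

states = list(itertools.product(range(N), repeat=2))
V = {s: 1.0 if s in target else 0.0 for s in states}
while True:
    V_new = {}
    for s in states:
        if s in target:
            V_new[s] = 1.0
        elif s in unsafe:
            V_new[s] = 0.0
        else:
            # Best action against the worst-case disturbance (robust backup).
            V_new[s] = max(min(V[step(s, a, d)] for d in disturbs)
                           for a in actions)
    if V_new == V:   # the monotone iteration has reached its fixed point
        break
    V = V_new

print(V[(0, 0)])   # 1.0 iff (0, 0) lies in the robust reach-avoid set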