Safe Multi-Agent Interaction through Robust Control Barrier Functions with Learned Uncertainties
Robots operating in real-world settings must navigate and maintain safety while interacting with many heterogeneous agents and obstacles. Multi-Agent Control Barrier Functions (CBF) have emerged as a computationally efficient tool to guarantee safety in multi-agent environments, but they assume perfect knowledge of both the robot dynamics and other agents' dynamics. While the robot's dynamics might be reasonably well known, the heterogeneity of agents in real-world environments means there will always be considerable uncertainty in our prediction of other agents' dynamics. This work aims to learn high-confidence bounds for these dynamic uncertainties using Matrix-Variate Gaussian Process models, and incorporates them into a robust multi-agent CBF framework. We transform the resulting min-max robust CBF into a quadratic program, which can be efficiently solved in real time. We verify via simulation results that the nominal multi-agent CBF is often violated during agent interactions, whereas our robust formulation maintains safety with a much higher probability and adapts to learned uncertainties.
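The idea of turning a worst-case (min-max) CBF condition into a quadratic program can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a single-integrator robot (x_dot = u), one other agent with an estimated velocity `v_obs`, and a fixed scalar bound `delta` on the velocity error standing in for the learned Gaussian Process bounds. The function name and all parameters are hypothetical. With a single affine constraint, the QP has a closed-form projection, so no solver is needed.

```python
import numpy as np

def robust_cbf_filter(x, x_obs, v_obs, u_nom, radius=1.0, alpha=1.0, delta=0.0):
    """Minimally modify u_nom so that a worst-case CBF condition holds.

    Assumptions (illustrative only): robot dynamics x_dot = u, other agent
    moves with velocity v_obs + d where ||d|| <= delta, and x != x_obs.
    Barrier: h(x) = ||x - x_obs||^2 - radius^2 (h >= 0 means safe).
    """
    diff = x - x_obs
    h = diff @ diff - radius**2
    a = 2.0 * diff                       # gradient of h with respect to x
    # Worst case over d of h_dot >= -alpha * h gives: a @ u >= b
    b = -alpha * h + a @ v_obs + np.linalg.norm(a) * delta
    slack = a @ u_nom - b
    if slack >= 0.0:
        return u_nom                     # nominal input already satisfies the constraint
    # Closed-form solution of: min ||u - u_nom||^2  s.t.  a @ u >= b
    return u_nom - slack * a / (a @ a)

x = np.array([2.0, 0.0])
x_obs = np.zeros(2)
v_obs = np.zeros(2)
u_nom = np.array([-5.0, 0.0])            # drives straight at the other agent

u_nominal_cbf = robust_cbf_filter(x, x_obs, v_obs, u_nom)             # [-0.75, 0.0]
u_robust_cbf = robust_cbf_filter(x, x_obs, v_obs, u_nom, delta=0.5)   # [-0.25, 0.0]
```

In this toy setting a larger `delta` tightens the constraint, so the robust filter permits a slower approach than the nominal one.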
Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees
We study the problem of learning controllers for discrete-time non-linear
stochastic dynamical systems with formal reach-avoid guarantees. This work
presents the first method for providing formal reach-avoid guarantees, which
combine and generalize stability and safety guarantees, with a tolerable
probability threshold over the infinite time horizon. Our method
leverages advances in the machine learning literature and represents formal
certificates as neural networks. In particular, we learn a certificate in the
form of a reach-avoid supermartingale (RASM), a novel notion that we introduce
in this work. Our RASMs provide reachability and avoidance guarantees by
imposing constraints on what can be viewed as a stochastic extension of level
sets of Lyapunov functions for deterministic systems. Our approach solves
several important problems -- it can be used to learn a control policy from
scratch, to verify a reach-avoid specification for a fixed control policy, or
to fine-tune a pre-trained policy if it does not satisfy the reach-avoid
specification. We validate our approach on stochastic non-linear
reinforcement learning tasks. Comment: Accepted at AAAI 202
Model-based Dynamic Shielding for Safe and Efficient Multi-Agent Reinforcement Learning
Multi-Agent Reinforcement Learning (MARL) discovers policies that maximize
reward but do not have safety guarantees during the learning and deployment
phases. Although shielding with Linear Temporal Logic (LTL) is a promising
formal method to ensure safety in single-agent Reinforcement Learning (RL), it
results in conservative behaviors when scaling to multi-agent scenarios.
Additionally, it poses computational challenges for synthesizing shields in
complex multi-agent environments. This work introduces Model-based Dynamic
Shielding (MBDS) to support MARL algorithm design. Our algorithm synthesizes
distributive shields, which are reactive systems running in parallel with each
MARL agent, to monitor and rectify unsafe behaviors. The shields can
dynamically split, merge, and recompute based on agents' states. This design
enables efficient synthesis of shields to monitor agents in complex
environments without coordination overheads. We also propose an algorithm to
synthesize shields without prior knowledge of the dynamics model. The proposed
algorithm obtains an approximate world model by interacting with the
environment during the early stage of exploration, making our MBDS enjoy formal
safety guarantees with high probability. We demonstrate in simulations that our
framework can surpass existing baselines in terms of safety guarantees and
learning performance. Comment: Accepted in AAMAS 202
Do androids dream of electric fences? Safety-aware reinforcement learning with latent shielding
The growing trend of fledgling reinforcement learning systems making their way into real-world applications has been accompanied by growing concerns for their safety and robustness. In recent years, a variety of approaches have been put forward to address the challenges of safety-aware reinforcement learning; however, these methods often either require a handcrafted model of the environment to be provided beforehand, or assume that the environment is relatively simple and low-dimensional. We present a novel approach to safety-aware deep reinforcement learning in high-dimensional environments called latent shielding. Latent shielding leverages internal representations of the environment learnt by model-based agents to “imagine” future trajectories and avoid those deemed unsafe. We experimentally demonstrate that this approach leads to improved adherence to formally-defined safety specifications.
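The "imagine future trajectories and avoid those deemed unsafe" step can be sketched roughly as follows. This is an illustration under strong assumptions, not the authors' algorithm: a hand-written one-dimensional `model` stands in for the learned latent dynamics, and `policy`, `is_unsafe`, `shielded_action`, and the `horizon` parameter are all hypothetical names.

```python
def shielded_action(state, proposed, actions, model, policy, is_unsafe, horizon=3):
    """Return `proposed` if an imagined rollout stays safe, else a safe alternative."""

    def rollout_is_safe(first_action):
        # Imagine `horizon` steps: the candidate action first, then the policy.
        s = model(state, first_action)
        for _ in range(horizon - 1):
            if is_unsafe(s):
                return False
            s = model(s, policy(s))
        return not is_unsafe(s)

    if rollout_is_safe(proposed):
        return proposed
    for a in actions:
        if a != proposed and rollout_is_safe(a):
            return a
    return proposed  # no safe alternative found within the horizon

# Toy stand-ins (assumptions): integer position, unsafe region at s >= 3.
def model(s, a):
    return s + a

def policy(s):
    return 1          # the task policy always moves right

def is_unsafe(s):
    return s >= 3

ACTIONS = (-1, 0, 1)
near_edge = shielded_action(1, 1, ACTIONS, model, policy, is_unsafe, horizon=2)   # -1
far_away = shielded_action(-5, 1, ACTIONS, model, policy, is_unsafe, horizon=2)   # 1
```

Near the unsafe region the shield overrides the proposed action; far from it the proposed action passes through unchanged.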
Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments
It is quite challenging to ensure the safety of reinforcement learning (RL)
agents in an unknown and stochastic environment under hard constraints that
require the system state not to reach certain specified unsafe regions. Many
popular safe RL methods such as those based on the Constrained Markov Decision
Process (CMDP) paradigm formulate safety violations in a cost function and try
to constrain the expectation of cumulative cost under a threshold. However, it
is often difficult to effectively capture and enforce hard reachability-based
safety constraints indirectly with such constraints on safety violation costs.
In this work, we leverage the notion of a barrier function to explicitly encode
the hard safety constraints, and given that the environment is unknown, relax
them to our design of generative-model-based soft barrier functions.
Based on such soft barriers, we propose a safe RL approach that can jointly
learn the environment and optimize the control policy, while effectively
avoiding unsafe regions with safety probability optimization. Experiments on a
set of examples demonstrate that our approach can effectively enforce hard
safety constraints and significantly outperform CMDP-based baseline methods in
system safe rate measured via simulations. Comment: 13 pages, 7 figures
ISAACS: Iterative Soft Adversarial Actor-Critic for Safety
The deployment of robots in uncontrolled environments requires them to
operate robustly under previously unseen scenarios, like irregular terrain and
wind conditions. Unfortunately, while rigorous safety frameworks from robust
optimal control theory scale poorly to high-dimensional nonlinear dynamics,
control policies computed by more tractable "deep" methods lack guarantees and
tend to exhibit little robustness to uncertain operating conditions. This work
introduces a novel approach enabling scalable synthesis of robust
safety-preserving controllers for robotic systems with general nonlinear
dynamics subject to bounded modeling error by combining game-theoretic safety
analysis with adversarial reinforcement learning in simulation. Following a
soft actor-critic scheme, a safety-seeking fallback policy is co-trained with
an adversarial "disturbance" agent that aims to invoke the worst-case
realization of model error and training-to-deployment discrepancy allowed by
the designer's uncertainty. While the learned control policy does not
intrinsically guarantee safety, it is used to construct a real-time safety
filter (or shield) with robust safety guarantees based on forward reachability
rollouts. This shield can be used in conjunction with a safety-agnostic control
policy, precluding any task-driven actions that could result in loss of safety.
We evaluate our learning-based safety approach in a 5D race car simulator,
compare the learned safety policy to the numerically obtained optimal solution,
and empirically validate the robust safety guarantee of our proposed safety
shield against worst-case model discrepancy. Comment: Accepted in 5th Annual Learning for Dynamics & Control Conference
(L4DC), University of Pennsylvania