Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must have the ability to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, even though models of both are necessarily fallible.

This dissertation aims to lay down the necessary foundations to enable autonomous systems to ensure their own safety in complex, changing, and uncertain environments, by explicitly reasoning about the gap between their models and the real world. It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. It then draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g., reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors flying in unmodeled wind and among human pedestrians, as well as on simulated highway driving. The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.
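For reference, the robust safety analysis underlying this line of work is Hamilton-Jacobi reachability; the following is the standard formulation from that literature, quoted as background rather than from the dissertation text. With a disturbance d modeling the gap between model and reality, the safety value of a state is its worst-case closest approach to the unsafe set:

```latex
% Standard HJ reachability safety value (background, not dissertation text).
% \ell(x) \ge 0 encodes the safe set; \xi is the trajectory from (x, t).
V(x,t) = \max_{u(\cdot)} \min_{d(\cdot)} \; \min_{\tau \in [t,T]} \ell\big(\xi(\tau)\big)

% V solves the Hamilton-Jacobi-Isaacs variational inequality
\min\left\{ \partial_t V + \max_{u} \min_{d} \nabla_x V \cdot f(x,u,d),\;
            \ell(x) - V(x,t) \right\} = 0, \qquad V(x,T) = \ell(x)
```

States with V > 0 admit a controller that maintains safety against worst-case disturbances; the online methods summarized above build assurances of this kind from fallible, continually updated models.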
A Forward Reachability Perspective on Robust Control Invariance and Discount Factors in Reachability Analysis
Control invariant sets are crucial for various methods that aim to design
safe control policies for systems whose state constraints must be satisfied
over an indefinite time horizon. In this article, we explore the connections
among reachability, control invariance, and Control Barrier Functions (CBFs) by
examining the forward reachability problem associated with control invariant
sets. We present the notion of an "inevitable Forward Reachable Tube" (FRT) as
a tool for analyzing control invariant sets. Our findings show that the
inevitable FRT of a robust control invariant set with a differentiable boundary
is the set itself. We highlight the role of the differentiability of the
boundary in shaping the FRTs of the sets through numerical examples. We also
formulate a zero-sum differential game between the control and disturbance,
where the inevitable FRT is characterized by the zero-superlevel set of the
value function. By incorporating a discount factor in the cost function of the
game, the barrier constraint of the CBF naturally arises as the constraint that
is imposed on the optimal control policy. As a result, the value function of
our FRT formulation serves as a CBF-like function, which has not been
previously realized in reachability studies. Conversely, any valid CBF is also
a forward reachability value function inside the control invariant set, thereby
revealing the inverse optimality of the CBF. As such, our work establishes a
strong link between reachability, control invariance, and CBFs, filling a gap
that prior formulations based on backward reachability were unable to bridge.
Comment: The first two authors contributed equally to this work.
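To make the claimed link concrete, recall the robust CBF condition, written here with the common linear class-K gain γ for illustration (this notation is chosen for exposition and is not necessarily the paper's):

```latex
% Robust CBF condition with linear class-K gain \gamma > 0 (illustrative form)
\max_{u} \min_{d} \; \nabla h(x) \cdot f(x,u,d) \;\ge\; -\gamma\, h(x)
\quad \text{on } \{x : h(x) \ge 0\}
```

The discounted game value function described in the abstract satisfies an inequality of exactly this form on its zero-superlevel set, which is why it can serve as a CBF-like function; conversely, a valid h fed into the forward reachability game recovers a value function, exhibiting the inverse optimality the authors note.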
Iterative Reachability Estimation for Safe Reinforcement Learning
Ensuring safety is important for the practical deployment of reinforcement
learning (RL). Various challenges must be addressed, such as handling
stochasticity in the environments, providing rigorous guarantees of persistent
state-wise safety satisfaction, and avoiding overly conservative behaviors that
sacrifice performance. We propose a new framework, Reachability Estimation for
Safe Policy Optimization (RESPO), for safety-constrained RL in general
stochastic settings. In the feasible set where there exist violation-free
policies, we optimize for rewards while maintaining persistent safety. Outside
this feasible set, our optimization produces the safest behavior by
guaranteeing entrance into the feasible set whenever possible with the least
cumulative discounted violations. We introduce a class of algorithms using our
novel reachability estimation function to optimize in our proposed framework
and in similar frameworks such as those concurrently handling multiple hard and
soft constraints. We theoretically establish that our algorithms almost surely
converge to locally optimal policies of our safe optimization framework. We
evaluate the proposed methods on a diverse suite of safe RL environments from
Safety Gym, PyBullet, and MuJoCo, and show the benefits in improving both
reward performance and safety compared with state-of-the-art baselines.
Comment: Accepted in NeurIPS 2023.
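As a toy illustration of the reachability-estimation idea (a minimal tabular sketch under simplified assumptions; RESPO itself is a deep actor-critic method, and none of the code below is the authors'), one can compute for each state the least cumulative discounted violation achievable and treat its zero-level set as the feasible region:

```python
import numpy as np

# Toy 1-D chain MDP: states 0..N-1; actions move left, stay, or right.
# c[s] = 1 if state s violates the safety constraint, else 0.
N = 10
c = np.zeros(N)
c[0] = c[N - 1] = 1.0          # unsafe at both ends of the chain
gamma = 0.99
actions = (-1, 0, 1)

# Reachability estimation: V_c(s) = c(s) + gamma * min_a V_c(s'),
# the least discounted cumulative violation achievable from s.
V_c = np.zeros(N)
for _ in range(500):
    V_c = np.array([c[s] + gamma * min(V_c[min(max(s + a, 0), N - 1)]
                                       for a in actions)
                    for s in range(N)])

feasible = V_c < 1e-6           # states admitting violation-free behavior
print("feasible states:", np.where(feasible)[0])
# Inside the feasible set, a policy can optimize reward subject to staying
# feasible; outside it, the safest behavior minimizes V_c, i.e. enters the
# feasible set with the least cumulative discounted violations.
```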
Recommended from our members
Approaches to Safety in Inverse Reinforcement Learning
As the capabilities of robotic systems increase, we move closer to the vision of ubiquitous robotic assistance throughout our everyday lives. As robots and autonomous systems transition out of traditional factory and industrial settings, it is critical that they are able to adapt to uncertain environments and the humans who populate them. To better understand and predict the behavior of these humans, Inverse Reinforcement Learning (IRL) uses demonstrations to infer the underlying motivations driving human actions. The information gained from IRL can be used to improve a robot’s understanding of the environment as well as to allow the robot to better interact with or assist humans.

In this dissertation, we address the challenge of incorporating safety into the application of IRL. We first consider safety in the context of using IRL to assist humans in shared control tasks. Through a user study, we show how incorporating haptic feedback into human assistance can increase humans’ sense of control while improving safety in the presence of imperfect learning. Further, we present our method for using IRL to automatically create such haptic feedback policies from task demonstrations.

We further address safety in IRL by incorporating notions of safety directly into the learning process. Currently, most work on IRL focuses on learning explanatory rewards that humans are modeled as optimizing. However, pure reward optimization can fail to effectively capture hard requirements, such as safety constraints. We draw on the definition of safety from Hamilton-Jacobi reachability analysis to infer human perceptions of safety and to modify robot behavior to respect these learned safety constraints. We also extend this work on learning constraints by adapting the framework of Maximum Entropy IRL to learn hard constraints given nominal task rewards, and we show how this technique infers the most likely constraints to align expected behavior with observed demonstrations.
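The constraint-learning idea admits a compact schematic (standard MaxEnt IRL notation used for illustration, not quoted from the dissertation). Given a nominal reward r and a candidate constraint set C, demonstrated trajectories ξ are modeled as exponentially more likely the higher their reward, with constraint-violating trajectories excluded:

```latex
% MaxEnt IRL with a candidate constraint set \mathcal{C} (illustrative notation)
p(\xi \mid \mathcal{C}) =
  \frac{\exp\!\big(r(\xi)\big)\,\mathbf{1}\!\left[\xi \models \mathcal{C}\right]}
       {\sum_{\xi'} \exp\!\big(r(\xi')\big)\,\mathbf{1}\!\left[\xi' \models \mathcal{C}\right]},
\qquad
\mathcal{C}^{\star} = \arg\max_{\mathcal{C}} \textstyle\prod_{i} p(\xi_i \mid \mathcal{C})
```

Adding a constraint raises demonstration likelihood exactly when it removes high-reward trajectories that the demonstrator conspicuously avoided, so the most likely constraints are those that best reconcile expected behavior with the observed demonstrations.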
Safe Reinforcement Learning with Dual Robustness
Reinforcement learning (RL) agents are vulnerable to adversarial
disturbances, which can deteriorate task performance or compromise safety
specifications. Existing methods either address safety requirements under the
assumption of no adversary (e.g., safe RL) or only focus on robustness against
performance adversaries (e.g., robust RL). Learning one policy that is both
safe and robust remains a challenging open problem. The difficulty lies in
tackling two intertwined aspects in the worst case: feasibility and optimality.
Optimality is only meaningful inside the feasible region, while identifying the
maximal feasible region relies on learning the optimal policy. To address
this issue, we propose a systematic framework to unify safe RL and robust RL,
including problem formulation, iteration scheme, convergence analysis and
practical algorithm design. This unification is built upon constrained
two-player zero-sum Markov games. A dual policy iteration scheme is proposed,
which simultaneously optimizes a task policy and a safety policy. The
convergence of this iteration scheme is proved. Furthermore, we design a deep
RL algorithm for practical implementation, called dually robust actor-critic
(DRAC). The evaluations with safety-critical benchmarks demonstrate that DRAC
achieves high performance and persistent safety under all scenarios (no
adversary, safety adversary, performance adversary), outperforming all
baselines significantly.
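The dual-policy structure can be sketched at a high level (hypothetical names throughout; this mirrors the scheme described above, not the authors' implementation): a safety critic decides which of the two learned policies acts, while a disturbance policy plays adversarially during training.

```python
def act(state, task_policy, safety_policy, safety_value, threshold=0.0):
    """Runtime rule for a dual-policy agent (illustrative sketch).

    safety_value(state) > threshold  ->  state judged feasible: pursue the task.
    Otherwise                        ->  hand control to the safety policy.
    """
    if safety_value(state) > threshold:
        return task_policy(state)
    return safety_policy(state)

# Training alternates (schematically) between:
#   1. improving task_policy against the worst-case disturbance inside the
#      feasible region, and
#   2. improving safety_policy (and the safety critic) to enlarge the region
#      where feasibility can be certified against the adversary.
```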
Learning Predictive Safety Filter via Decomposition of Robust Invariant Set
Ensuring safety of nonlinear systems under model uncertainty and external
disturbances is crucial, especially for real-world control tasks. Predictive
methods such as robust model predictive control (RMPC) require solving
nonconvex optimization problems online, which leads to high computational
burden and poor scalability. Reinforcement learning (RL) handles complex
systems well, but pays the price of losing rigorous safety guarantees. This
paper presents a theoretical framework that bridges the advantages of both RMPC
and RL to synthesize safety filters for nonlinear systems with state- and
action-dependent uncertainty. We decompose the robust invariant set (RIS) into
two parts: a target set that aligns with terminal region design of RMPC, and a
reach-avoid set that accounts for the rest of RIS. We propose a policy
iteration approach for robust reach-avoid problems and establish its monotone
convergence. This method sets the stage for an adversarial actor-critic deep RL
algorithm, which simultaneously synthesizes a reach-avoid policy network, a
disturbance policy network, and a reach-avoid value network. The learned
reach-avoid policy network is utilized to generate nominal trajectories for
online verification, which filters potentially unsafe actions that may drive
the system into unsafe regions when worst-case disturbances are applied. We
formulate a second-order cone programming (SOCP) approach for online
verification using system level synthesis, which optimizes for the worst-case
reach-avoid value over all possible trajectories. The proposed safety filter
requires much lower computational complexity than RMPC while still enjoying a
persistent robust safety guarantee. The effectiveness of our method is
illustrated through a numerical example.
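The online filtering step can be illustrated with a minimal sketch (hypothetical interfaces throughout; the paper's verification step is an SOCP built via system level synthesis, abstracted here into a single `certify` call):

```python
def safety_filter(state, nominal_action, step, certify, backup_policy):
    """Filter a nominal (e.g., RL) action through a worst-case safety check.

    step(state, action)   -- nominal model's predicted next state
    certify(state)        -- True iff the worst-case reach-avoid value from
                             `state` is nonnegative (the SOCP check, stubbed)
    backup_policy(state)  -- learned reach-avoid policy used as a fallback
    """
    if certify(step(state, nominal_action)):
        return nominal_action       # nominal action provably keeps us safe
    return backup_policy(state)     # otherwise fall back to the safe policy
```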