Safe Q-learning for continuous-time linear systems
Q-learning is a promising method for solving optimal control problems for
uncertain systems without the explicit need for system identification. However,
existing approaches to continuous-time Q-learning offer limited provable safety
guarantees, which restricts their applicability to real-time safety-critical
systems. This paper proposes a safe Q-learning algorithm for partially unknown
linear time-invariant systems to solve the linear quadratic regulator problem
with user-defined state constraints. We frame the safe Q-learning problem as a
constrained optimal control problem using reciprocal control barrier functions
and show that such an extension provides a safety-assured control policy. To
the best of our knowledge, Q-learning for continuous-time systems with state
constraints has not yet been reported in the literature.
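A minimal sketch of the kind of formulation the abstract describes, in our own notation rather than the paper's: the user-defined constraint h(x) > 0 induces a reciprocal barrier that blows up at the boundary of the safe set, and augmenting the quadratic cost with it turns the LQR problem into a constrained one whose Q-function can then be learned.

```latex
% Hypothetical safe-LQR formulation (our notation, not necessarily the
% paper's). Dynamics \dot{x} = Ax + Bu, safe set {x : h(x) > 0}; the
% reciprocal barrier B_r(x) = 1/h(x) grows unbounded as the state
% approaches the constraint boundary, so any finite-cost policy stays safe.
\[
  V(x_0) = \min_{u(\cdot)} \int_0^{\infty}
    \Big( x^\top Q x + u^\top R u + B_r\big(x(t)\big) \Big)\, dt,
  \qquad B_r(x) = \frac{1}{h(x)} .
\]
```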
Safe Reinforcement Learning with Dual Robustness
Reinforcement learning (RL) agents are vulnerable to adversarial
disturbances, which can degrade task performance or violate safety
specifications. Existing methods either address safety requirements under the
assumption of no adversary (e.g., safe RL) or only focus on robustness against
performance adversaries (e.g., robust RL). Learning one policy that is both
safe and robust remains a challenging open problem. The difficulty lies in
tackling two intertwined aspects under worst-case conditions: feasibility and
optimality. Optimality is only meaningful inside the feasible region, while
identifying the maximal feasible region relies on learning the optimal policy. To address
this issue, we propose a systematic framework to unify safe RL and robust RL,
including problem formulation, iteration scheme, convergence analysis and
practical algorithm design. This unification is built upon constrained
two-player zero-sum Markov games. A dual policy iteration scheme is proposed,
which simultaneously optimizes a task policy and a safety policy. The
convergence of this iteration scheme is proved. Furthermore, we design a deep
RL algorithm for practical implementation, called dually robust actor-critic
(DRAC). Evaluations on safety-critical benchmarks demonstrate that DRAC
achieves high performance and persistent safety under all three scenarios (no
adversary, safety adversary, performance adversary), significantly
outperforming all baselines.
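To make the unification concrete, the following self-contained toy (our illustration, not the paper's DRAC algorithm; all states, dynamics, and rewards are made up) solves a tiny constrained two-player zero-sum Markov game: a safety value first identifies the feasible region under the worst-case adversary, and a task value is then optimized over actions that keep the worst-case successor feasible. We use value iteration for compactness where the paper proposes dual policy iteration.

```python
# Toy constrained two-player zero-sum Markov game (our illustration, not
# the paper's DRAC algorithm). A safety value identifies the feasible
# region under the worst-case adversary; a task value is then optimized
# over actions that keep the worst-case successor feasible.

S = range(5)          # states on a line; state 4 is unsafe
A = (-1, +1)          # agent actions: step left / right
D = (-1, 0, +1)       # adversarial disturbance
GAMMA = 0.9

def step(s, a, d):
    """Deterministic transition, clipped to the state space."""
    return min(max(s + a + d, 0), 4)

def cost(s):
    """Constraint-violation indicator."""
    return 1.0 if s == 4 else 0.0

def reward(s):
    """Made-up task reward: states near the unsafe end pay more."""
    return float(s)

# 1) Safety value: agent minimizes, adversary maximizes, cumulative violation.
Vc = {s: 0.0 for s in S}
for _ in range(200):
    Vc = {s: min(max(cost(step(s, a, d)) + GAMMA * Vc[step(s, a, d)]
                     for d in D)
                 for a in A)
          for s in S}

feasible = {s for s in S if Vc[s] < 1e-6}   # violation is avoidable here

def safe_actions(s):
    """Actions whose worst-case successor stays feasible (else fall back)."""
    acts = [a for a in A if all(step(s, a, d) in feasible for d in D)]
    return acts or list(A)

# 2) Task value: worst-case return, restricted to the certified actions.
Vr = {s: 0.0 for s in S}
for _ in range(200):
    Vr = {s: max(min(reward(s) + GAMMA * Vr[step(s, a, d)]
                     for d in D)
                 for a in safe_actions(s))
          for s in S}

print("feasible region:", sorted(feasible))
print("robust-safe values:", {s: round(Vr[s], 2) for s in S})
```

The split mirrors the feasibility/optimality coupling described above: the task value is only ever computed over actions certified by the safety value.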
Patching Neural Barrier Functions Using Hamilton-Jacobi Reachability
Learning-based control algorithms have led to major advances in robotics at
the cost of decreased safety guarantees. Recently, neural networks have also
been used to characterize safety through the use of barrier functions for
complex nonlinear systems. Learned barrier functions approximately encode and
enforce a desired safety constraint through a value function, but do not
provide any formal guarantees. In this paper, we propose a local dynamic
programming (DP) based approach to "patch" an almost-safe learned barrier at
potentially unsafe points in the state space. The resulting algorithm,
HJ-Patch, yields a novel barrier that provides formal safety guarantees yet
retains the global structure of the learned barrier. It updates the barrier
function "minimally" at points that both (a)
neighbor the barrier safety boundary and (b) do not satisfy the safety
condition. We view this as a key step toward bridging the gap between
learning-based barrier functions and Hamilton-Jacobi reachability analysis,
providing a framework for further integration of these approaches. We
demonstrate that for well-trained barriers we reduce the computational load by
2 orders of magnitude with respect to standard DP-based reachability, and
demonstrate scalability to a 6-dimensional system, which is at the limit of
standard DP-based reachability.
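As a rough 1-D illustration of the local-update idea (our schematic, not the authors' HJ-Patch implementation; the system, grid, and "learned" barrier below are invented): a dynamic-programming backup is applied only at grid points that neighbor the barrier's zero level set and fail a one-step safety-consistency check, and each update re-queues its neighbors so the patch propagates only where needed.

```python
import numpy as np

# 1-D toy of the "patching" idea as we read the abstract. System: x' = u,
# |u| <= 1; constraint set |x| <= 2. The "learned" barrier is the signed
# distance plus an error term; values are backed up ONLY at active points.

dx, dt = 0.05, 0.05
xs = np.arange(-3.0, 3.0 + dx / 2, dx)       # state grid
l = 2.0 - np.abs(xs)                          # signed distance to constraint
h = l + 0.3 * np.sin(5.0 * xs)                # learned barrier with errors

def backup(i):
    """Local DP backup: best one-step reachable value, capped by l."""
    best = -np.inf
    for u in (-1.0, 0.0, 1.0):
        j = int(np.clip(round((xs[i] + u * dt - xs[0]) / dx), 0, len(xs) - 1))
        best = max(best, h[j])
    return min(l[i], best)

def violates(i):
    """Safety-consistency check: the backup would lower the value here."""
    return backup(i) < h[i] - 1e-9

# Seed the active set with points whose neighborhood crosses the zero level
# set of h (condition (a)) and that violate the check (condition (b)).
active = {i for i in range(1, len(xs) - 1)
          if h[i - 1] * h[i + 1] <= 0 and violates(i)}

while active:
    i = active.pop()
    new = backup(i)
    if new < h[i] - 1e-9:                     # "minimal" update: only lower
        h[i] = new
        # The patch propagates: re-examine neighbors that now violate.
        active |= {j for j in (i - 1, i + 1)
                   if 0 < j < len(xs) - 1 and violates(j)}

safe = xs[h >= 0]
print(f"patched zero-superlevel set: [{safe.min():.2f}, {safe.max():.2f}]")
```

Compared with a full sweep over the grid, only the boundary-adjacent active set is ever touched, which is where the claimed two-orders-of-magnitude savings for well-trained barriers would come from.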
Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must have the ability to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, in spite of models of them being necessarily fallible.

This dissertation aims to lay down the necessary foundations to enable autonomous systems to ensure their own safety in complex, changing, and uncertain environments by explicitly reasoning about the gap between their models and the real world. It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. It then draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g., reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors (flying in unmodeled wind and among human pedestrians) and in simulated highway driving.

The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.
Learning-Aware Safety for Interactive Autonomy
One of the outstanding challenges for the widespread deployment of robotic
systems like autonomous vehicles is ensuring safe interaction with humans
without sacrificing efficiency. Existing safety analysis methods often neglect
the robot's ability to learn and adapt at runtime, leading to overly
conservative behavior. This paper proposes a new closed-loop paradigm for
synthesizing safe control policies that explicitly account for the system's
evolving uncertainty under possible future scenarios. The formulation reasons
jointly about the physical dynamics and the robot's learning algorithm, which
updates its internal belief over time. We leverage adversarial deep
reinforcement learning (RL) for scaling to high dimensions, enabling tractable
safety analysis even for implicit learning dynamics induced by state-of-the-art
prediction models. We demonstrate our framework's ability to work with both
Bayesian belief propagation and the implicit learning induced by a large
pre-trained neural trajectory predictor.
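A tiny sketch of the belief-augmented viewpoint suggested by the abstract (our toy, not the paper's implementation; the dynamics, hypotheses, and likelihoods are invented): the safety-relevant state pairs the physical state with the robot's belief, which is updated by Bayes' rule as the human acts, so a worst-case analysis can credit the learning that will occur along the way.

```python
import numpy as np

# Joint (physical state, belief) dynamics: the belief over two hypotheses
# about the human sharpens via Bayes' rule as actions are observed. With
# the belief inside the state, a worst-case analysis can only pick human
# behaviors consistent with the evidence so far, avoiding the
# over-conservatism of ignoring runtime learning. Everything here is a
# made-up toy; the paper also handles implicit learning dynamics induced
# by large pre-trained trajectory predictors.

LIKELIHOOD = {0: np.array([0.9, 0.2]),   # P(human action | hypothesis)
              1: np.array([0.1, 0.8])}

def belief_update(b, human_action):
    """Bayes' rule over the two hypotheses about the human."""
    post = b * LIKELIHOOD[human_action]
    return post / post.sum()

def joint_step(x, b, u, human_action):
    """Physical state and belief evolve together (toy dynamics)."""
    x_next = x + 0.1 * np.array([u, human_action])
    b_next = belief_update(b, human_action)
    return x_next, b_next

x, b = np.zeros(2), np.array([0.5, 0.5])
for t in range(3):
    x, b = joint_step(x, b, u=1.0, human_action=0)
    print(f"t={t}  x={x}  belief={np.round(b, 3)}")
```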
A Forward Reachability Perspective on Robust Control Invariance and Discount Factors in Reachability Analysis
Control invariant sets are crucial for various methods that aim to design
safe control policies for systems whose state constraints must be satisfied
over an indefinite time horizon. In this article, we explore the connections
among reachability, control invariance, and Control Barrier Functions (CBFs) by
examining the forward reachability problem associated with control invariant
sets. We present the notion of an "inevitable Forward Reachable Tube" (FRT) as
a tool for analyzing control invariant sets. Our findings show that the
inevitable FRT of a robust control invariant set with a differentiable boundary
is the set itself. We highlight the role of the differentiability of the
boundary in shaping the FRTs of the sets through numerical examples. We also
formulate a zero-sum differential game between the control and disturbance,
where the inevitable FRT is characterized by the zero-superlevel set of the
value function. By incorporating a discount factor in the cost function of the
game, the barrier constraint of the CBF naturally arises as the constraint that
is imposed on the optimal control policy. As a result, the value function of
our FRT formulation serves as a CBF-like function, a connection not
previously established in reachability studies. Conversely, any valid CBF is also
a forward reachability value function inside the control invariant set, thereby
revealing the inverse optimality of the CBF. As such, our work establishes a
strong link between reachability, control invariance, and CBFs, filling a gap
that prior formulations based on backward reachability were unable to bridge.
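A hedged sketch of that link, in our notation rather than the paper's: discounting the game's cost produces a Hamilton-Jacobi-Isaacs condition on the optimal policy that has exactly the shape of a CBF constraint.

```latex
% Schematic of the discounted-game / CBF connection as we read the
% abstract; our notation, and signs/conventions may differ from the
% paper's. Dynamics \dot{x} = f(x, u, d); V is the value function of the
% discounted game, with the invariant set as its zero-superlevel set.
% Discounting at rate \gamma > 0 makes the optimal policy satisfy
\[
  \sup_{u} \inf_{d} \; \nabla V(x)^{\top} f(x, u, d) \;\ge\; -\gamma\, V(x),
\]
% which is the CBF condition \dot{h}(x) \ge -\alpha(h(x)) with h = V and
% the linear class-K function \alpha(r) = \gamma r.
```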