21,859 research outputs found
Safety-guided deep reinforcement learning via online gaussian process estimation
An important facet of reinforcement learning (RL) has to do with how the agent goes about exploring the environment. Traditional exploration strategies typically focus on efficiency and ignore safety. However, for practical applications, ensuring safety of the agent during exploration is crucial since performing an unsafe action or reaching an unsafe state could result in irreversible damage to the agent. The main challenge of safe exploration is that characterizing the unsafe states and actions is difficult for large continuous state or action spaces and unknown environments. In this paper, we propose a novel approach to incorporate estimations of safety to guide exploration and policy search in deep reinforcement learning. By using a cost function to capture trajectory-based safety, our key idea is to formulate the state-action value function of this safety cost as a candidate Lyapunov function and extend control-theoretic results to approximate its derivative using online Gaussian Process (GP) estimation. We show how to use these statistical models to guide the agent in unknown environments to obtain high-performance control policies with provable stability certificates.Accepted manuscrip
Recommended from our members
Understanding Model-Based Reinforcement Learning and its Application in Safe Reinforcement Learning
Model-based reinforcement learning algorithms have been shown to achieve successful results on various continuous control benchmarks, but the understanding of model-based methods is limited. We try to interpret how model-based method works through novel experiments on state-of-the-art algorithms with an emphasis on the model learning part. We evaluate the role of the model learning in policy optimization and propose methods to learn a more accurate model. With a better understanding of model-based reinforcement learning, we then apply model-based methods to solve safe reinforcement learning (RL) problems with near-zero violation of hard constraints throughout training. Drawing an analogy with how humans and animals learn to perform safe actions, we break down the safe RL problem into three stages. First, we train agents in a constraint-free environment to learn a performant policy for reaching high rewards, and simultaneously learn a model of the dynamics. Second, we use model-based methods to plan safe actions and train a safeguarding policy from these actions through imitation. Finally, we propose a factored framework to train an overall policy that mixes the performant policy and the safeguarding policy. This three-step curriculum ensures near-zero violation of safety constraints at all times. As an advantage of model-based method, the sample complexity required at the second and third steps of the process is significantly lower than model-free methods and can enable online safe learning. We demonstrate the effectiveness of our methods in various continuous control problems and analyze the advantages over state-of-the-art approaches
Provably Safe Robot Navigation with Obstacle Uncertainty
As drones and autonomous cars become more widespread it is becoming
increasingly important that robots can operate safely under realistic
conditions. The noisy information fed into real systems means that robots must
use estimates of the environment to plan navigation. Efficiently guaranteeing
that the resulting motion plans are safe under these circumstances has proved
difficult. We examine how to guarantee that a trajectory or policy is safe with
only imperfect observations of the environment. We examine the implications of
various mathematical formalisms of safety and arrive at a mathematical notion
of safety of a long-term execution, even when conditioned on observational
information. We present efficient algorithms that can prove that trajectories
or policies are safe with much tighter bounds than in previous work. Notably,
the complexity of the environment does not affect our methods ability to
evaluate if a trajectory or policy is safe. We then use these safety checking
methods to design a safe variant of the RRT planning algorithm.Comment: RSS 201
- …