It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state to never reach certain specified unsafe regions. Many popular safe RL methods, such as those based on the Constrained Markov Decision Process (CMDP) paradigm, encode safety violations in a cost function and attempt to constrain the expected cumulative cost below a threshold. However, such indirect constraints on safety-violation costs often fail to effectively capture and enforce hard reachability-based safety constraints.
In this work, we leverage the notion of a barrier function to explicitly encode the hard safety constraints and, since the environment is unknown, relax them into our design of \emph{generative-model-based soft barrier functions}. Based on such soft barriers, we propose a safe RL approach that can jointly learn the environment and optimize the control policy, while effectively
avoiding unsafe regions by optimizing the safety probability. Experiments on a set of examples demonstrate that our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in the system safe rate measured via simulation.
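The abstract only outlines the idea, but as a rough, hypothetical illustration of how hard barrier conditions might be softened into trainable penalties, consider the minimal PyTorch sketch below. All names (\texttt{Barrier}, \texttt{soft\_barrier\_loss}, \texttt{model}, \texttt{policy}) and the specific hinge penalties are illustrative assumptions, not the paper's actual formulation.

\begin{verbatim}
import torch
import torch.nn as nn

# Hypothetical sketch: a barrier (certificate) network B trained so that,
# under a learned generative dynamics model, B <= 0 on initial states,
# B > 0 on unsafe states, and B does not increase along sampled transitions.
class Barrier(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def soft_barrier_loss(barrier, model, policy, x_init, x_unsafe, x_batch):
    """Soften the three hard barrier conditions into hinge penalties.

    model(x, a) is assumed to sample next states from a learned
    generative dynamics model; policy(x) is assumed to sample actions.
    """
    # (1) B(x) <= 0 on initial states.
    l_init = torch.relu(barrier(x_init)).mean()
    # (2) B(x) > 0 on unsafe states (enforced with a small margin).
    l_unsafe = torch.relu(0.1 - barrier(x_unsafe)).mean()
    # (3) B should not increase along model-sampled one-step transitions.
    x_next = model(x_batch, policy(x_batch))
    l_decrease = torch.relu(barrier(x_next) - barrier(x_batch)).mean()
    return l_init + l_unsafe + l_decrease
\end{verbatim}

In such a softened formulation, the hinge terms vanish exactly when the hard barrier conditions hold on the sampled states, so driving the loss to zero recovers a (sample-based) certificate while remaining differentiable for joint training with the policy.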