Reinforcement Learning (RL) has achieved remarkable success in
safety-critical areas, but it remains vulnerable to adversarial attacks. Recent
studies have introduced "smoothed policies" to enhance its robustness.
Yet it remains challenging to establish a provable guarantee that certifies a
bound on the total reward. Prior methods rely primarily on computing bounds
via Lipschitz continuity or on computing the probability that the cumulative
reward exceeds specific thresholds. However, these techniques are suited only
to continuous perturbations of the RL agent's observations and are restricted
to perturbations bounded by the ℓ2-norm. To address these limitations, this
paper proposes a general black-box certification method capable of directly
certifying the cumulative reward of the smoothed policy under various
ℓp-norm-bounded perturbations. Furthermore, we extend our methodology to
certify perturbations on action spaces. Our approach leverages f-divergence to
measure the discrepancy between the original and perturbed distributions, and
then determines the certification bound by solving a convex optimisation
problem. We provide a comprehensive theoretical analysis and conduct extensive
experiments in multiple environments. Our results show that
our method not only improves the certified lower bound of mean cumulative
reward but also demonstrates better efficiency than state-of-the-art
techniques.

Comment: This paper will be presented at AAAI202
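
As a rough illustration of the certification step described above, the following minimal sketch (not the authors' code) shows how a worst-case lower bound on the expected cumulative reward could be obtained by minimising over perturbed reward distributions inside an f-divergence ball. For concreteness it instantiates the divergence as KL and uses cvxpy; the reward support, probabilities, and divergence budget are hypothetical placeholders rather than values from the paper.

import cvxpy as cp
import numpy as np

# Hypothetical discretised reward distribution of the smoothed policy
# (support points and probabilities are placeholders, not real data).
reward_vals = np.array([0.0, 50.0, 100.0, 200.0])
p_hat = np.array([0.1, 0.2, 0.3, 0.4])
eps = 0.05  # assumed f-divergence budget induced by the perturbation

# Worst-case perturbed reward distribution q within the divergence ball.
q = cp.Variable(len(p_hat), nonneg=True)
objective = cp.Minimize(reward_vals @ q)  # worst-case expected reward
constraints = [
    cp.sum(q) == 1,
    # Equals D_KL(q || p_hat) because both distributions sum to one.
    cp.sum(cp.kl_div(q, p_hat)) <= eps,
]
prob = cp.Problem(objective, constraints)
prob.solve()
print("Certified lower bound on expected cumulative reward:", prob.value)

The paper's method certifies the cumulative reward under general f-divergences and lp-norm threat models rather than this toy KL example, but the bound shares the same structure: a convex program over distributions constrained by the divergence induced by the perturbation.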