The current state-of-the-art defense methods against adversarial examples
typically focus on improving either empirical or certified robustness. Among
them, adversarially trained (AT) models achieve state-of-the-art empirical
robustness against adversarial examples but provide no robustness guarantees
for large classifiers or higher-dimensional inputs. In contrast, existing
randomized smoothing-based models achieve state-of-the-art certified
robustness while significantly degrading empirical robustness against
adversarial examples. In this paper, we propose a novel method, called
\emph{Certification through Adaptation}, which transforms an AT model into a
randomized smoothing classifier at inference time to provide certified
robustness in the $\ell_2$ norm without affecting its empirical robustness
against adversarial attacks. We also propose an \emph{Auto-Noise} technique
that efficiently approximates appropriate noise levels to flexibly certify
test examples via randomized smoothing.
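For context, the underlying certificate is the standard randomized smoothing
guarantee of Cohen et al. (2019); the following is a minimal sketch in their
notation, with $f$, $g$, $\sigma$, $p_A$, and $p_B$ introduced here only for
illustration. Given a base classifier $f$ and a Gaussian noise level $\sigma$,
the smoothed classifier $g$ is certifiably robust within an $\ell_2$ radius
$R$ around an input $x$:
\[
g(x) = \arg\max_{c}\; \mathbb{P}_{\varepsilon \sim \mathcal{N}(0,\sigma^{2}I)}\bigl[f(x+\varepsilon)=c\bigr],
\qquad
R = \frac{\sigma}{2}\Bigl(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\Bigr),
\]
where $\Phi^{-1}$ is the inverse standard Gaussian CDF and $p_A \geq p_B$ are
(bounds on) the probabilities of the two most likely classes of
$f(x+\varepsilon)$. The \textit{average certified radius} (ACR) used below
averages $R$ over the test set, counting misclassified examples as radius $0$.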
Our proposed \emph{Certification through Adaptation} with the
\emph{Auto-Noise} technique achieves \textit{average certified radius (ACR)}
scores of up to 1.102 and 1.148 on the CIFAR-10 and ImageNet datasets,
respectively, using AT models, without affecting their empirical robustness
or benign accuracy. Therefore, our paper
is a step towards bridging the gap between empirical and certified
robustness against adversarial examples by achieving both using the same
classifier.

Comment: An abridged version of this work has been presented at ICLR 2021
Workshop on Security and Safety in Machine Learning Systems:
https://aisecure-workshop.github.io/aml-iclr2021/papers/2.pd