On Model Robustness Against Adversarial Examples
We study model robustness against adversarial examples, i.e., inputs with
small perturbations that can nevertheless fool many state-of-the-art deep
learning models. Unlike previous research, we establish a novel theory that
addresses the robustness issue from the perspective of the stability of the
loss function in a small neighborhood of natural examples. We propose an
energy function to describe this stability and prove that reducing the
energy guarantees robustness against adversarial examples. We also show
that traditional training methods, including adversarial training under a
norm constraint (AT) and Virtual Adversarial Training (VAT), tend to
minimize a lower bound of our proposed energy function. Our analysis shows,
however, that minimizing this lower bound can leave the model insufficiently
robust within the neighborhood of the input sample. Furthermore, we design a
more principled method with energy regularization, which provably achieves
better robustness than previous methods. Through a series of
experiments, we demonstrate the superiority of our model on both supervised
tasks and semi-supervised tasks. In particular, our proposed adversarial
framework achieves the best performance compared with previous adversarial
training methods on the benchmark datasets MNIST, CIFAR-10, and SVHN.
Importantly, it demonstrates much better robustness against adversarial
examples than all compared methods.

Comment: some theoretical bounds need to be revised
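
As a rough illustration of the idea, the following PyTorch sketch regularizes training with a surrogate energy term measuring how much the loss can rise within a small neighborhood of each input. The neighborhood search (a few projected-gradient-ascent steps), the hyperparameters (epsilon, alpha, steps, lambda_energy), and all function names are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def worst_case_perturbation(model, x, y, epsilon=0.03, alpha=0.01, steps=5):
    # Approximate the loss-maximizing point in the l_inf ball of radius
    # epsilon around x with a few projected-gradient-ascent steps.
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
    return delta

def energy_regularized_loss(model, x, y, lambda_energy=1.0):
    # Natural loss plus a surrogate "energy": the rise of the loss over
    # the neighborhood of x, taken here as a stand-in for the stability
    # notion described in the abstract.
    natural_loss = F.cross_entropy(model(x), y)
    delta = worst_case_perturbation(model, x, y)
    perturbed_loss = F.cross_entropy(model(x + delta), y)
    energy = perturbed_loss - natural_loss
    return natural_loss + lambda_energy * energy

With lambda_energy = 1 this objective reduces to standard adversarial training on the worst-case point; larger values penalize loss instability in the neighborhood more directly, which is the behavior the proposed energy regularization is meant to encourage.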