Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders
Machine learning models are vulnerable to adversarial examples. Iterative
adversarial training has shown promising results against strong white-box
attacks. However, adversarial training is computationally expensive, and this
costly procedure must be repeated every time a new model needs to be protected.
In this paper, we propose applying the iterative adversarial training scheme
to an external auto-encoder which, once trained, can be used to protect other
models directly. We empirically show that our model outperforms other
purification-based methods against white-box attacks and transfers well to
directly protect other base models with different architectures.
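The abstract does not spell out the training objective, so the following is only a minimal sketch of the general idea under stated assumptions: a frozen binary logistic-regression classifier stands in for the base model, a linear map `A x + c` stands in for the auto-encoder, projected gradient descent (PGD) under an L-infinity budget stands in for the iterative attack, and the purifier is trained on the base model's cross-entropy loss. Every function name (`purify`, `pgd_attack`, `train_purifier`, `robust_accuracy`) and every hyperparameter is hypothetical; this is not the authors' implementation. What it illustrates is the structure: adversarial examples are generated against the composed model `base_model(purify(x))`, but only the purifier's parameters are updated.

```python
import math
import random

# Hypothetical toy setup, NOT the paper's architecture: 2-D inputs, a frozen
# logistic-regression "base model" (weights w, bias b), and a linear
# "auto-encoder" purify(x) = A x + c whose parameters A, c are the only ones trained.


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def sign(t):
    return (t > 0) - (t < 0)


def purify(x, A, c):
    """Linear stand-in for the auto-encoder: returns A x + c."""
    return [sum(a * xj for a, xj in zip(row, x)) + ci for row, ci in zip(A, c)]


def base_model(z, w, b):
    """Frozen binary classifier: returns P(y = 1 | z)."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def input_grad(x, y, A, c, w, b):
    """Gradient of the cross-entropy loss of base_model(purify(x)) w.r.t. x."""
    p = base_model(purify(x, A, c), w, b)
    # dL/dx_j = (p - y) * sum_i w_i * A_ij
    return [(p - y) * sum(w[i] * A[i][j] for i in range(len(w))) for j in range(len(x))]


def pgd_attack(x, y, A, c, w, b, eps, alpha, steps):
    """Iterative L-infinity attack (PGD) against the composed model."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        g = input_grad([xi + di for xi, di in zip(x, delta)], y, A, c, w, b)
        # Step up the loss, then clip back into the eps-ball around x.
        delta = [max(-eps, min(eps, di + alpha * sign(gi))) for di, gi in zip(delta, g)]
    return [xi + di for xi, di in zip(x, delta)]


def train_purifier(data, w, b, eps=0.3, alpha=0.1, steps=5, lr=0.1, epochs=20):
    """Adversarially train only the purifier; the base model (w, b) stays frozen."""
    d = len(w)
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # start at identity
    c = [0.0] * d
    for _ in range(epochs):
        for x, y in data:
            # Inner maximization: attack the current purifier + frozen classifier.
            x_adv = pgd_attack(x, y, A, c, w, b, eps, alpha, steps)
            p = base_model(purify(x_adv, A, c), w, b)
            # Outer minimization: one SGD step on A and c only.
            for i in range(d):
                for j in range(d):
                    A[i][j] -= lr * (p - y) * w[i] * x_adv[j]
                c[i] -= lr * (p - y) * w[i]
    return A, c


def robust_accuracy(data, A, c, w, b, eps=0.3, alpha=0.1, steps=5):
    """Fraction of points still classified correctly after a PGD attack."""
    correct = 0
    for x, y in data:
        x_adv = pgd_attack(x, y, A, c, w, b, eps, alpha, steps)
        correct += int((base_model(purify(x_adv, A, c), w, b) >= 0.5) == (y == 1))
    return correct / len(data)


if __name__ == "__main__":
    random.seed(0)
    # Two Gaussian blobs: label 1 around (1, 1), label 0 around (-1, -1).
    data = [([random.gauss(m, 0.5), random.gauss(m, 0.5)], y)
            for y, m in ((1, 1.0), (0, -1.0)) for _ in range(50)]
    w, b = [1.0, 1.0], 0.0
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print("robust accuracy, identity purifier:", robust_accuracy(data, identity, [0.0, 0.0], w, b))
    A, c = train_purifier(data, w, b)
    print("robust accuracy, trained purifier:", robust_accuracy(data, A, c, w, b))
```

Because `w` and `b` are never modified during training, the learned `A` and `c` could be placed in front of a different classifier simply by passing other weights to `robust_accuracy`. The printed numbers depend on the random seed and this linear toy; they are not results from the paper and say nothing about how well the paper's auto-encoder transfers.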