This work considers adversarial defense for image classification, where the
goal is to make a classifier robust to adversarial examples.
Inspired by the hypothesis that these examples lie beyond the natural image
manifold, a novel aDversarIal defenSe with local impliCit functiOns (DISCO) is
proposed to remove adversarial perturbations by localized manifold projections.
DISCO consumes an adversarial image and a query pixel location and outputs a
clean RGB value at the location. It is implemented with an encoder and a local
implicit module, where the former produces per-pixel deep features and the
latter uses the features in the neighborhood of the query pixel to predict the
clean RGB value. Extensive experiments demonstrate that both DISCO and its
cascade version outperform prior defenses, regardless of whether the defense is
known to the attacker. DISCO is also shown to be data and parameter efficient
and to mount defenses that transfer across datasets, classifiers and attacks.

Comment: Accepted to NeurIPS 202
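
The two-stage design described in the abstract (an encoder that produces per-pixel features, followed by a local implicit module that reads the features around a query pixel and predicts an RGB value) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names (`init_params`, `encode`, `query_rgb`, `defend`), the single 3x3 convolution standing in for the encoder, the two-layer MLP, the neighborhood radius, and the random untrained weights. In practice the weights would have to be learned before the output resembles a clean image.

```python
import numpy as np


def init_params(channels=8, hidden=16, radius=1, seed=0):
    """Random, untrained weights (hypothetical sizes chosen for illustration)."""
    rng = np.random.default_rng(seed)
    k = 2 * radius + 1
    return {
        "radius": radius,
        # Encoder stand-in: one 3x3 convolution, 3 input channels -> `channels`.
        "conv": 0.1 * rng.standard_normal((3, 3, 3, channels)),
        # Local implicit module stand-in: a two-layer MLP over the flattened window.
        "w1": 0.1 * rng.standard_normal((k * k * channels, hidden)),
        "b1": np.zeros(hidden),
        "w2": 0.1 * rng.standard_normal((hidden, 3)),
        "b2": np.zeros(3),
    }


def encode(image, conv):
    """Map an (H, W, 3) image to (H, W, C) per-pixel features."""
    h, w, _ = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    feats = np.zeros((h, w, conv.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            patch = padded[dy:dy + h, dx:dx + w, :]  # shifted view, (H, W, 3)
            feats += patch @ conv[dy, dx]            # (H, W, 3) @ (3, C)
    return np.maximum(feats, 0.0)                    # ReLU


def query_rgb(feats, y, x, params):
    """Predict an RGB value at pixel (y, x) from the features in its neighborhood."""
    r = params["radius"]
    padded = np.pad(feats, ((r, r), (r, r), (0, 0)), mode="edge")
    # After padding, the window centered on (y, x) starts at index (y, x).
    window = padded[y:y + 2 * r + 1, x:x + 2 * r + 1, :]
    z = window.reshape(-1)
    hidden = np.maximum(z @ params["w1"] + params["b1"], 0.0)
    logits = hidden @ params["w2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))             # sigmoid keeps RGB in [0, 1]


def defend(image, params):
    """Rebuild the image by querying every pixel location."""
    feats = encode(image, params["conv"])
    h, w, _ = image.shape
    out = np.zeros((h, w, 3))
    for y in range(h):
        for x in range(w):
            out[y, x] = query_rgb(feats, y, x, params)
    return out


if __name__ == "__main__":
    # A random array stands in for an adversarial input image.
    adversarial = np.random.default_rng(1).random((8, 8, 3))
    purified = defend(adversarial, init_params())
    print(purified.shape)
```

Because each output pixel depends only on features inside a small window, the reconstruction is local, which is one way to read the "localized manifold projections" mentioned above. The purified image would then be passed to the classifier in place of the adversarial input.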