169 research outputs found
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
We propose a conceptually simple and lightweight framework for improving the
robustness of vision models through the combination of knowledge distillation
and data augmentation. We address the conjecture that larger models do not make
for better teachers by showing strong gains in out-of-distribution robustness
when distilling from pretrained foundation models. Following this finding, we
propose Discrete Adversarial Distillation (DAD), which leverages a robust
teacher to generate adversarial examples and a VQGAN to discretize them,
creating more informative samples than standard data augmentation techniques.
We provide a theoretical framework for the use of a robust teacher in the
knowledge distillation with data augmentation setting and demonstrate strong
gains in out-of-distribution robustness and clean accuracy across different
student architectures. Notably, our method adds minor computational overhead
compared to similar techniques and can be easily combined with other data
augmentations for further improvements.
Comment: Published in NeurIPS 202
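The distillation component the abstract relies on — training a student against a (robust) teacher's temperature-softened outputs — can be sketched as follows. This is a generic knowledge-distillation loss in the style of Hinton et al., not the paper's exact DAD objective; the temperature `T` and the `T^2` scaling are standard conventions assumed here.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 so gradients keep a comparable magnitude across T."""
    p = softmax(teacher_logits, T)                 # soft teacher targets
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean() * T * T)
```

In the setting the abstract describes, both student and teacher would see the same augmented (here, VQGAN-discretized adversarial) inputs before this loss is computed.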
ASK: Adversarial Soft k-Nearest Neighbor Attack and Defense
K-Nearest Neighbor (kNN)-based deep learning methods have been applied to
many applications due to their simplicity and geometric interpretability.
However, the robustness of kNN-based classification models has not been
thoroughly explored and kNN attack strategies are underdeveloped. In this
paper, we propose an Adversarial Soft kNN (ASK) loss both to design more
effective kNN attack strategies and to develop better defenses against them.
Our ASK loss approach has two advantages. First, ASK loss can better
approximate the kNN's probability of classification error than objectives
proposed in previous works. Second, the ASK loss is interpretable: it preserves
the mutual information between the perturbed input and the in-class-reference
data. We use the ASK loss to generate a novel attack method called the
ASK-Attack (ASK-Atk), which shows superior attack efficiency and accuracy
degradation relative to previous kNN attacks. Based on the ASK-Atk, we then
derive an ASK-Defense (ASK-Def) method that optimizes the
worst-case training loss induced by ASK-Atk. Experiments on CIFAR-10 (ImageNet)
show that (i) ASK-Atk achieves higher attack success rates than previous kNN
attacks, and (ii) ASK-Def outperforms the conventional adversarial training
method in terms of robustness improvement.
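The idea of a "soft" kNN loss can be illustrated with a minimal sketch: replace the hard nearest-neighbor vote with distance-weighted soft votes, which yields a differentiable surrogate for the kNN classification probability that an attacker can then push down. The function names, the temperature `tau`, and the exact weighting below are illustrative assumptions, not the paper's precise ASK formulation.

```python
import numpy as np

def soft_knn_probs(x, refs, labels, n_classes, tau=1.0):
    """Differentiable surrogate for kNN class probabilities:
    each reference point votes with weight proportional to exp(-dist/tau)."""
    d = np.linalg.norm(refs - x, axis=1)           # distances to references
    w = np.exp(-d / tau)
    w /= w.sum()                                   # soft "neighbor" weights
    probs = np.zeros(n_classes)
    np.add.at(probs, labels, w)                    # accumulate weights per class
    return probs

def ask_style_loss(x, refs, labels, n_classes, y_true, tau=1.0):
    """Attack objective in the spirit of ASK: drive down the soft-kNN
    probability of the true class (a stand-in, not the paper's exact loss)."""
    p = soft_knn_probs(x, refs, labels, n_classes, tau)
    return -np.log(p[y_true] + 1e-12)
```

An ASK-Atk-style attacker would maximize this loss over a bounded perturbation of `x`; an ASK-Def-style defense would train on the resulting worst-case examples.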
Provable Unrestricted Adversarial Training without Compromise with Generalizability
Adversarial training (AT) is widely considered as the most promising strategy
to defend against adversarial attacks and has drawn increasing interest from
researchers. However, the existing AT methods still suffer from two challenges.
First, they are unable to handle unrestricted adversarial examples (UAEs),
which are built from scratch, as opposed to restricted adversarial examples
(RAEs), which are created by adding norm-bounded perturbations to
observed examples. Second, the existing AT methods often achieve adversarial
robustness at the expense of standard generalizability (i.e., the accuracy on
natural examples) because they make a tradeoff between them. To overcome these
challenges, we propose a unique viewpoint that understands UAEs as
imperceptibly perturbed unobserved examples. Also, we find that the tradeoff
results from the separation of the distributions of adversarial examples and
natural examples. Based on these ideas, we propose a novel AT approach called
Provable Unrestricted Adversarial Training (PUAT), which can provide a target
classifier with comprehensive adversarial robustness against both UAE and RAE,
and simultaneously improve its standard generalizability. Particularly, PUAT
utilizes partially labeled data to achieve effective UAE generation by
accurately capturing the natural data distribution through a novel augmented
triple-GAN. At the same time, PUAT extends the traditional AT by introducing
the supervised loss of the target classifier into the adversarial loss and
achieves the alignment between the UAE distribution, the natural data
distribution, and the distribution learned by the classifier, with the
collaboration of the augmented triple-GAN. Finally, the solid theoretical
analysis and extensive experiments conducted on widely-used benchmarks
demonstrate the superiority of PUAT.
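For contrast with the unrestricted examples PUAT targets, the restricted adversarial examples (RAEs) that standard adversarial training defends against can be sketched with a PGD-style inner maximization on a toy linear classifier. The fixed weights `w`, the logistic loss, and the step sizes below are illustrative assumptions, not from the paper.

```python
import numpy as np

# toy linear classifier with fixed weights; labels y are in {-1, +1}
w = np.array([1.0, -2.0, 0.5])

def logistic_loss(x, y):
    return np.log1p(np.exp(-y * w.dot(x)))

def loss_grad(x, y):
    """Gradient of the logistic loss with respect to the input x."""
    s = 1.0 / (1.0 + np.exp(y * w.dot(x)))        # sigmoid(-y * w.x)
    return -y * w * s

def pgd_attack(x, y, eps=0.1, alpha=0.02, steps=10):
    """Craft a restricted adversarial example (RAE): an L_inf
    eps-bounded perturbation found by projected gradient ascent."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return x_adv
```

Adversarial training minimizes the classifier's loss on such worst-case examples; PUAT's departure, per the abstract, is to additionally generate UAEs from scratch via a generative model rather than perturbing observed inputs.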
- …