Your Out-of-Distribution Detection Method is Not Robust!
Out-of-distribution (OOD) detection has recently gained substantial attention
due to the importance of identifying out-of-domain samples in reliability and
safety. Although OOD detection methods have advanced a great deal, they are
still susceptible to adversarial examples, which defeats their purpose. To
mitigate this issue, several defenses have recently been proposed.
Nevertheless, these efforts have remained ineffective, as their evaluations are
based on either small perturbation sizes or weak attacks. In this work, we
re-examine these defenses against an end-to-end PGD attack on in/out data with
larger perturbation sizes, e.g. up to ε = 8/255, which is commonly used for the
CIFAR-10 dataset. Surprisingly, almost all of these defenses perform worse than
random detection under the adversarial setting. Next, we aim to provide a
robust OOD detection method. In an ideal defense, the training should expose
the model to almost all possible adversarial perturbations, which can be
achieved through adversarial training. That is, such training perturbations
should be based on both in- and out-of-distribution samples. Therefore, unlike
OOD detection in the standard setting, access to OOD samples, in addition to
in-distribution samples, appears necessary in the adversarial training setup.
These insights lead us
to adopt generative OOD detection methods, such as OpenGAN, as a baseline. We
subsequently propose the Adversarially Trained Discriminator (ATD), which
utilizes a pre-trained robust model to extract robust features, and a generator
model to create OOD samples. Using ATD with CIFAR-10 and CIFAR-100 as the
in-distribution data, we could significantly outperform all previous methods in
the robust AUROC while maintaining high standard AUROC and classification
accuracy. The code repository is available at https://github.com/rohban-lab/ATD.

Accepted to NeurIPS 2022.
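
The end-to-end PGD attack mentioned in the abstract perturbs both in-distribution and OOD inputs so that the detector scores them on the wrong side of its threshold. Below is a minimal PyTorch sketch of such an attack, assuming a differentiable detection score `score_fn` where a higher score means "more in-distribution"; the function name, step size, and step count are illustrative assumptions, not the paper's exact configuration.

```python
import torch

def pgd_on_ood_score(score_fn, x, target_in, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD against an OOD detection score.

    target_in=True  -> x is OOD, push its score up (make it look in-distribution)
    target_in=False -> x is in-distribution, push its score down (make it look OOD)
    """
    x_adv = x.clone().detach()
    x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)  # random start
    sign = 1.0 if target_in else -1.0
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = sign * score_fn(x_adv).sum()              # ascend (or descend) the score
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()          # signed gradient step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv
```

Under this setup, the "robust AUROC" the abstract refers to would be computed on detector scores of in/out samples perturbed by such an attack, rather than on clean samples.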
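
The following is a rough sketch of how the ATD idea described above could be wired together, under several assumptions: `robust_backbone` stands for the adversarially pre-trained feature extractor, `G` for the generator producing OOD-like samples (placed in feature space here for simplicity), and `D` for the discriminator whose output serves as the detection score; it reuses `pgd_on_ood_score` from the previous sketch. These names and the exact training arrangement are placeholders, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def atd_step(robust_backbone, G, D, opt_D, opt_G, x_in, x_out, z_dim=128):
    device = x_in.device
    bce = F.binary_cross_entropy_with_logits

    # End-to-end attack on the current detector score D(f(x)): in-distribution
    # images are pushed to look OOD, and OOD images to look in-distribution.
    score = lambda x: D(robust_backbone(x)).squeeze(-1)
    x_in_adv = pgd_on_ood_score(score, x_in, target_in=False)
    x_out_adv = pgd_on_ood_score(score, x_out, target_in=True)

    with torch.no_grad():
        f_in = robust_backbone(x_in_adv)    # robust features of attacked in-dist data
        f_out = robust_backbone(x_out_adv)  # robust features of attacked OOD data
    z = torch.randn(x_in.size(0), z_dim, device=device)
    f_fake = G(z)                           # generator-made OOD-like features

    # Discriminator update: in-distribution -> 1, real or generated OOD -> 0.
    ones = torch.ones(x_in.size(0), device=device)
    d_loss = (bce(D(f_in).squeeze(-1), ones)
              + bce(D(f_out).squeeze(-1), torch.zeros(x_out.size(0), device=device))
              + bce(D(f_fake.detach()).squeeze(-1), torch.zeros(x_in.size(0), device=device)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator update: fool the discriminator, as in OpenGAN-style training.
    g_loss = bce(D(f_fake).squeeze(-1), ones)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```

At test time, `D(robust_backbone(x))` would act as the OOD score for both clean and attacked inputs, which is how standard and robust AUROC could be measured in this sketch.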