Investigating Catastrophic Overfitting in Fast Adversarial Training: A Self-fitting Perspective

Chen, Sizhe; He, Zhengbao; Huang, Xiaolin; Li, Tao

Authors: Sizhe Chen, Zhengbao He, Xiaolin Huang, Tao Li
Publication date: 24 March 2023

Abstract

Although fast adversarial training provides an efficient approach for building robust networks, it may suffer from a serious problem known as catastrophic overfitting (CO), where multi-step robust accuracy suddenly collapses to zero. In this paper, we decouple, for the first time, single-step adversarial examples into data-information and self-information, which reveals an interesting phenomenon called "self-fitting". Self-fitting, i.e., the network learning the self-information embedded in single-step perturbations, naturally leads to CO. When self-fitting occurs, the network exhibits an obvious "channel differentiation" phenomenon: some convolution channels responsible for recognizing self-information become dominant, while others responsible for data-information are suppressed. As a result, the network can only recognize images with sufficient self-information and loses its ability to generalize to other types of data. Based on self-fitting, we provide new insights into existing methods for mitigating CO and extend CO to multi-step adversarial training. Our findings reveal a self-learning mechanism in adversarial training and open up new perspectives for suppressing different kinds of information to mitigate CO.

Comment: The camera-ready version (accepted at the CVPR Workshop of Adversarial Machine Learning on Computer Vision: Art of Robustness, 2023).
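
For readers unfamiliar with the single-step adversarial examples mentioned in the abstract, the following is a minimal sketch of one common single-step attack, the fast gradient sign method (FGSM), applied to a toy logistic-regression model. The model, its weights, the input, and the budget `eps` are all assumptions made up for illustration; this is not the paper's decoupling procedure and it does not reproduce any of its experiments. It only shows that the perturbation is computed from the model's own loss gradient in a single step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    # d(loss)/dx = (p - y) * w for logistic regression
    grad_x = [(p - y) * wi for wi in w]
    return loss, grad_x

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Single-step attack: shift each coordinate by eps along the sign of the loss gradient."""
    _, grad_x = loss_and_input_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad_x)]

# Hypothetical toy model and input (illustrative values only).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.3, 0.2, -0.4], 1
eps = 0.1

x_adv = fgsm(w, b, x, y, eps)
clean_loss, _ = loss_and_input_grad(w, b, x, y)
adv_loss, _ = loss_and_input_grad(w, b, x_adv, y)
print("clean loss:", clean_loss)
print("adversarial loss:", adv_loss)
```

In fast adversarial training, perturbed inputs of this single-step kind would be fed back into the training loss in place of (or alongside) the clean inputs; the sketch stops at generating one such example.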

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2302.11963

Last updated on 18/03/2023