Although fast adversarial training provides an efficient approach for
building robust networks, it may suffer from a serious problem known as
catastrophic overfitting (CO), where multi-step robust accuracy suddenly
collapses to zero. In this paper, we decouple, for the first time, single-step
adversarial examples into data-information and self-information, revealing an
interesting phenomenon that we call "self-fitting": the network learns the
self-information embedded in single-step perturbations, which naturally leads
to CO. When self-fitting occurs, the network exhibits a pronounced "channel
differentiation" phenomenon, in which the convolution channels responsible for
recognizing self-information become dominant while those for data-information
are suppressed. As a result, the network can only recognize images that
contain sufficient self-information and loses its ability to generalize to
other types of data. Based on self-fitting, we provide new insights into
existing methods for mitigating CO and extend the notion of CO to
multi-step adversarial training. Our findings reveal a self-learning mechanism
in adversarial training and open up new perspectives for suppressing different
kinds of information to mitigate CO.

Comment: Camera-ready version, accepted at the CVPR Workshop of Adversarial
Machine Learning on Computer Vision: Art of Robustness, 2023.
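As a hedged illustration of the setting (not the authors' code), the sketch
below shows how a single-step, FGSM-style adversarial example is generated in
fast adversarial training; the model, inputs, and epsilon are placeholder
assumptions, and the decoupling of the perturbation into data-information and
self-information is defined in the paper itself, not implemented here.

```python
# Minimal PyTorch sketch of single-step (FGSM-style) adversarial example
# generation, the kind of perturbation whose "self-information" the paper
# studies. All names (model, x, y, eps) are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def single_step_adv_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                            eps: float = 8 / 255):
    """Return (x_adv, delta) with ||delta||_inf <= eps, assuming inputs in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # standard classification loss
    loss.backward()
    delta = eps * x.grad.sign()           # single gradient-sign step
    x_adv = (x + delta).clamp(0.0, 1.0).detach()
    return x_adv, delta.detach()
```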