On Fragile Features and Batch Normalization in Adversarial Training

Abstract

Modern deep learning architectures utilize batch normalization (BN) to stabilize training and improve accuracy. It has been shown that the BN layers alone are surprisingly expressive. In the context of robustness against adversarial examples, however, BN is argued to increase vulnerability; that is, BN helps to learn fragile features. Nevertheless, BN is still used in adversarial training, which is the de-facto standard for learning robust features. To shed light on the role of BN in adversarial training, we investigate to what extent the expressiveness of BN can be used to robustify fragile features, in comparison to random features. On CIFAR10, we find that adversarially fine-tuning just the BN layers can result in non-trivial adversarial robustness. Adversarially training only the BN layers from scratch, in contrast, does not convey meaningful adversarial robustness. Our results indicate that fragile features can be used to learn models with moderate adversarial robustness, while random features cannot.
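The setup described in the abstract, adversarially fine-tuning only the BN layers of an otherwise frozen network, can be sketched in PyTorch. This is an illustrative reconstruction, not the authors' code: the toy architecture, the PGD hyperparameters (eps=8/255, 3 steps), and the random stand-in data are all assumptions chosen only to make the freezing-and-training pattern concrete.

```python
import torch
import torch.nn as nn

# Hypothetical small conv net standing in for the pretrained architecture.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)

# Freeze all parameters except the BN affine parameters (gamma, beta).
for module in model.modules():
    for p in module.parameters(recurse=False):
        p.requires_grad = isinstance(module, nn.BatchNorm2d)

bn_params = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.SGD(bn_params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def pgd_attack(x, y, eps=8 / 255, alpha=2 / 255, steps=3):
    """Craft L-infinity PGD adversarial examples against the current model."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        # Ascend the sign of the gradient, projected onto the eps-ball.
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

# One adversarial-training step on random CIFAR10-shaped stand-in data.
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
x_adv = pgd_attack(x, y)
opt.zero_grad()
loss_fn(model(x_adv), y).backward()
opt.step()  # only the BN parameters move
```

The "from scratch" variant the abstract contrasts with would use the same loop but start from randomly initialized (rather than pretrained) frozen weights.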
