Enhancing Fine-Tuning Based Backdoor Defense with Sharpness-Aware Minimization
Backdoor defense, which aims to detect or mitigate the effect of malicious
triggers introduced by attackers, is becoming increasingly critical for machine
learning security and integrity. Fine-tuning based on benign data is a natural
defense to erase the backdoor effect in a backdoored model. However, recent
studies show that, given limited benign data, vanilla fine-tuning has poor
defense performance. In this work, we provide a deep study of fine-tuning the
backdoored model from the neuron perspective and find that backdoor-related
neurons fail to escape the local minimum during fine-tuning. Inspired
by the observation that backdoor-related neurons often have larger norms, we
propose FT-SAM, a novel backdoor defense paradigm that aims to shrink the norms
of backdoor-related neurons by incorporating sharpness-aware minimization with
fine-tuning. We demonstrate the effectiveness of our method on several
benchmark datasets and network architectures, where it achieves
state-of-the-art defense performance. Overall, our work provides a promising
avenue for improving the robustness of machine learning models against backdoor
attacks.
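
To make the idea concrete, the following is a minimal PyTorch-style sketch of
one sharpness-aware minimization (SAM) step applied during fine-tuning on
benign data. The perturbation radius rho, the toy linear model, and the SGD
settings are illustrative assumptions, not the paper's actual configuration.

import torch
import torch.nn as nn

def sam_finetune_step(model, loss_fn, x, y, optimizer, rho=0.05):
    # 1) Gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    # 2) Ascend to the locally sharpest nearby point: w <- w + rho * g / ||g||.
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append((p, e))
    optimizer.zero_grad()

    # 3) Gradient at the perturbed weights defines the SAM update direction.
    loss_fn(model(x), y).backward()

    # 4) Undo the perturbation and take the optimizer step at the original weights.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

model = nn.Linear(10, 2)  # stand-in for a backdoored network being fine-tuned
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
print(sam_finetune_step(model, nn.CrossEntropyLoss(), x, y, optimizer))

The key design choice is that gradients are evaluated at an adversarially
perturbed copy of the weights but applied at the original weights, which biases
fine-tuning toward flat minima and, per the abstract, helps shrink the norms of
backdoor-related neurons.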
Weight Compander: A Simple Weight Reparameterization for Regularization
Regularization is a set of techniques that are used to improve the
generalization ability of deep neural networks. In this paper, we introduce
weight compander (WC), a novel effective method to improve generalization by
reparameterizing each weight in deep neural networks using a nonlinear
function. It is a general, intuitive, cheap, and easy-to-implement method that
can be combined with various other regularization techniques. Large weights in
deep neural networks are a sign of a more complex network that is overfitted to
the training data. Moreover, regularized networks tend to have a greater range
of weights around zero with fewer weights centered at zero. We introduce a
weight reparameterization function which is applied to each weight and
implicitly reduces overfitting by restricting the magnitude of the weights
while forcing them away from zero at the same time. This leads to more
democratic decision-making in the network. Firstly, individual weights cannot
have too much influence in the prediction process due to the restriction of
their magnitude. Secondly, more weights are used in the prediction process,
since they are forced away from zero during the training. This promotes the
extraction of more features from the input data and increases the level of
weight redundancy, which makes the network less sensitive to statistical
differences between training and test data. We extend our method to learn the
hyperparameters of the introduced weight reparameterization function. This
avoids hyperparameter search and gives the network the opportunity to align the
weight reparameterization with the training progress. We show experimentally
that using weight compander in addition to standard regularization methods
improves the performance of neural networks.
Comment: Accepted by The International Joint Conference on Neural Networks
(IJCNN) 202
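
As an illustration of the reparameterization idea, the sketch below wraps a
linear layer so that every raw weight passes through a nonlinear "compander"
before being used in the forward pass. The abstract does not spell out the
exact function, so the compander below is a hypothetical stand-in chosen to
match the stated goals: one term caps the magnitude of each effective weight,
and a second steepens the mapping near zero so small weights are pushed away
from zero.

import torch
import torch.nn as nn
import torch.nn.functional as F

def compander(v, a=1.0, b=0.1, k=10.0):
    # a*tanh(v/a): saturates, so this term's magnitude stays below a.
    # b*tanh(k*v): steep near zero, nudging small raw weights away from zero.
    # Overall |compander(v)| < a + b, while the slope at v = 0 is 1 + b*k > 1.
    return a * torch.tanh(v / a) + b * torch.tanh(k * v)

class CompandedLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # Raw trainable parameters; the effective weights are compander(raw_weight).
        self.raw_weight = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return F.linear(x, compander(self.raw_weight), self.bias)

layer = CompandedLinear(10, 2)
print(layer(torch.randn(4, 10)).shape)  # torch.Size([4, 2])

Making a, b, and k trainable nn.Parameter instances would be one way to realize
the learned-hyperparameter extension mentioned above, letting the network align
the reparameterization with training progress instead of requiring a
hyperparameter search.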
Understanding Gradient Descent on Edge of Stability in Deep Learning
Deep learning experiments by Cohen et al. [2021] using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR)
and sharpness (i.e., the largest eigenvalue of Hessian) no longer behave as in
traditional optimization. Sharpness stabilizes around 2/LR, and the loss goes up
and down across iterations, yet still with an overall downward trend. The
current paper mathematically analyzes a new mechanism of implicit
regularization in the EoS phase, whereby GD updates due to non-smooth loss
landscape turn out to evolve along some deterministic flow on the manifold of
minimum loss. This is in contrast to many previous results about implicit bias
either relying on infinitesimal updates or noise in gradient. Formally, for any
smooth function $L$ satisfying certain regularity conditions, this effect is
demonstrated for (1) Normalized GD, i.e., GD with a varying LR
$\eta_t = \eta / \lVert \nabla L(x(t)) \rVert$ and loss $L$; (2) GD with constant LR and
loss $\sqrt{L(x) - \min_x L(x)}$. Both provably enter the Edge of Stability, with
the associated flow on the manifold minimizing $\lambda_{\max}(\nabla^2 L)$. The
above theoretical results have been corroborated by an experimental study.
Comment: 63 pages. This paper has been accepted for conference proceedings in
the 39th International Conference on Machine Learning (ICML), 2022
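
For intuition, here is a small self-contained sketch (an illustrative toy, not
the paper's setup) that runs Normalized GD, i.e., GD with step size
eta / ||grad L(x)||, on a loss with a curve of minimizers, and monitors
sharpness, the largest Hessian eigenvalue, along the trajectory.

import torch

def loss_fn(x):
    # Toy loss whose zero-loss set {x0 * x1 = 1} has sharpness x0^2 + x1^2,
    # so sharpness varies along the manifold of minimizers.
    return 0.5 * (x[0] * x[1] - 1.0) ** 2

x = torch.tensor([2.0, 0.3], requires_grad=True)
eta = 0.1
for step in range(201):
    loss = loss_fn(x)
    (grad,) = torch.autograd.grad(loss, x)
    with torch.no_grad():
        x -= eta * grad / (grad.norm() + 1e-12)  # normalized GD update
    if step % 50 == 0:
        hessian = torch.autograd.functional.hessian(loss_fn, x.detach())
        sharpness = torch.linalg.eigvalsh(hessian)[-1].item()
        print(f"step {step:3d}  loss {loss.item():.5f}  sharpness {sharpness:.3f}")

In the paper's analysis the iterates oscillate around the minimum-loss manifold
while an induced flow on that manifold drives sharpness down; the sketch only
tracks the quantities involved and does not reproduce the theory.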