17 research outputs found

Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks

Authors: Backes Michael; Biggio Battista; Grosse Kathrin; Lee Taesung; Molloy Ian; Park Youngja
Publication date: 02/11/2021

Backdoor attacks mislead machine-learning models to output an attacker-specified class when presented with a specific trigger at test time. These attacks require poisoning the training data to compromise the learning algorithm, e.g., by injecting poisoning samples containing the trigger into the training set, along with the desired class label. Despite the increasing number of studies on backdoor attacks and defenses, the underlying factors affecting the success of backdoor attacks, along with their impact on the learning algorithm, are not yet well understood. In this work, we aim to shed light on this issue by unveiling that backdoor attacks induce a smoother decision function around the triggered samples, a phenomenon which we refer to as "backdoor smoothing". To quantify backdoor smoothing, we define a measure that evaluates the uncertainty associated with the predictions of a classifier around the input samples. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks. We also provide preliminary evidence that backdoor triggers are not the only smoothing-inducing patterns, and that other artificial patterns can also be detected by our approach, paving the way towards understanding the limitations of current defenses and designing novel ones. Comment: 9 pages, 7 figures, under submission

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Cagliari

Representation mitosis in wide neural networks

Authors: Doimo Diego; Glielmo Aldo; Goldt Sebastian; Laio Alessandro
Publication date: 01/01/2021

Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that interpolates its training data will typically improve its generalization performance. Explaining the mechanism behind this "benign overfitting" in deep networks remains an outstanding challenge. Here, we study the last hidden layer representations of various state-of-the-art convolutional neural networks and find evidence for an underlying mechanism that we call "representation mitosis": if the last hidden representation is wide enough, its neurons tend to split into groups which carry identical information, and differ from each other only by a statistically independent noise. As in a mitosis process, the number of such groups, or "clones", increases linearly with the width of the layer, but only if the width is above a critical value. We show that a key ingredient to activate mitosis is continuing the training process until the training error is zero.

arXiv.org e-Print Archive

Sissa Digital Library

Sharpness-Aware Minimization Leads to Low-Rank Features

Authors: Andriushchenko Maksym; Bahri Dara; Flammarion Nicolas; Mobahi Hossein
Publication date: 28/10/2023

Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network. While its generalization improvement is well-known and is the primary motivation, we uncover an additional intriguing effect of SAM: a reduction of the feature rank which happens at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, and vision transformers, and for different objectives such as regression, classification, and language-image contrastive training. To better understand this phenomenon, we provide a mechanistic understanding of how low-rank features arise in a simple two-layer network. We observe that a significant number of activations get entirely pruned by SAM, which directly contributes to the rank reduction. We confirm this effect theoretically and check that it can also occur in deep networks, although the overall rank reduction mechanism can be more complex, especially for deep networks with pre-activation skip connections and self-attention layers. We make our code available at https://github.com/tml-epfl/sam-low-rank-features. Comment: The camera-ready version (NeurIPS 2023)

arXiv.org e-Print Archive

Consensus-Based Optimization on Hypersurfaces: Well-Posedness and Mean-Field Limit

Authors: Fornasier Massimo; Huang Hui; Pareschi Lorenzo; Sünnen Philippe
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2020

We introduce a new stochastic differential model for global optimization of nonconvex functions on compact hypersurfaces. The model is inspired by the stochastic Kuramoto-Vicsek system and belongs to the class of Consensus-Based Optimization methods. In fact, particles move on the hypersurface driven by a drift towards an instantaneous consensus point, computed as a convex combination of the particle locations weighted by the cost function according to Laplace's principle. The consensus point represents an approximation to a global minimizer. The dynamics is further perturbed by a random vector field to favor exploration, whose variance is a function of the distance of the particles to the consensus point. In particular, as soon as consensus is reached, the stochastic component vanishes. In this paper, we study the well-posedness of the model and rigorously derive its mean-field approximation in the large-particle limit.
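The consensus mechanism described in this abstract can be illustrated with a minimal sketch. It is not the authors' scheme: it assumes the unit sphere as the hypersurface, a naive explicit step followed by renormalization in place of the paper's stochastic differential model, and Gibbs weights exp(-alpha * cost) as one common reading of Laplace's principle. All function and parameter names (`consensus_point`, `cbo_step`, `alpha`, `lam`, `sigma`, `dt`) are hypothetical.

```python
import math
import random


def consensus_point(particles, cost, alpha):
    """Convex combination of particle positions with weights exp(-alpha * cost)."""
    costs = [cost(p) for p in particles]
    c_min = min(costs)  # shifting by the minimum avoids underflow; normalized weights are unchanged
    weights = [math.exp(-alpha * (c - c_min)) for c in costs]
    total = sum(weights)
    dim = len(particles[0])
    return [sum(w * p[i] for w, p in zip(weights, particles)) / total
            for i in range(dim)]


def normalize(v):
    """Map a nonzero vector back onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cbo_step(particles, cost, alpha, lam, sigma, dt, rng):
    """One simplified update: drift toward the consensus point plus
    noise scaled by the distance to it, then renormalize onto the sphere."""
    target = consensus_point(particles, cost, alpha)
    updated = []
    for p in particles:
        diff = [t - x for t, x in zip(target, p)]
        dist = math.sqrt(sum(d * d for d in diff))
        # The noise amplitude is proportional to dist, so it vanishes at consensus.
        moved = [x + lam * dt * d + sigma * dist * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                 for x, d in zip(p, diff)]
        updated.append(normalize(moved))
    return updated


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy cost on the sphere (an assumption for illustration): minimized at (0, 0, -1).
    toy_cost = lambda p: p[2]
    swarm = [normalize([rng.gauss(0.0, 1.0) for _ in range(3)]) for _ in range(50)]
    for _ in range(200):
        swarm = cbo_step(swarm, toy_cost, alpha=30.0, lam=1.0, sigma=0.3, dt=0.05, rng=rng)
    print(consensus_point(swarm, toy_cost, alpha=30.0))
```

With `alpha = 0` the consensus point reduces to the plain average of the particles; larger `alpha` concentrates the weight on the lowest-cost particles.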

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Ferrara

What can linear interpolation of neural network loss landscapes tell us?

Authors: Frankle Jonathan; Vlaar Tiffany
Publication date: 30/06/2021

Studying neural network loss landscapes provides insights into the nature of the underlying optimization problems. Unfortunately, loss landscapes are notoriously difficult to visualize in a human-comprehensible fashion. One common way to address this problem is to plot linear slices of the landscape, for example from the initial state of the network to the final state after optimization. On the basis of this analysis, prior work has drawn broader conclusions about the difficulty of the optimization problem. In this paper, we put inferences of this kind to the test, systematically evaluating how linear interpolation and final performance vary when altering the data, choice of initialization, and other optimizer and architecture design choices. Further, we use linear interpolation to study the role played by individual layers and substructures of the network. We find that certain layers are more sensitive to the choice of initialization and optimizer hyperparameter settings, and we exploit these observations to design custom optimization schemes. However, our results cast doubt on the broader intuition that the presence or absence of barriers when interpolating necessarily relates to the success of optimization.
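The linear slice mentioned in this abstract can be sketched as follows, assuming parameters are flat lists of floats and using a toy one-dimensional loss in place of a real network. The helper names and the particular barrier definition (maximum loss on the path minus the larger endpoint loss) are assumptions for illustration, not taken from the paper.

```python
def interpolate(theta_init, theta_final, alpha):
    """Point on the segment between two parameter vectors: (1 - alpha) * init + alpha * final."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_init, theta_final)]


def loss_along_line(loss_fn, theta_init, theta_final, num_points=11):
    """Evaluate the loss at evenly spaced alphas in [0, 1]; returns (alpha, loss) pairs."""
    alphas = [i / (num_points - 1) for i in range(num_points)]
    return [(a, loss_fn(interpolate(theta_init, theta_final, a))) for a in alphas]


def barrier_height(curve):
    """Hypothetical barrier measure: highest loss on the path minus the larger endpoint loss."""
    losses = [loss for _, loss in curve]
    return max(losses) - max(losses[0], losses[-1])


def double_well(theta):
    """Toy non-convex loss with minima at theta[0] = -1 and theta[0] = +1."""
    return (theta[0] ** 2 - 1.0) ** 2


if __name__ == "__main__":
    curve = loss_along_line(double_well, [-1.0], [1.0])
    for alpha, loss in curve:
        print(f"alpha={alpha:.1f} loss={loss:.4f}")
    print("barrier:", barrier_height(curve))
```

In this toy case both endpoints are minima and the path crosses the bump at `theta[0] = 0`, so the slice shows a barrier even though each endpoint is a perfectly good solution.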

arXiv.org e-Print Archive

Enlighten

Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Authors: Mannelli Stefano Sarao; Vanden-Eijnden Eric; Zdeborová Lenka
Publication date: 18/08/2020

We study the dynamics of optimization and the generalization properties of one-hidden-layer neural networks with quadratic activation function in the over-parametrized regime where the layer width $m$ is larger than the input dimension $d$. We consider a teacher-student scenario where the teacher has the same structure as the student with a hidden layer of smaller width $m^*\le m$. We describe how the empirical loss landscape is affected by the number $n$ of data samples and the width $m^*$ of the teacher network. In particular we determine how the probability that there be no spurious minima on the empirical loss depends on $n$, $d$, and $m^*$, thereby establishing conditions under which the neural network can in principle recover the teacher. We also show that under the same conditions gradient descent dynamics on the empirical loss converges and leads to small generalization error, i.e. it enables recovery in practice. Finally we characterize the time-convergence rate of gradient descent in the limit of a large number of samples. These results are confirmed by numerical experiments. Comment: 10 pages, 4 figures + appendix
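The teacher-student setup in this abstract can be made concrete with a small sketch. It assumes an unnormalized network output, sum_j (w_j . x)^2, a mean-squared empirical loss, and Gaussian inputs; the paper's exact normalization and data model are not given in the abstract, so these choices and all names are hypothetical.

```python
import random


def quadratic_net(weights, x):
    """One-hidden-layer network with quadratic activation: sum over units of (w_j . x)^2."""
    return sum(sum(w_i * x_i for w_i, x_i in zip(w, x)) ** 2 for w in weights)


def empirical_loss(student, teacher, samples):
    """Mean squared difference between student and teacher outputs on the samples."""
    return sum((quadratic_net(student, x) - quadratic_net(teacher, x)) ** 2
               for x in samples) / len(samples)


def gaussian_matrix(rows, cols, rng):
    """rows x cols matrix of independent standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, m_star, m, n = 3, 2, 5, 20  # input dimension, teacher width, student width, sample count
teacher = gaussian_matrix(m_star, d, rng)
samples = gaussian_matrix(n, d, rng)

# A student of width m >= m_star can copy the teacher and zero out its extra units.
padded_student = teacher + [[0.0] * d for _ in range(m - m_star)]
random_student = gaussian_matrix(m, d, rng)

if __name__ == "__main__":
    print("loss of padded student:", empirical_loss(padded_student, teacher, samples))
    print("loss of random student:", empirical_loss(random_student, teacher, samples))
```

The padded student shows why the condition $m^*\le m$ matters: the wider student can represent the teacher exactly, so a zero-loss solution always exists. Whether gradient descent finds it is the question the abstract addresses.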

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

UCL Discovery