Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias
Recent work in deep learning has shown strong empirical and theoretical
evidence of an implicit low-rank bias: weight matrices in deep networks tend to
be approximately low-rank and removing relatively small singular values during
training or from available trained models may significantly reduce model size
while maintaining or even improving model performance. However, most
theoretical investigations of low-rank bias in neural networks deal with
oversimplified deep linear networks. In this work, we consider general
networks with nonlinear activations trained with weight decay, and we show
the presence of an intriguing neural rank collapse phenomenon connecting the
low-rank bias of trained networks with their neural collapse properties: as
the weight decay parameter grows, the rank of each layer in the network
decreases proportionally to the within-class variability of the hidden-space
embeddings of the previous layers. Our theoretical findings are supported by a
range of experimental evaluations illustrating the phenomenon.
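As a concrete illustration of the compression idea this abstract refers to, the following minimal sketch (not the authors' code; the synthetic matrix and the singular-value threshold are assumptions made here purely for illustration) factors a weight matrix with an SVD, drops the small singular values, and compares the reconstruction error and parameter count of the low-rank form with the original.

```python
# Minimal sketch of SVD-based low-rank truncation of a "trained" weight matrix.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained layer: an approximately low-rank matrix plus noise.
m, n, true_rank = 256, 128, 8
W = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))
W += 0.01 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Keep only singular values above a relative threshold (an illustrative choice).
keep = s > 0.05 * s[0]
k = int(keep.sum())
W_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

rel_err = np.linalg.norm(W - W_k) / np.linalg.norm(W)
params_full = m * n
params_lowrank = k * (m + n)
print(f"kept rank {k}, relative error {rel_err:.3e}, "
      f"params {params_lowrank} vs {params_full}")
```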
Neural Collapse: A Review on Modelling Principles and Generalization
Deep classifier neural networks enter the terminal phase of training (TPT)
when training error reaches zero and tend to exhibit intriguing Neural Collapse
(NC) properties. Neural collapse essentially represents a state in which the
within-class variability of the final hidden layer outputs is vanishingly small
and their class means form a simplex equiangular tight frame. This simplifies
the last layer behaviour to that of a nearest-class center decision rule.
Despite the simplicity of this state, the dynamics and implications of reaching
it are yet to be fully understood. In this work, we review the principles which
aid in modelling neural collapse, followed by the implications of this state on
generalization and transfer learning capabilities of neural networks. Finally,
we conclude by discussing potential avenues and directions for future research.
Comment: Transactions on Machine Learning Research (TMLR), 202
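To make the neural collapse properties described above concrete, here is a minimal sketch (illustrative only, not taken from the reviewed works) that computes two standard diagnostics on synthetic last-layer features: the within-class variability relative to the between-class variability, and the pairwise cosine similarity of the centered class means, which equals -1/(K-1) for a simplex equiangular tight frame over K classes.

```python
# Minimal sketch of two neural-collapse diagnostics on synthetic features.
import numpy as np

rng = np.random.default_rng(0)
K, n_per_class, d = 4, 100, 16

# Class means built as an exact simplex ETF in R^K, embedded into R^d via an
# orthonormal map, so the second diagnostic below has a known target value.
etf = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)
embed = np.linalg.qr(rng.standard_normal((d, K)))[0]   # d x K, orthonormal columns
means = 5.0 * etf @ embed.T                            # K x d class means

# Synthetic "collapsed" features: tight clusters around the class means.
feats = np.concatenate(
    [means[k] + 0.01 * rng.standard_normal((n_per_class, d)) for k in range(K)]
)
labels = np.repeat(np.arange(K), n_per_class)

global_mean = feats.mean(axis=0)
class_means = np.stack([feats[labels == k].mean(axis=0) for k in range(K)])

# Diagnostic 1: within-class scatter measured against the between-class scatter.
Sigma_W = np.zeros((d, d))
for k in range(K):
    diff = feats[labels == k] - class_means[k]
    Sigma_W += diff.T @ diff / feats.shape[0]
centered = class_means - global_mean
Sigma_B = centered.T @ centered / K
nc1 = np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K

# Diagnostic 2: pairwise cosines of the centered class means vs. -1/(K-1).
normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
cosines = normed @ normed.T
off_diag = cosines[~np.eye(K, dtype=bool)]
print(f"within/between variability = {nc1:.3e}, "
      f"mean off-diagonal cosine = {off_diag.mean():.3f}, "
      f"ETF value = {-1 / (K - 1):.3f}")
```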
Deep Randomized Neural Networks
Randomized Neural Networks explore the behavior of neural systems where the
majority of connections are fixed, either in a stochastic or a deterministic
fashion. Typical examples of such systems consist of multi-layered neural
network architectures where the connections to the hidden layer(s) are left
untrained after initialization. Limiting the training algorithms to operate on
a reduced set of weights inherently characterizes the class of Randomized
Neural Networks with a number of intriguing features. Among them, the extreme
efficiency of the resulting learning processes is undoubtedly a striking
advantage over fully trained architectures. Moreover, despite these
simplifications, randomized neural systems possess remarkable properties both
in practice, achieving state-of-the-art results in multiple domains, and in
theory, allowing one to analyze intrinsic properties of neural architectures
(e.g., before training of the hidden layers' connections). In
recent years, the study of Randomized Neural Networks has been extended towards
deep architectures, opening new research directions to the design of effective
yet extremely efficient deep learning models in vectorial as well as in more
complex data domains. This chapter surveys all the major aspects regarding the
design and analysis of Randomized Neural Networks, and some of the key results
with respect to their approximation capabilities. In particular, we first
introduce the fundamentals of randomized neural models in the context of
feed-forward networks (i.e., Random Vector Functional Link and equivalent
models) and convolutional filters, before moving to the case of recurrent
systems (i.e., Reservoir Computing networks). For both, we focus specifically
on recent results in the domain of deep randomized systems and, for recurrent
models, on their application to structured domains.
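To make the fixed-random-connections idea concrete, the following minimal sketch (an illustration under assumptions made here, not code from the chapter) builds a small RVFL-style model: a random, untrained hidden layer plus a direct input link, with only the linear readout fitted in closed form by ridge regression.

```python
# Minimal sketch of a randomized feed-forward (RVFL-style) regressor.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data.
n, d_in, d_hidden = 200, 5, 300
X = rng.standard_normal((n, d_in))
y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(n)

# Fixed random hidden layer: drawn once at initialization and never trained.
W_in = rng.standard_normal((d_in, d_hidden))
b = rng.standard_normal(d_hidden)
H = np.tanh(X @ W_in + b)

# RVFL-style direct link: concatenate the raw inputs with the hidden features.
Phi = np.concatenate([H, X], axis=1)

# Closed-form ridge-regression readout: beta = (Phi^T Phi + lam I)^{-1} Phi^T y.
lam = 1e-2
beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

pred = Phi @ beta
print(f"train MSE: {np.mean((pred - y) ** 2):.4f}")
```

Only `beta` is learned; the random hidden weights stay fixed, which is what makes training reduce to a single linear solve.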