Flat Minima in Linear Estimation and an Extended Gauss Markov Theorem
We consider the problem of linear estimation and establish an extension of
the Gauss-Markov theorem in which the bias operator is allowed to be non-zero
but bounded with respect to a matrix norm of Schatten type. We derive simple
and explicit formulas for the optimal estimator in the cases of the nuclear and
spectral norms (with the Frobenius case recovering ridge regression).
Additionally, we analytically derive the generalization error in multiple
random matrix ensembles and compare it with that of ridge regression. Finally,
we conduct an extensive simulation study in which we show that the
cross-validated nuclear- and spectral-norm regressors can outperform ridge
regression in several circumstances.
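As a concrete point of reference for the Frobenius case mentioned above, here is a minimal sketch of the closed-form ridge estimator that the abstract says is recovered in that case; the synthetic data and the regularization strength are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Ridge regression in closed form: the estimator recovered in the
# Frobenius (Schatten-2) case of the extended Gauss-Markov setting.
# The data and the penalty lam below are illustrative assumptions.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lam = 1.0  # assumed penalty; in practice chosen by cross-validation
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("training MSE:", np.mean((X @ beta_ridge - y) ** 2))
```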
Feature Learning and Signal Propagation in Deep Neural Networks
Recent work by Baratin et al. (2021) sheds light on an intriguing pattern
that occurs during the training of deep neural networks: some layers align much
more with the data than others (where the alignment is defined as the Euclidean
product of the tangent features matrix and the data labels matrix). The
alignment, viewed as a function of layer index, generally exhibits an
ascent-descent pattern whose maximum is reached at some hidden layer. In this
work, we provide the first explanation for this phenomenon. We introduce the
Equilibrium Hypothesis, which connects this alignment pattern to signal
propagation in deep neural networks. Our experiments demonstrate an excellent
match with the theoretical predictions.
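As a rough illustration of the quantity being measured, the hedged sketch below computes, for each layer of a small network, the normalized Frobenius inner product between that layer's tangent-feature Gram matrix and the label Gram matrix y y^T; the architecture, data, and exact normalization are illustrative assumptions rather than the precise definition used by Baratin et al. (2021).

```python
import torch
import torch.nn as nn

# Hedged sketch of layer-wise tangent-feature alignment: for each layer,
# build the tangent feature matrix (Jacobian of the scalar output w.r.t.
# that layer's parameters) and compare its Gram matrix with y y^T.
torch.manual_seed(0)
n, d = 64, 20
X = torch.randn(n, d)
y = torch.randn(n, 1)

layers = [nn.Linear(d, 32), nn.Linear(32, 32), nn.Linear(32, 1)]
model = nn.Sequential(layers[0], nn.Tanh(), layers[1], nn.Tanh(), layers[2])

def layer_alignment(layer):
    # Tangent features: one row per example, one column per parameter of `layer`.
    rows = []
    for i in range(n):
        model.zero_grad()
        out = model(X[i : i + 1]).squeeze()
        out.backward()
        g = torch.cat([p.grad.flatten() for p in layer.parameters()])
        rows.append(g.clone())
    phi = torch.stack(rows)   # n x p_l tangent feature matrix
    K = phi @ phi.T           # layer-l tangent kernel
    Y = y @ y.T               # label Gram matrix
    return ((K * Y).sum() / (K.norm() * Y.norm())).item()

for idx, layer in enumerate(layers):
    print(f"layer {idx}: alignment = {layer_alignment(layer):.3f}")
```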
Almost Sure Convergence of Dropout Algorithms for Neural Networks
We investigate the convergence and convergence rate of stochastic training
algorithms for Neural Networks (NNs) that, over the years, have spawned from
Dropout (Hinton et al., 2012). Modeling the fact that neurons in the brain may
fail to fire, dropout algorithms consist, in practice, of multiplying the
weight matrices of an NN component-wise by independently drawn random matrices
with {0, 1}-valued entries during each iteration of the
feedforward-backpropagation algorithm.
This paper presents a probability-theoretic proof that, for any NN topology
and differentiable, polynomially bounded activation functions, if we project the
NN's weights into a compact set and use a dropout algorithm, then the weights
converge to a unique stationary set of a projected system of Ordinary
Differential Equations (ODEs). We also establish an upper bound on the rate of
convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms
for arborescences (a class of trees) of arbitrary depth and with linear
activation functions.
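To make the iteration described above concrete, here is a minimal sketch of a dropout training step on a tiny two-layer network with linear activations: each weight matrix is multiplied component-wise by an independently drawn {0, 1}-valued mask, a gradient step is taken, and the weights are projected back into a compact set (here a box, via clipping). The network size, keep-probability, step size, and box radius are illustrative assumptions, not values from the paper.

```python
import numpy as np

# One dropout iteration per the description above: mask the weights,
# run feedforward-backpropagation, take a gradient step, then project
# the weights onto a compact set (a box, via clipping).
rng = np.random.default_rng(0)
d, h = 10, 16
W1, W2 = rng.standard_normal((h, d)) * 0.1, rng.standard_normal((1, h)) * 0.1
p, lr, radius = 0.8, 1e-2, 5.0   # keep-probability, step size, projection box

def dropout_step(x, y, W1, W2):
    # Draw independent {0, 1} masks and apply them component-wise to the weights.
    M1 = (rng.random(W1.shape) < p).astype(float)
    M2 = (rng.random(W2.shape) < p).astype(float)
    W1d, W2d = W1 * M1, W2 * M2

    # Forward pass; linear activations keep the sketch short.
    hdn = W1d @ x
    out = W2d @ hdn
    err = out - y

    # Backward pass: gradients flow only through the kept (unmasked) weights.
    gW2 = np.outer(err, hdn) * M2
    gW1 = np.outer(W2d.T @ err, x) * M1

    # Gradient step followed by projection onto the compact set.
    W1 = np.clip(W1 - lr * gW1, -radius, radius)
    W2 = np.clip(W2 - lr * gW2, -radius, radius)
    return W1, W2

x, y = rng.standard_normal(d), np.array([1.0])
for _ in range(100):
    W1, W2 = dropout_step(x, y, W1, W2)
```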