106 research outputs found
Generalization Error Bounds of Gradient Descent for Learning Over-parameterized Deep ReLU Networks
Empirical studies show that gradient-based methods can learn deep neural
networks (DNNs) with very good generalization performance in the
over-parameterization regime, where DNNs can easily fit a random labeling of
the training data. Very recently, a line of work has shown theoretically that, with
over-parameterization and proper random initialization, gradient-based methods
can find the global minima of the training loss for DNNs. However, existing
generalization error bounds are unable to explain the good generalization
performance of over-parameterized DNNs. The major limitation of most existing
generalization bounds is that they are based on uniform convergence and are
independent of the training algorithm. In this work, we derive an
algorithm-dependent generalization error bound for deep ReLU networks, and show
that under certain assumptions on the data distribution, gradient descent (GD)
with proper random initialization is able to train a sufficiently
over-parameterized DNN to achieve arbitrarily small generalization error. Our
work sheds light on the good generalization performance of
over-parameterized deep neural networks.
Comment: 27 pages. This version simplifies the proof and improves the presentation relative to Version 3. In AAAI 2020
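As a rough illustration of the setting this abstract analyzes (not the paper's exact construction), the sketch below runs full-batch gradient descent on a wide, deep ReLU network with Gaussian random initialization on synthetic data. The width, depth, learning rate, and data distribution are all placeholder assumptions chosen only to make the example self-contained.

```python
# Minimal, illustrative sketch: full-batch GD on an over-parameterized deep ReLU
# network with He-style Gaussian random initialization. All sizes and the data
# are placeholders, not the paper's setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

n, d, width, depth = 200, 10, 2048, 3            # samples, input dim, hidden width, hidden layers
X = torch.randn(n, d)
y = (X[:, 0] > 0).float().unsqueeze(1)           # toy labels; a real experiment would use structured data

layers = [nn.Linear(d, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, 1)]
net = nn.Sequential(*layers)

# He-style Gaussian initialization, used here as a stand-in for the
# "proper random initialization" the abstract refers to.
for m in net.modules():
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=(2.0 / m.weight.shape[1]) ** 0.5)
        nn.init.zeros_(m.bias)

opt = torch.optim.SGD(net.parameters(), lr=0.1)  # full-batch GD: one step per pass over all data
loss_fn = nn.BCEWithLogitsLoss()
for step in range(500):
    opt.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    opt.step()
print(f"final training loss: {loss.item():.4f}")
```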
Learning Sparse Neural Networks via Sensitivity-Driven Regularization
The ever-increasing number of parameters in deep neural networks poses
challenges for memory-limited applications. Regularize-and-prune methods aim at
meeting these challenges by sparsifying the network weights. In this context, we
quantify the output sensitivity to the parameters (i.e., their relevance to the
network output) and introduce a regularization term that gradually lowers the
absolute value of parameters with low sensitivity. Thus, a very large fraction
of the parameters approach zero and are eventually set to zero by simple
thresholding. Our method surpasses most of the recent techniques both in terms
of sparsity and error rates. In some cases, the method reaches twice the
sparsity obtained by other techniques at equal error rates.
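The sketch below is a minimal, hedged rendering of the idea described in this abstract, assuming a gradient-magnitude proxy for parameter sensitivity; the insensitivity formula, penalty weight, threshold, and toy data are illustrative placeholders rather than the paper's exact formulation.

```python
# Hedged sketch of sensitivity-driven regularization followed by thresholding.
# Sensitivity is approximated by the magnitude of d(output)/d(parameter); weights
# with low sensitivity receive an L1-like pull toward zero, then small weights are
# pruned. `lam` and `threshold` are illustrative placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
params = list(net.parameters())
opt = torch.optim.SGD(params, lr=0.05)
loss_fn = nn.CrossEntropyLoss()
X, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
lam, threshold = 1e-3, 1e-3

for step in range(200):
    out = net(X)
    # Sensitivity proxy: |d(sum of outputs)/d(parameter)|, used only to scale the
    # penalty (it is not itself differentiated through).
    grads = torch.autograd.grad(out.sum(), params, retain_graph=True)
    penalty = torch.zeros(())
    for p, g in zip(params, grads):
        s = g.abs()
        insensitivity = (1.0 - s / (s.max() + 1e-12)).clamp(min=0.0)
        penalty = penalty + (insensitivity * p.abs()).sum()
    loss = loss_fn(out, y) + lam * penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

# Prune by simple thresholding: weights whose magnitude stayed small go to zero.
with torch.no_grad():
    for p in params:
        p[p.abs() < threshold] = 0.0

zeros = sum((p == 0).sum().item() for p in params)
total = sum(p.numel() for p in params)
print(f"fraction of zero parameters: {zeros / total:.2f}")
```

In practice the sensitivity term would be recomputed per mini-batch during normal training, and the penalty weight and threshold tuned to trade off sparsity against accuracy.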
On the Role of Structured Pruning for Neural Network Compression
International audience
Take a Ramble into Solution Spaces for Classification Problems in Neural Networks
International audience
- …