53 research outputs found

Generalization in Deep Learning

Author: Bengio Yoshua
Kaelbling Leslie Pack
Kawaguchi Kenji
Publication date: 27/07/2020

This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature. We also discuss approaches to provide non-vacuous generalization guarantees for deep learning. Based on theoretical observations, we propose new open problems and discuss the limitations of our results. Comment: To appear in Mathematics of Deep Learning, Cambridge University Press. All previous results remain unchanged.

arXiv.org e-Print Archive

ScholarBank@NUS

SciTech News Volume 71, No. 2 (2017)

Publication venue: Jefferson Digital Commons
Publication date: 01/06/2017

Columns and Reports: From the Editor. Division News: Science-Technology Division; Chemistry Division; Engineering Division; Aerospace Section of the Engineering Division; Architecture, Building Engineering, Construction and Design Section of the Engineering Division. Reviews: Sci-Tech Book News Reviews. Advertisements: IEEE.

Jefferson Digital Commons

Recommended from our members

Algebraic Statistics

Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2017

Algebraic Statistics is concerned with the interplay of techniques from commutative algebra, combinatorics, (real) algebraic geometry, and related fields with problems arising in statistics and data science. This workshop was the first at Oberwolfach dedicated to this emerging subject area. The participants highlighted recent achievements in this field, explored exciting new applications, and mapped out future directions for research.

Repositorium für Naturwissenschaften und Technik

Generalization in Deep Learning

Author: Bengio Yoshua
Kaelbling Leslie Pack
Kawaguchi Kenji
Publication date: 09/05/2018

With a direct analysis of neural networks, this paper presents a mathematically tight generalization theory to partially address an open problem regarding the generalization of deep learning. Unlike previous bound-based theory, our main theory is quantitatively as tight as possible for every dataset individually, while still yielding competitive qualitative insights. Our results give insight into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, answering an open question in the literature. We also discuss limitations of our results and propose additional open problems.

DSpace@MIT

The Vessel Schedule Recovery Problem: Disruption management in liner shipping

Author: Brouer Berit Dangaard
Dirksen Jakob
Pisinger David
Plum Christian Edinger Munk
Vaaben Bo
Publication date: 01/01/2012

Online Research Database In Technology

IST Austria Thesis

Author: Bui Thi Mai Phuong
Publication venue: IST Austria
Publication date: 01/01/2021

Deep learning is best known for its empirical success across a wide range of applications spanning computer vision, natural language processing and speech. Of equal significance, though perhaps less known, are its ramifications for learning theory: deep networks have been observed to perform surprisingly well in the high-capacity regime, also known as the overfitting or underspecified regime. Classically, this regime on the far right of the bias-variance curve is associated with poor generalisation; however, recent experiments with deep networks challenge this view. This thesis is devoted to investigating various aspects of underspecification in deep learning. First, we argue that deep learning models are underspecified on two levels: a) any given training dataset can be fit by many different functions, and b) any given function can be expressed by many different parameter configurations. We refer to the second kind of underspecification as parameterisation redundancy and we precisely characterise its extent. Second, we characterise the implicit criteria (the inductive bias) that guide learning in the underspecified regime. Specifically, we consider a nonlinear but tractable classification setting, and show that given the choice, neural networks learn classifiers with a large margin. Third, we consider learning scenarios where the inductive bias is not by itself sufficient to deal with underspecification. We then study different ways of 'tightening the specification': i) In the setting of representation learning with variational autoencoders, we propose a hand-crafted regulariser based on mutual information. ii) In the setting of binary classification, we consider soft-label (real-valued) supervision. We derive a generalisation bound for linear networks supervised in this way and verify that soft labels facilitate fast learning. Finally, we explore an application of soft-label supervision to the training of multi-exit models.

IST Austria: PubRep (Institute of Science and Technology)

Some phenomenological investigations in deep learning

Author: Baratin Aristide
Publication date: 01/12/2021

The striking empirical success of deep neural networks in machine learning raises a number of theoretical puzzles. For example, why can they generalize to unseen data despite their capacity to fully memorize the training examples? Such puzzles have been the subject of intense research efforts in the past few years, which combine rigorous analysis of simplified systems with empirical studies of phenomenological properties shown to correlate with generalization. The first two articles presented in this thesis contribute to this line of work. They highlight and discuss mechanisms that allow large models to prioritize learning 'simple' functions during training and to adapt their capacity to the complexity of the problem. The third article of this thesis addresses the long-standing problem of estimating mutual information in high dimension, by leveraging the scalability of neural networks. It introduces and studies a new class of estimators and presents several applications in unsupervised learning, especially on enhancing generative models.

Dépôt Institutionnel Numérique