40 research outputs found

    Some phenomenological investigations in deep learning

    The striking empirical success of deep neural networks across many areas of machine learning over the past decade raises a number of theoretical puzzles. For example, what mechanisms allow these networks, which have more than enough capacity to fully memorize the training examples, to generalize correctly to unseen data, even in the absence of explicit regularization? Such puzzles have been the subject of intense research efforts in the past few years, combining rigorous analysis of simplified systems with empirical studies of phenomenological properties shown to correlate with generalization. The first two articles presented in this thesis contribute to this line of work. They highlight and study implicit bias mechanisms that allow large models to prioritize learning `simple' functions during training and to adapt their capacity to the complexity of the problem. The third article addresses the long-standing problem of estimating mutual information in high dimension, by leveraging the expressivity and scalability of deep neural networks. It introduces and studies a new class of estimators and presents several applications in unsupervised learning, notably to improving neural generative models.
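
    As a rough illustration of the kind of neural mutual-information estimation the third article builds on, here is a minimal sketch of a Donsker-Varadhan (MINE-style) lower bound on I(X;Y) trained with a small critic network. The critic architecture, optimizer settings, and toy correlated-Gaussian data are illustrative assumptions, and the DV bound itself is a stand-in: the thesis introduces its own class of estimators, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy correlated Gaussians: a case where the true mutual information is known in closed form.
def sample_pair(n, dim=5, rho=0.8):
    x = torch.randn(n, dim)
    y = rho * x + (1 - rho**2) ** 0.5 * torch.randn(n, dim)
    return x, y

# Small critic network T(x, y); the architecture is an arbitrary illustrative choice.
critic = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

for step in range(2000):
    x, y = sample_pair(512)
    y_shuf = y[torch.randperm(y.size(0))]               # approximate samples from the product of marginals
    t_joint = critic(torch.cat([x, y], dim=1)).squeeze(1)
    t_marg = critic(torch.cat([x, y_shuf], dim=1)).squeeze(1)
    # Donsker-Varadhan lower bound on I(X;Y):  E_p[T] - log E_q[exp(T)]
    log_mean_exp = torch.logsumexp(t_marg, dim=0) - torch.log(torch.tensor(float(t_marg.size(0))))
    mi_lower_bound = t_joint.mean() - log_mean_exp
    opt.zero_grad()
    (-mi_lower_bound).backward()                         # maximize the bound
    opt.step()

print(f"estimated lower bound on I(X;Y): {mi_lower_bound.item():.3f} nats")
```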

    Group field theory with non-commutative metric variables

    We introduce a dual formulation of group field theories, making them a type of non-commutative field theory. In this formulation, the field variables are Lie algebra elements with a clear interpretation in terms of simplicial geometry. For Ooguri-type models, the Feynman amplitudes are simplicial path integrals for BF theories. This formulation suggests ways to impose the simplicity constraints involved in BF formulations of 4d gravity directly at the level of the group field theory action. We illustrate this by giving a new GFT definition of the Barrett-Crane model. Comment: 4 pages; v3 published version.
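
    For orientation, the dual formulation rests on a group Fourier transform that maps functions on the group to functions of Lie algebra variables. The LaTeX sketch below records the schematic form for a 3d (Boulatov-type) field; the notation (plane waves e_g, an associated star product) is a standard convention assumed here for illustration, not a quotation of the paper.

```latex
% Schematic group Fourier transform of a 3d GFT field (Boulatov-type case);
% e_g(x) = e^{i\,\mathrm{Tr}(x g)} are non-commutative plane waves on su(2),
% and products of dual fields are taken with an associated star product.
\[
  \hat{\varphi}(x_1, x_2, x_3)
  \;=\; \int \mathrm{d}g_1\, \mathrm{d}g_2\, \mathrm{d}g_3\;
    e_{g_1}(x_1)\, e_{g_2}(x_2)\, e_{g_3}(x_3)\;
    \varphi(g_1, g_2, g_3),
\]
\[
  x_i \in \mathfrak{su}(2) \simeq \mathbb{R}^3
  \quad \text{interpreted as the edge vectors of a triangle (metric data).}
\]
```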

    Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

    Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called `lazy' regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in feature learning mode, resulting in faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways of quantifying example difficulty, including the c-score, label noise, and the presence of spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.
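
    A minimal sketch of one way to set up such a comparison: the output-scaling trick (large alpha pushes training toward the lazy, near-linearized regime) is used here as a stand-in for the paper's lazy/feature-learning comparison, and the toy data, architecture, and the notion of "difficulty" as input margin are illustrative assumptions rather than the paper's experimental protocol.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: labels depend on the sign of the first coordinate; "easy" examples
# have a large margin along that direction, "hard" examples a small one.
n, d = 800, 20
x = torch.randn(n, d)
y = torch.sign(x[:, 0])
easy = x[:, 0].abs() > 1.0
hard = ~easy

def make_model():
    return nn.Sequential(nn.Linear(d, 256), nn.Tanh(), nn.Linear(256, 1))

def train(alpha, epochs=500, lr=0.2):
    model = make_model()
    frozen = copy.deepcopy(model)                 # snapshot of the initialization
    for p in frozen.parameters():
        p.requires_grad_(False)
    # Learning rate scaled by 1/alpha^2 so the function-space step size stays comparable.
    opt = torch.optim.SGD(model.parameters(), lr=lr / alpha**2)
    for _ in range(epochs):
        out = alpha * (model(x) - frozen(x)).squeeze(1)   # alpha-scaled, centred predictor
        loss = ((out - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    out = alpha * (model(x) - frozen(x)).squeeze(1)
    return ((out - y)[easy] ** 2).mean().item(), ((out - y)[hard] ** 2).mean().item()

for alpha in (1.0, 100.0):    # alpha = 1: feature learning; large alpha: lazy / near-linearized
    e, h = train(alpha)
    print(f"alpha={alpha:6.1f}  easy-group MSE={e:.3f}  hard-group MSE={h:.3f}")
```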

    Melonic phase transition in group field theory

    Group field theories have recently been shown to admit a 1/N expansion dominated by so-called `melonic graphs', dual to triangulated spheres. In this note, we deepen the analysis of this melonic sector. We obtain a combinatorial formula for the melonic amplitudes in terms of a graph polynomial related to a higher-dimensional generalization of the Kirchhoff tree-matrix theorem. Simple bounds on these amplitudes show the existence of a phase transition driven by melonic interaction processes. We restrict our study to the Boulatov-Ooguri models, which describe topological BF theories and are the basis for the construction of four-dimensional models of quantum gravity. Comment: 8 pages, 4 figures; to appear in Letters in Mathematical Physics.
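
    As a rough guide to why melonic dominance can produce a phase transition, the LaTeX sketch below records the generic argument: if the melonic series has geometrically growing coefficients, it has a finite radius of convergence, which defines a critical coupling. The geometric-growth assumption A_p ~ K a^p is an illustrative schematic, not the paper's combinatorial formula or its specific bounds.

```latex
% Schematic melonic series: p counts interaction vertices and A_p denotes the
% total amplitude of melonic graphs at order p.  Assuming geometric growth,
%   A_p ~ K a^p   (illustrative assumption, not the paper's bound),
% the melonic free energy
\[
  F_{\mathrm{melo}}(\lambda,\bar\lambda) \;=\; \sum_{p\ge 1} A_p\,(\lambda\bar\lambda)^{p}
\]
% has a finite radius of convergence and becomes singular at the critical coupling
\[
  (\lambda\bar\lambda)_c \;=\; \frac{1}{a},
\]
% which is the sense in which a phase transition is driven by melonic interactions.
```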

    2-Group Representations for Spin Foams

    Just as 3d state sum models, including 3d quantum gravity, can be built using categories of group representations, "2-categories of 2-group representations" may provide interesting state sum models for 4d quantum topology, if not quantum gravity. Here we focus on the "Euclidean 2-group", built from the rotation group SO(4) and its action on the group of translations of 4d Euclidean space. We explain its infinite-dimensional unitary representations, and construct a model based on the resulting representation 2-category. This model, with clear geometric content and explicit "metric data" on triangulation edges, shows up naturally in an attempt to write the amplitudes of ordinary quantum field theory in a background-independent way. Comment: 8 pages; to appear in proceedings of the XXV Max Born Symposium: "The Planck Scale", Wroclaw, Poland.
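
    For readers unfamiliar with 2-groups, the Euclidean 2-group used here can be packaged as a crossed module; the LaTeX sketch below records that structure in standard notation. The presentation, and the remark that the representations are labelled by SO(4) orbits (i.e. radii, which is how metric data ends up on edges), is a commonly used description assumed for orientation, not a quotation of the paper.

```latex
% The Euclidean 2-group as a crossed module (G, H, t, \alpha):
\[
  G = \mathrm{SO}(4), \qquad
  H = \mathbb{R}^4, \qquad
  t : H \to G \ \text{trivial}, \qquad
  \alpha : G \to \mathrm{Aut}(H) \ \text{the defining action of rotations on translations}.
\]
% Its unitary representations are infinite-dimensional and labelled by
% SO(4)-orbits, i.e. spheres of radius r >= 0 -- the "metric data" attached
% to triangulation edges in the state sum:
\[
  \mathcal{O}_r = \{\, x \in \mathbb{R}^4 : |x| = r \,\}, \qquad r \ge 0 .
\]
```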

    CrossSplit: Mitigating Label Noise Memorization through Data Splitting

    We address the problem of improving the robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labelled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios. Comment: Accepted to ICML 2023.
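
    A minimal sketch of ingredient (i), cross-split label correction, under simplifying assumptions: the blending rule (a fixed convex combination controlled by a hypothetical weight `beta`), the tiny MLPs, and the random toy data stand in for the paper's actual correction schedule, architectures, and semi-supervised component, which are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, d = 10, 32

# Two disjoint halves of a (noisily labelled) dataset -- random toy data here.
x1, y1 = torch.randn(500, d), torch.randint(num_classes, (500,))
x2, y2 = torch.randn(500, d), torch.randint(num_classes, (500,))

net1 = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, num_classes))
net2 = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, num_classes))
opt = torch.optim.Adam(list(net1.parameters()) + list(net2.parameters()), lr=1e-3)

beta = 0.5  # hypothetical mixing weight between the given label and the peer's prediction

def corrected_targets(peer, x, y):
    # The peer network never saw (x, y), so its predictions are less likely to
    # reflect memorized noisy labels; blend them with the one-hot labels.
    with torch.no_grad():
        peer_probs = F.softmax(peer(x), dim=1)
    one_hot = F.one_hot(y, num_classes).float()
    return beta * one_hot + (1 - beta) * peer_probs

for step in range(1000):
    t1 = corrected_targets(net2, x1, y1)   # net2 corrects labels for net1's split
    t2 = corrected_targets(net1, x2, y2)   # net1 corrects labels for net2's split
    loss = F.cross_entropy(net1(x1), t1) + F.cross_entropy(net2(x2), t2)
    opt.zero_grad()
    loss.backward()
    opt.step()
```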

    A Modern Take on the Bias-Variance Tradeoff in Neural Networks

    The bias-variance tradeoff tells us that as model complexity increases, bias falls and variance increases, leading to a U-shaped test error curve. However, recent empirical results with over-parameterized neural networks are marked by a striking absence of the classic U-shaped test error curve: test error keeps decreasing in wider networks. This suggests that there might not be a bias-variance tradeoff in neural networks with respect to network width, contrary to what was originally claimed by, e.g., Geman et al. (1992). Motivated by the shaky evidence used to support this claim in neural networks, we measure bias and variance in the modern setting. We find that both bias and variance can decrease as the number of parameters grows. To better understand this, we introduce a new decomposition of the variance to disentangle the effects of optimization and data sampling. We also provide theoretical analysis in a simplified setting that is consistent with our empirical findings.
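
    A minimal sketch of how such a measurement can be set up: estimate bias squared and variance of a regressor over repeated draws of the training data and of the random seed, then split the variance via the law of total variance into a data-sampling term and an initialization/optimization term. The toy regression problem, the scikit-learn MLP, and the grid sizes are illustrative assumptions, not the paper's protocol or its exact estimators.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
def true_f(x): return np.sin(3 * x).ravel()

x_test = np.linspace(-1, 1, 200).reshape(-1, 1)
n_datasets, n_seeds = 10, 5
preds = np.zeros((n_datasets, n_seeds, len(x_test)))

for i in range(n_datasets):                       # resample the training set
    x_tr = rng.uniform(-1, 1, (100, 1))
    y_tr = true_f(x_tr) + 0.1 * rng.standard_normal(100)
    for j in range(n_seeds):                      # resample the initialization / optimization seed
        model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=j)
        preds[i, j] = model.fit(x_tr, y_tr).predict(x_test)

mean_pred = preds.mean(axis=(0, 1))
bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)
total_var = preds.var(axis=(0, 1)).mean()
# Law of total variance: Var = E_D[Var_seed] + Var_D[E_seed]
var_init = preds.var(axis=1).mean()               # variance due to initialization/optimization
var_data = preds.mean(axis=1).var(axis=0).mean()  # variance due to data sampling
print(f"bias^2={bias_sq:.4f}  variance={total_var:.4f} "
      f"(init: {var_init:.4f}, data: {var_data:.4f})")
```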