
    Musings on Deep Learning: Properties of SGD

    [Previously titled "Theory of Deep Learning III: Generalization Properties of SGD".] In Theory III we characterize, with a mix of theory and experiments, the generalization properties of Stochastic Gradient Descent (SGD) in overparametrized deep convolutional networks. We show that SGD selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate as shown in Theory II, and 3) have maximum generalization. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. H.M. is supported in part by ARO Grant W911NF-15-1-0385.
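    As a hedged, minimal illustration of the first property (SGD driving the empirical error of an overparametrized net to essentially zero), the sketch below trains a two-layer ReLU net with single-sample SGD on a tiny synthetic regression task. The architecture, data, and hyperparameters are placeholders chosen for the sketch, not the deep convolutional setup studied in the paper.

```python
# Minimal sketch: single-sample SGD on an overparametrized two-layer ReLU net
# typically drives the empirical (training) error to near zero, even for
# arbitrary targets. Synthetic data; hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 50, 10, 512                 # 50 samples, thousands of parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)                # arbitrary targets; still interpolable

W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)
lr = 1e-3

for step in range(20000):
    i = rng.integers(n)                   # single-sample SGD
    h = np.maximum(W @ X[i], 0.0)         # hidden ReLU features
    err = a @ h - y[i]                    # residual on the sampled point
    grad_a = err * h                      # gradients of 0.5 * err**2
    grad_W = err * np.outer(a * (h > 0), X[i])
    a -= lr * grad_a
    W -= lr * grad_W

train_mse = np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2)
print(f"empirical MSE after SGD: {train_mse:.2e}")   # typically very small
```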

    Exploring the possibilities of obtaining CNN-quality classification models without using convolutional neural networks

    In this thesis, we pursue the success of Convolutional Neural Networks (CNNs) for image classification tasks. We explore the possibilities of achieving state-of-the-art performance without explicitly using CNNs on 2D grayscale images. We propose a Binary Patch Convolution (BPC) framework based on binarized patches taken from each group of images in a supervised task, eliminating the kernel learning process of CNNs. The binarized patches act as activations of different shapes and are applied using convolution. A key aspect of the framework is that it maintains a direct relation between the convolution kernels and the original images. Therefore, we can present a method to measure the information content in a feature map and observe relations between different groups. We discuss and test different strategies for selecting the groups of images to extract patches from, and evaluate their effect on classification accuracy. The practical implementation of the BPC framework allows many convolution kernels to be evaluated, positively impacting the framework's performance. Ultimately, the proposed framework can extract pertinent features for classification and can be combined with any classifier. The framework is tested on the MNIST and Fashion-MNIST datasets and achieves competitive accuracy, even outperforming related work. We also discuss challenges and future work applicable to the framework. Furthermore, we have attempted to capture trends in the error of images by proposing an iterative variant of singular value basis classification. The proposed method fails to capture a generalizable error trend; thus, we recognize that this is a challenging task for images. The process has nevertheless given valuable insight into how to approach image classification problems. In addition, we have examined the effects of negative transfer inherent in an original problem. Our experiments show that models trained on all groups in the data (global) are outperformed by models trained on different combinations of subgroups (local). Our proposed approaches for minimizing negative transfer within a task effectively increase classification accuracy. However, they are infeasible to deploy in practical scenarios due to the computation time they introduce. The results are meant to motivate research toward within-task minimization of negative transfer, primarily because existing research focuses on doing so in transfer learning.
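    A rough, hedged sketch of the underlying idea — fixed, binarized image patches used as convolution kernels whose responses feed any downstream classifier — is given below on synthetic grayscale images. The function names, patch-selection strategy, thresholding, and max-pooling are illustrative placeholders, not the thesis' actual BPC implementation or its group-selection strategies.

```python
# Illustrative sketch only: extract random patches from a "group" of images,
# binarize them, and use them as fixed convolution kernels (no kernel learning).
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)

def extract_binary_patches(images, patch=5, per_image=2, thresh=0.5):
    """Crop random patches from each image and binarize them by thresholding."""
    kernels = []
    for img in images:
        for _ in range(per_image):
            r = rng.integers(0, img.shape[0] - patch)
            c = rng.integers(0, img.shape[1] - patch)
            kernels.append((img[r:r + patch, c:c + patch] > thresh).astype(float))
    return kernels

def bpc_features(img, kernels):
    """Correlate the image with every fixed kernel and keep the max response."""
    windows = sliding_window_view(img, kernels[0].shape)        # (H', W', p, p)
    return np.array([np.einsum('ijkl,kl->ij', windows, k).max() for k in kernels])

# toy grayscale data standing in for MNIST-like images
images = rng.random((10, 28, 28))
kernels = extract_binary_patches(images[:4])                    # patches from one group
feats = np.stack([bpc_features(img, kernels) for img in images])
print(feats.shape)   # (10, n_kernels): features for any downstream classifier
```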

    Identifying overparameterization in Quantum Circuit Born Machines

    In machine learning, overparameterization is associated with qualitative changes in the empirical risk landscape, which can lead to more efficient training dynamics. For many parameterized models used in statistical learning, there exists a critical number of parameters, or model size, above which the model is constructed and trained in the overparameterized regime. Overparameterized loss landscapes have several characteristic features; the most significant is the convergence of standard gradient descent to global or local minima of low loss. In this work, we study the onset of overparameterization transitions for quantum circuit Born machines, generative models that are trained using non-adversarial gradient-based methods. We observe that bounds based on numerical analysis are in general good lower bounds on the overparameterization transition. However, bounds based on the quantum circuit's algebraic structure are very loose upper bounds. Our results indicate that fully understanding the trainability of these models remains an open question. Comment: 11 pages, 16 figures.
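    As a hedged toy illustration of the kind of experiment described (not the paper's circuits, loss, or bounds), the numpy sketch below simulates a tiny quantum circuit Born machine, trains it with gradient descent on a KL loss, and sweeps the number of layers to watch where the final loss drops sharply — the sort of sweep one could use to locate an overparameterization transition.

```python
# Toy sketch: a small Born machine (layers of RY rotations + a ring of CZ gates)
# simulated in numpy and trained by finite-difference gradient descent on a KL
# loss; the final loss is recorded as the parameter count grows.
import numpy as np

n = 3                                        # qubits; distribution over 2**n outcomes
rng = np.random.default_rng(2)
target = rng.random(2 ** n)
target /= target.sum()                       # arbitrary target distribution

def apply_ry(state, theta, q):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    gate = np.array([[c, -s], [s, c]])
    psi = state.reshape([2] * n)
    psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def apply_cz(state, q1, q2):
    idx = np.arange(2 ** n)
    both = ((idx >> q1) & 1) & ((idx >> q2) & 1)
    return np.where(both == 1, -state, state)

def born_probs(params):                      # params has shape (layers, n)
    psi = np.zeros(2 ** n)
    psi[0] = 1.0
    for layer in params:
        for q in range(n):
            psi = apply_ry(psi, layer[q], q)
        for q in range(n):
            psi = apply_cz(psi, q, (q + 1) % n)
    return np.abs(psi) ** 2

def kl_loss(params):
    p = born_probs(params) + 1e-12
    return np.sum(target * np.log(target / p))

def train(layers, steps=200, lr=0.2, eps=1e-4):
    params = rng.uniform(0, 2 * np.pi, size=(layers, n))
    for _ in range(steps):
        grad = np.zeros_like(params)
        for i in np.ndindex(*params.shape):  # central finite differences
            shift = np.zeros_like(params)
            shift[i] = eps
            grad[i] = (kl_loss(params + shift) - kl_loss(params - shift)) / (2 * eps)
        params -= lr * grad
    return kl_loss(params)

for L in range(1, 7):
    print(f"layers={L}  params={L * n}  final KL={train(L):.4f}")
```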

    Global Convergence of SGD On Two Layer Neural Nets

    In this note we demonstrate provable convergence of SGD to the global minima of an appropriately regularized ℓ2-empirical risk of depth-2 nets -- for arbitrary data and with any number of gates, if the gates use adequately smooth and bounded activations like sigmoid and tanh. We build on the results in [1] and leverage a constant amount of Frobenius norm regularization on the weights, along with sampling of the initial weights from an appropriate distribution. We also give a continuous-time SGD convergence result that applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of loss functions on constant-sized neural nets which are "Villani Functions". [1] Bin Shi, Weijie J. Su, and Michael I. Jordan. On learning rates and Schrödinger operators, 2020. arXiv:2004.06977. Comment: 23 pages, 6 figures. Extended abstract accepted at DeepMath 2022. v2 update: new experiments added in Section 3.2 to study the effect of the regularization value; the statement of Theorem 3.4 about SoftPlus nets has been improved.
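    A minimal numpy sketch of the kind of objective analyzed here: single-sample SGD on the Frobenius-norm-regularized ℓ2 empirical risk of a depth-2 sigmoid net. The synthetic data, regularization constant, and initialization scale below are arbitrary placeholders, not the distribution or constants required by the note's theorems.

```python
# Illustrative sketch: SGD on the empirical squared loss of a two-layer sigmoid
# net plus a constant Frobenius-norm regularizer on the weights.
import numpy as np

rng = np.random.default_rng(3)
n, d, width = 100, 5, 64
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])                         # arbitrary bounded regression targets

W = rng.standard_normal((width, d)) / np.sqrt(d)   # placeholder initialization
a = rng.standard_normal(width) / np.sqrt(width)
lam, lr = 1e-3, 0.01                        # constant Frobenius-norm regularization

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective():
    risk = 0.5 * np.mean((sigmoid(X @ W.T) @ a - y) ** 2)
    return risk + 0.5 * lam * (np.sum(W ** 2) + np.sum(a ** 2))

for step in range(30000):
    i = rng.integers(n)                     # single-sample SGD
    h = sigmoid(W @ X[i])
    err = h @ a - y[i]
    grad_a = err * h + lam * a              # gradient of the regularized per-sample loss
    grad_W = err * np.outer(a * h * (1 - h), X[i]) + lam * W
    a -= lr * grad_a
    W -= lr * grad_W

print(f"regularized empirical risk after SGD: {objective():.4f}")
```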