312 research outputs found
Musings on Deep Learning: Properties of SGD
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize, with a mix of theory and experiments, the generalization properties of Stochastic Gradient Descent in overparametrized deep convolutional networks. We show that Stochastic Gradient Descent (SGD) selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate, as shown in Theory II, and 3) have maximum generalization. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. H.M. is supported in part by ARO Grant W911NF-15-1-0385.
Exploring the possibilities of obtaining CNN-quality classification models without using convolutional neural networks
In this thesis, we pursue the success of Convolutional Neural Networks for image classification tasks. We explore the possibilities of achieving state-of-the-art performance without explicitly using CNNs on 2D grayscale images.
We propose a Binary Patch Convolution (BPC) framework based on binarized patches from each group of images in a supervised task, eliminating the kernel learning process of CNNs. The binarized patches act as activations of different shapes and are applied using convolution. One of the key aspects of the framework is that it maintains a direct relation between the convolution kernels and the original images. Therefore, we can present a method to measure information content in a feature map for observing relations between different groups. We discuss and test different strategies for selecting groups of images to extract patches from while evaluating their effect on classification accuracy. The practical implementation of the BPC framework allows for many convolution kernels to be evaluated, positively impacting the framework’s performance. Ultimately, the proposed framework can extract pertinent features for classification and can be combined with any classifier. The framework is tested on the MNIST and Fashion-MNIST datasets and achieves competitive accuracy, even outperforming related work. We also discuss challenges and future work applicable to the framework.
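The core idea of the framework — sample patches from each group's images, binarize them, and use them as fixed convolution kernels in place of learned CNN filters — can be sketched as follows. This is a minimal illustration of the concept, not the thesis's implementation; all function names, the max-pooling readout, and the patch-sampling strategy are assumptions for the sake of the example.

```python
import numpy as np

def extract_binary_patches(images, patch_size=5, n_patches=8, threshold=0.5, seed=0):
    """Sample patches from a group of images and binarize them.

    Hypothetical sketch: the binarized patches serve as fixed convolution
    kernels, eliminating the kernel-learning step of a CNN while keeping a
    direct relation between kernels and the original images.
    """
    rng = np.random.default_rng(seed)
    h, w = images.shape[1:]
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(h - patch_size + 1)
        x = rng.integers(w - patch_size + 1)
        patch = img[y:y + patch_size, x:x + patch_size]
        patches.append((patch > threshold).astype(np.float64))
    return np.stack(patches)

def convolve_valid(image, kernel):
    """Plain 'valid'-mode 2-D cross-correlation (no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def bpc_features(image, kernels):
    """One pooled activation per binary patch kernel; feed to any classifier."""
    return np.array([convolve_valid(image, k).max() for k in kernels])

# Toy data standing in for 2D grayscale images such as MNIST:
images = np.random.default_rng(1).random((10, 28, 28))
kernels = extract_binary_patches(images)
feats = bpc_features(images[0], kernels)
```

Because the kernels are extracted rather than learned, many of them can be evaluated cheaply, and the resulting feature vector can be paired with any downstream classifier, as the abstract describes.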
Furthermore, we have attempted to capture trends in the error of images by proposing an iterative variant of singular value bases classification. The proposed method fails to capture a generalizable error trend; thus, we have recognized that it is a challenging task for images. The process has given valuable insight into how to approach image classification problems.
On top of that, we have examined the effects of negative transfer inherent in the original problem. Our experiments show that models trained on all groups in the data (global) are outperformed by models trained on different combinations of subgroups (local). Our proposed approaches for minimizing negative transfer within a task effectively increase classification accuracy. However, they are infeasible to deploy in practical scenarios due to the computation time they introduce. The results are meant to motivate research toward within-task minimization of negative transfer, primarily since existing research focuses on doing so in transfer learning.
Identifying overparameterization in Quantum Circuit Born Machines
In machine learning, overparameterization is associated with qualitative changes in the empirical risk landscape, which can lead to more efficient training dynamics. For many parameterized models used in statistical learning, there exists a critical number of parameters, or model size, above which the model is constructed and trained in the overparameterized regime. There are many characteristics of overparameterized loss landscapes. The most significant is the convergence of standard gradient descent to global or local minima of low loss. In this work, we study the onset of overparameterization transitions for quantum circuit Born machines, generative models that are trained using non-adversarial gradient-based methods. We observe that bounds based on numerical analysis are in general good lower bounds on the overparameterization transition. However, bounds based on the quantum circuit's algebraic structure are very loose upper bounds. Our results indicate that fully understanding the trainability of these models remains an open question.
Comment: 11 pages, 16 figures
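The notion of a critical model size can be illustrated with a classical stand-in: once a random-feature model has at least as many parameters as training points, the best attainable training loss collapses to essentially zero. This is a hedged analogue only — simulating a quantum circuit Born machine would require a quantum circuit simulator, and the function name and sizes below are hypothetical.

```python
import numpy as np

def min_training_loss(n_features, X, y, seed=0):
    """Least-squares training loss of a random tanh-feature model.

    Stands in for the lowest loss gradient descent could reach at a given
    model size; not the Born machine objective itself.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_features))
    Phi = np.tanh(X @ W)                       # fixed random nonlinear features
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return float(np.mean((Phi @ theta - y) ** 2))

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 10))                  # 30 training points
y = rng.normal(size=30)
losses = {p: min_training_loss(p, X, y) for p in (5, 15, 30, 100)}
```

Sweeping the model size past the number of training points, the minimum training loss drops by many orders of magnitude — a classical counterpart of the overparameterization transition the abstract studies.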
Global Convergence of SGD On Two Layer Neural Nets
In this note we demonstrate provable convergence of SGD to the global minima of appropriately regularized empirical risk of depth-2 nets -- for arbitrary data and with any number of gates, if they use adequately smooth and bounded activations like sigmoid and tanh. We build on the results in [1] and leverage a constant amount of Frobenius norm regularization on the weights, along with sampling of the initial weights from an appropriate distribution. We also give a continuous-time SGD convergence result that applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of loss functions on constant-sized neural nets which are "Villani Functions". [1] Bin Shi, Weijie J. Su, and Michael I. Jordan. On learning rates and Schrödinger operators, 2020. arXiv:2004.06977
Comment: 23 pages, 6 figures. Extended abstract accepted at DeepMath 2022. v2 update: New experiments added in Section 3.2 to study the effect of the regularization value. Statement of Theorem 3.4 about SoftPlus nets has been improved.
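The regularized objective in question can be made concrete with a toy run: minibatch SGD on a two-layer tanh net whose empirical risk carries a constant Frobenius norm penalty on the weights. This is only a sketch of the setup, not the paper's construction or proof; the width, regularization constant, learning rate, and Gaussian initialization below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary data, as allowed by the theorem statement.
X = rng.normal(size=(64, 3))
y = rng.normal(size=(64, 1))

# Constant-width two-layer tanh net; lam is the constant Frobenius penalty.
width, lam, lr = 16, 0.01, 0.05
W1 = rng.normal(size=(3, width)) / np.sqrt(3)      # sampled initial weights
W2 = rng.normal(size=(width, 1)) / np.sqrt(width)

def risk(W1, W2):
    """Regularized empirical risk: MSE + lam * squared Frobenius norms."""
    pred = np.tanh(X @ W1) @ W2
    mse = np.mean((pred - y) ** 2)
    reg = lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return mse + reg

initial = risk(W1, W2)
for step in range(2000):
    idx = rng.integers(len(X), size=8)             # minibatch SGD
    xb, yb = X[idx], y[idx]
    h = np.tanh(xb @ W1)
    err = (h @ W2 - yb) / len(xb)
    gW2 = 2 * h.T @ err + 2 * lam * W2             # gradient incl. penalty
    gW1 = 2 * xb.T @ (err @ W2.T * (1 - h ** 2)) + 2 * lam * W1
    W1 -= lr * gW1
    W2 -= lr * gW2
final = risk(W1, W2)
```

On this toy instance the regularized risk decreases steadily under SGD; the paper's contribution is the proof that, for such bounded smooth activations, this convergence reaches the global minimum.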