312 research outputs found
Musings on Deep Learning: Properties of SGD
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we characterize, with a mix of theory and experiments, the generalization properties of Stochastic Gradient Descent in overparametrized deep convolutional networks. We show that Stochastic Gradient Descent (SGD) selects with high probability solutions that 1) have zero (or small) empirical error, 2) are degenerate, as shown in Theory II, and 3) have maximum generalization. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. H.M. is supported in part by ARO Grant W911NF-15-1-0385.
Exploring the possibilities of obtaining CNN-quality classification models without using convolutional neural networks
In this thesis, we pursue the success of Convolutional Neural Networks for image classification tasks. We explore the possibilities of achieving state-of-the-art performance without explicitly using CNNs on 2D grayscale images.
We propose a Binary Patch Convolution (BPC) framework based on binarized patches from each group of images in a supervised task, eliminating the kernel learning process of CNNs. The binarized patches act as activations of different shapes and are applied using convolution. One of the key aspects of the framework is that it maintains a direct relation between the convolution kernels and the original images. Therefore, we can present a method to measure information content in a feature map for observing relations between different groups. We discuss and test different strategies for selecting groups of images to extract patches from while evaluating their effect on classification accuracy. The practical implementation of the BPC framework allows for many convolution kernels to be evaluated, positively impacting the framework’s performance. Ultimately, the proposed framework can extract pertinent features for classification and can be combined with any classifier. The framework is tested on the MNIST and Fashion-MNIST datasets and achieves competitive accuracy, even outperforming related work. We also discuss challenges and future work applicable to the framework.
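The core idea of the framework — sample patches from each group's images, binarize them, and use them as fixed convolution kernels in place of learned CNN filters — can be sketched as follows. This is a minimal illustration of the concept, not the thesis's implementation; all function names, the max-pooling readout, and the patch-sampling strategy are assumptions for the sake of the example.

```python
import numpy as np

def extract_binary_patches(images, patch_size=5, n_patches=8, threshold=0.5, seed=0):
    """Sample patches from a group of images and binarize them.

    Hypothetical sketch: the binarized patches serve as fixed convolution
    kernels, eliminating the kernel-learning step of a CNN while keeping a
    direct relation between kernels and the original images.
    """
    rng = np.random.default_rng(seed)
    h, w = images.shape[1:]
    patches = []
    for _ in range(n_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(h - patch_size + 1)
        x = rng.integers(w - patch_size + 1)
        patch = img[y:y + patch_size, x:x + patch_size]
        patches.append((patch > threshold).astype(np.float64))
    return np.stack(patches)

def convolve_valid(image, kernel):
    """Plain 'valid'-mode 2-D cross-correlation (no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def bpc_features(image, kernels):
    """One pooled activation per binary patch kernel; feed to any classifier."""
    return np.array([convolve_valid(image, k).max() for k in kernels])

# Toy data standing in for 2D grayscale images such as MNIST:
images = np.random.default_rng(1).random((10, 28, 28))
kernels = extract_binary_patches(images)
feats = bpc_features(images[0], kernels)
```

Because the kernels are extracted rather than learned, many of them can be evaluated cheaply, and the resulting feature vector can be paired with any downstream classifier, as the abstract describes.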
Furthermore, we have attempted to capture trends in the error of images by proposing an iterative variant of singular value bases classification. The proposed method fails to capture a generalizable error trend; thus, we have recognized that it is a challenging task for images. The process has given valuable insight into how to approach image classification problems.
On top of that, we have examined the effects of negative transfer inherent in the original problem. Our experiments show that models trained on all groups in the data (global) are outperformed by models trained on different combinations of subgroups (local). Our proposed approaches for minimizing negative transfer within a task effectively increase classification accuracy. However, they are infeasible to deploy in practical scenarios due to the computation time they introduce. The results are meant to motivate research toward within-task minimization of negative transfer, primarily since existing research focuses on doing so in transfer learning.
Identifying overparameterization in Quantum Circuit Born Machines
In machine learning, overparameterization is associated with qualitative changes in the empirical risk landscape, which can lead to more efficient training dynamics. For many parameterized models used in statistical learning, there exists a critical number of parameters, or model size, above which the model is constructed and trained in the overparameterized regime. There are many characteristics of overparameterized loss landscapes. The most significant is the convergence of standard gradient descent to global or local minima of low loss. In this work, we study the onset of overparameterization transitions for quantum circuit Born machines, generative models that are trained using non-adversarial gradient-based methods. We observe that bounds based on numerical analysis are in general good lower bounds on the overparameterization transition. However, bounds based on the quantum circuit's algebraic structure are very loose upper bounds. Our results indicate that fully understanding the trainability of these models remains an open question.
Comment: 11 pages, 16 figures
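The notion of a critical model size can be illustrated with a classical stand-in: once a random-feature model has at least as many parameters as training points, the best attainable training loss collapses to essentially zero. This is a hedged analogue only — simulating a quantum circuit Born machine would require a quantum circuit simulator, and the function name and sizes below are hypothetical.

```python
import numpy as np

def min_training_loss(n_features, X, y, seed=0):
    """Least-squares training loss of a random tanh-feature model.

    Stands in for the lowest loss gradient descent could reach at a given
    model size; not the Born machine objective itself.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_features))
    Phi = np.tanh(X @ W)                       # fixed random nonlinear features
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return float(np.mean((Phi @ theta - y) ** 2))

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 10))                  # 30 training points
y = rng.normal(size=30)
losses = {p: min_training_loss(p, X, y) for p in (5, 15, 30, 100)}
```

Sweeping the model size past the number of training points, the minimum training loss drops by many orders of magnitude — a classical counterpart of the overparameterization transition the abstract studies.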
Global Convergence of SGD On Two Layer Neural Nets
In this note we demonstrate provable convergence of SGD to the global minima of appropriately regularized empirical risk of depth-2 nets -- for arbitrary data and with any number of gates, if they use adequately smooth and bounded activations like sigmoid and tanh. We build on the results in [1] and leverage a constant amount of Frobenius norm regularization on the weights, along with sampling of the initial weights from an appropriate distribution. We also give a continuous-time SGD convergence result that applies to smooth unbounded activations like SoftPlus. Our key idea is to show the existence of loss functions on constant-sized neural nets which are "Villani Functions". [1] Bin Shi, Weijie J. Su, and Michael I. Jordan. On learning rates and Schrödinger operators, 2020. arXiv:2004.06977
Comment: 23 pages, 6 figures. Extended abstract accepted at DeepMath 2022. v2 update: New experiments added in Section 3.2 to study the effect of the regularization value. Statement of Theorem 3.4 about SoftPlus nets has been improved.
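The regularized objective in question can be made concrete with a toy run: minibatch SGD on a two-layer tanh net whose empirical risk carries a constant Frobenius norm penalty on the weights. This is only a sketch of the setup, not the paper's construction or proof; the width, regularization constant, learning rate, and Gaussian initialization below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary data, as allowed by the theorem statement.
X = rng.normal(size=(64, 3))
y = rng.normal(size=(64, 1))

# Constant-width two-layer tanh net; lam is the constant Frobenius penalty.
width, lam, lr = 16, 0.01, 0.05
W1 = rng.normal(size=(3, width)) / np.sqrt(3)      # sampled initial weights
W2 = rng.normal(size=(width, 1)) / np.sqrt(width)

def risk(W1, W2):
    """Regularized empirical risk: MSE + lam * squared Frobenius norms."""
    pred = np.tanh(X @ W1) @ W2
    mse = np.mean((pred - y) ** 2)
    reg = lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return mse + reg

initial = risk(W1, W2)
for step in range(2000):
    idx = rng.integers(len(X), size=8)             # minibatch SGD
    xb, yb = X[idx], y[idx]
    h = np.tanh(xb @ W1)
    err = (h @ W2 - yb) / len(xb)
    gW2 = 2 * h.T @ err + 2 * lam * W2             # gradient incl. penalty
    gW1 = 2 * xb.T @ (err @ W2.T * (1 - h ** 2)) + 2 * lam * W1
    W1 -= lr * gW1
    W2 -= lr * gW2
final = risk(W1, W2)
```

On this toy instance the regularized risk decreases steadily under SGD; the paper's contribution is the proof that, for such bounded smooth activations, this convergence reaches the global minimum.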