Size and depth of monotone neural networks: interpolation and approximation
Monotone functions and data sets arise in a variety of applications. We study
the interpolation problem for monotone data sets: The input is a monotone data
set with n points, and the goal is to find a size- and depth-efficient
monotone neural network, with non-negative parameters and threshold units, that
interpolates the data set. We show that there are monotone data sets that
cannot be interpolated by a monotone network of depth 2. On the other hand,
we prove that for every monotone data set with n points in R^d,
there exists an interpolating monotone network of depth 4 and size O(nd).
Our interpolation result implies that every monotone function over [0,1]^d
can be approximated arbitrarily well by a depth-4 monotone network, improving
the previous best-known construction of depth d+1. Finally, building on
results from Boolean circuit complexity, we show that the inductive bias of
having positive parameters can lead to a super-polynomial blow-up in the number
of neurons when approximating monotone functions.

Comment: 19 pages
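A minimal numpy sketch of a monotone network in the abstract's sense (non-negative parameters and threshold units); the staircase example below is hypothetical and is not the paper's depth-efficient interpolation construction. Non-negative weights composed with nondecreasing threshold activations make the whole network coordinate-wise monotone.

```python
import numpy as np

def threshold(z):
    # Heaviside threshold unit: 1 if z >= 0, else 0 (nondecreasing).
    return (z >= 0).astype(float)

def monotone_net(x, W1, b1, w2, b2):
    # Two-layer threshold network with non-negative weights W1, w2.
    # Non-negative weights + monotone activations imply monotonicity:
    # x <= y coordinate-wise gives f(x) <= f(y).
    assert (W1 >= 0).all() and (w2 >= 0).all()
    h = threshold(x @ W1.T + b1)  # hidden threshold units
    return h @ w2 + b2            # non-negative output combination

# Hypothetical example: a two-step staircase in x1 + x2.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])  # non-negative first-layer weights
b1 = np.array([-1.0, -2.0])              # units fire at x1+x2 >= 1 and >= 2
w2 = np.array([1.0, 1.0])                # non-negative output weights
b2 = 0.0

lo = monotone_net(np.array([[0.2, 0.3]]), W1, b1, w2, b2)  # no unit fires
hi = monotone_net(np.array([[1.5, 1.0]]), W1, b1, w2, b2)  # both units fire
```

Because every weight is non-negative and the activation is nondecreasing, increasing any input coordinate can only increase the output.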
Towards Understanding Hierarchical Learning: Benefits of Neural Representations
Deep neural networks can empirically perform efficient hierarchical learning,
in which the layers learn useful representations of the data. However, how they
make use of these intermediate representations is not explained by recent
theories that relate them to "shallow learners" such as kernels. In this work,
we demonstrate that intermediate neural representations add more flexibility to
neural networks and can be advantageous over raw inputs. We consider a fixed,
randomly initialized neural network as a representation function fed into
another trainable network. When the trainable network is the quadratic Taylor
model of a wide two-layer network, we show that neural representation can
achieve improved sample complexities compared with the raw input: For learning
a low-rank degree-p polynomial (p >= 4) in d dimensions, neural
representation requires only Õ(d^⌈p/2⌉) samples, while
the best-known sample complexity upper bound for the raw input is
Õ(d^(p-1)). We contrast our result with a lower bound showing that
neural representations do not improve over the raw input (in the infinite width
limit), when the trainable network is instead a neural tangent kernel. Our
results characterize when neural representations are beneficial, and may
provide a new perspective on why depth is important in deep learning.

Comment: 41 pages, published in NeurIPS 2020
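A toy numpy sketch of the fixed-random-representation setup, with two stated substitutions: ridge regression stands in for the trainable quadratic Taylor model, and the target is a simple single-index quadratic rather than the paper's low-rank polynomials. It only illustrates the qualitative claim that a frozen random neural representation can beat the raw input on such targets.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 2000, 10, 512

# Illustrative target: a single-index quadratic of the input (a stand-in
# for the low-rank polynomials in the abstract, not the same setting).
X = rng.standard_normal((n, d))
beta = rng.standard_normal(d) / np.sqrt(d)
y = (X @ beta) ** 2

# Fixed, randomly initialized representation: one frozen ReLU layer.
W = rng.standard_normal((d, width)) / np.sqrt(d)
b = rng.standard_normal(width)
Phi = np.maximum(X @ W + b, 0.0)

def ridge_fit_mse(A, y, lam=1e-3):
    # Closed-form ridge regression; returns the training MSE.
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    return float(np.mean((A @ w - y) ** 2))

mse_raw = ridge_fit_mse(X, y)    # trainable model on the raw input
mse_rep = ridge_fit_mse(Phi, y)  # same model on the neural representation
```

A purely linear model on the raw input cannot represent the quadratic target at all (its best fit is near zero for symmetric inputs), while the random ReLU features fit it well, so mse_rep comes out far below mse_raw.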
Feature-Learning Networks Are Consistent Across Widths At Realistic Scales
We study the effect of width on the dynamics of feature-learning neural
networks across a variety of architectures and datasets. Early in training,
wide neural networks trained on online data not only have identical loss curves
but also agree in their point-wise test predictions throughout training. For
simple tasks such as CIFAR-5m this holds throughout training for networks of
realistic widths. We also show that structural properties of the models,
including internal representations, preactivation distributions, edge of
stability phenomena, and large learning rate effects are consistent across
large widths. This motivates the hypothesis that phenomena seen in realistic
models can be captured by infinite-width, feature-learning limits. For harder
tasks (such as ImageNet and language modeling), and later training times,
finite-width deviations grow systematically. Two distinct effects cause these
deviations across widths. First, the network output has
initialization-dependent variance scaling inversely with width, which can be
removed by ensembling networks. We observe, however, that ensembles of narrower
networks perform worse than a single wide network. We call this the bias of
narrower width. We conclude with a spectral perspective on the origin of this
finite-width bias.
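The initialization-dependent output variance described above can be illustrated with a toy sketch (a mean-field-parameterized two-layer ReLU network at initialization, an assumed setup not tied to the paper's architectures): the variance of the output across random initializations shrinks roughly like 1/width.

```python
import numpy as np

def init_output(width, x, seed):
    # Mean-field two-layer ReLU network at initialization:
    # f(x) = (1/width) * sum_i a_i * relu(w_i . x)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, x.size))
    a = rng.standard_normal(width)
    return float(a @ np.maximum(W @ x, 0.0)) / width

x = np.ones(8) / np.sqrt(8)  # a fixed unit-norm probe input

# Output variance across 500 random initializations, at two widths.
# Quadrupling the width should cut the variance roughly by four.
variances = {}
for m in (64, 256):
    outs = [init_output(m, x, seed) for seed in range(500)]
    variances[m] = float(np.var(outs))
```

Averaging the outputs over initializations at a fixed width (an ensemble) removes this variance term, while the bias of narrower width discussed above would remain.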
DEEP LEARNING BASED SEGMENTATION AND CLASSIFICATION FOR IMPROVED BREAST CANCER DETECTION
Breast cancer is a leading cause of death among women globally. It is a serious health concern caused by calcifications or abnormal tissue growth in the breast. Screening and identifying the nature of a tumor as benign or malignant facilitates early intervention, which drastically decreases the mortality rate. Screening typically uses ultrasound images, since they are easily accessible to most people and avoid a drawback of mammography, the other widespread screening technique, which in some cases does not produce a clear scan.

In this thesis, the approach is to build a stacked model that makes predictions based on the shape, pattern, and spread of the tumor. The typical steps are pre-processing of the images, followed by segmentation and classification. For pre-processing, the proposed approach uses histogram equalization, which improves the contrast of the image, makes the tumor stand out from its surroundings, and eases the segmentation step. For segmentation, the approach uses a UNet architecture with a ResNet backbone; the UNet architecture was designed specifically for biomedical imaging. The aim of segmentation is to separate the tumor from the ultrasound image so that the classification model can make its predictions from the resulting mask. The segmentation model achieved an F1-score of 97.30%.

For classification, a CNN base model is used to extract features from the provided masks, and these features are fed into a neural network that makes the predictions. The base CNN is ResNet50, with weights initialized from training on ImageNet, and the classifier is a simple 8-layer network with ReLU activations in the hidden layers and softmax in the final decision-making layer. ResNet50 returns 2048 features from each mask, which are fed into the network for decision-making.
The hidden layers of the neural network have 1024, 512, 256, 128, 64, 32, and 10 neurons, respectively. The classification accuracy achieved by the proposed model was 98.61%, with an F1-score of 98.41%. Detailed experimental results are presented along with comparative data.
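The classification head described above can be sketched as a forward pass. The weights below are untrained random placeholders, and the output layer is assumed to have 2 classes (benign vs. malignant), which the thesis text does not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Layer widths from the thesis: 2048 ResNet50 features in, seven hidden
# layers, then a softmax output (2 classes assumed here).
dims = [2048, 1024, 512, 256, 128, 64, 32, 10, 2]
weights = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
           for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

def classify(features):
    h = features
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)  # ReLU in the hidden layers
    # Softmax in the final decision-making layer.
    return softmax(h @ weights[-1] + biases[-1])

probs = classify(rng.standard_normal((4, 2048)))  # 4 hypothetical mask features
```

Each row of probs is a probability distribution over the two classes; in the thesis pipeline, the 2048-dimensional inputs would come from ResNet50 applied to the segmentation masks.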
Provable Guarantees for Neural Networks via Gradient Feature Learning
Neural networks have achieved remarkable empirical performance, yet current
theoretical analyses are inadequate for understanding their success: e.g., the
Neural Tangent Kernel approach fails to capture their key feature learning
ability, while recent analyses of feature learning are typically
problem-specific. This work proposes a unified analysis framework for two-layer
networks trained by gradient descent. The framework is centered around the
principle of feature learning from gradients, and its effectiveness is
demonstrated by applications in several prototypical problems, such as mixtures
of Gaussians and parity functions. The framework also sheds light on
interesting network learning phenomena such as feature learning beyond kernels
and the lottery ticket hypothesis.Comment: NeurIPS 2023, 71 page