
    Size and depth of monotone neural networks: interpolation and approximation

    Monotone functions and data sets arise in a variety of applications. We study the interpolation problem for monotone data sets: the input is a monotone data set with $n$ points, and the goal is to find a size- and depth-efficient monotone neural network, with non-negative parameters and threshold units, that interpolates the data set. We show that there are monotone data sets that cannot be interpolated by a monotone network of depth $2$. On the other hand, we prove that for every monotone data set with $n$ points in $\mathbb{R}^d$, there exists an interpolating monotone network of depth $4$ and size $O(nd)$. Our interpolation result implies that every monotone function over $[0,1]^d$ can be approximated arbitrarily well by a depth-$4$ monotone network, improving on the previous best-known construction of depth $d+1$. Finally, building on results from Boolean circuit complexity, we show that the inductive bias of having positive parameters can lead to a super-polynomial blow-up in the number of neurons when approximating monotone functions. Comment: 19 pages
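
    As a side note on the defining constraint here, the sketch below (Python, illustrative sizes only, not the paper's depth-4 construction or its $O(nd)$ bound) shows why non-negative weights combined with threshold units yield a monotone network: each layer preserves the coordinate-wise order of its inputs.

```python
# A minimal sketch, assuming a small fully connected network: non-negative
# weights plus non-decreasing threshold activations make the whole map
# monotone, since each layer preserves coordinate-wise order.
import numpy as np

rng = np.random.default_rng(0)

def threshold(z):
    return (z >= 0).astype(float)          # Heaviside threshold unit

def monotone_net(x, weights, biases):
    h = x
    for W, b in zip(weights, biases):
        h = threshold(h @ W + b)           # W >= 0 entrywise preserves order
    return h.sum(axis=-1)                  # non-negative sum of monotone units

d, widths = 3, [3, 8, 8, 4]                # illustrative sizes only
weights = [rng.uniform(0, 1, (m, n)) for m, n in zip(widths[:-1], widths[1:])]
biases = [rng.uniform(-1, 0, n) for n in widths[1:]]

x = rng.uniform(0, 1, (1, d))
y = x + rng.uniform(0, 0.3, (1, d))        # y >= x coordinate-wise
assert monotone_net(x, weights, biases) <= monotone_net(y, weights, biases)
```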

    Towards Understanding Hierarchical Learning: Benefits of Neural Representations

    Deep neural networks can empirically perform efficient hierarchical learning, in which the layers learn useful representations of the data. However, how they make use of the intermediate representations is not explained by recent theories that relate them to "shallow learners" such as kernels. In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks and can be advantageous over raw inputs. We consider a fixed, randomly initialized neural network as a representation function fed into another trainable network. When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that the neural representation can achieve improved sample complexity compared with the raw input: for learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimensions, the neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$. We contrast our result with a lower bound showing that neural representations do not improve over the raw input (in the infinite-width limit) when the trainable network is instead a neural tangent kernel. Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning. Comment: 41 pages, published in NeurIPS 2020
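
    The setup described above can be illustrated with a short PyTorch sketch: a fixed, randomly initialized network provides the representation, and only the network on top of it is trained. The paper's analysis uses a quadratic Taylor model of a wide two-layer network as the trainable part; the sketch below substitutes an ordinary two-layer net and an arbitrary degree-4 target purely for illustration.

```python
# A minimal sketch of "fixed random representation + trainable network on top".
import torch
import torch.nn as nn

d, m_rep, m_train = 32, 256, 512

# Fixed representation: randomly initialized, never updated.
representation = nn.Sequential(nn.Linear(d, m_rep), nn.ReLU())
for p in representation.parameters():
    p.requires_grad_(False)

# Trainable network fed with the neural representation instead of raw input.
learner = nn.Sequential(nn.Linear(m_rep, m_train), nn.ReLU(),
                        nn.Linear(m_train, 1))
opt = torch.optim.SGD(learner.parameters(), lr=1e-2)

x = torch.randn(128, d)
y = (x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3]).unsqueeze(1)  # a degree-4 target

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(learner(representation(x)), y)
    loss.backward()
    opt.step()
```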

    Feature-Learning Networks Are Consistent Across Widths At Realistic Scales

    We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data not only have identical loss curves but also agree in their pointwise test predictions throughout training. For simple tasks such as CIFAR-5m this holds throughout training for networks of realistic widths. We also show that structural properties of the models, including internal representations, preactivation distributions, edge-of-stability phenomena, and large-learning-rate effects, are consistent across large widths. This motivates the hypothesis that phenomena seen in realistic models can be captured by infinite-width, feature-learning limits. For harder tasks (such as ImageNet and language modeling) and later training times, finite-width deviations grow systematically. Two distinct effects cause these deviations across widths. First, the network output has initialization-dependent variance scaling inversely with width, which can be removed by ensembling networks. Second, we observe that ensembles of narrower networks perform worse than a single wide network; we call this the bias of narrower width. We conclude with a spectral perspective on the origin of this finite-width bias.
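
    A hedged sketch of the kind of comparison described above (not the paper's experiments, which use realistic architectures and datasets): train the same model at several widths, measure pointwise agreement of test predictions, and compare an ensemble of narrow networks against a single wide one to separate the initialization variance (removable by ensembling) from the bias of narrower width.

```python
# A minimal sketch on synthetic data; widths, seeds, and the MLP are assumptions.
import torch
import torch.nn as nn

def train_mlp(width, x, y, seed, steps=500):
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(x.shape[1], width), nn.ReLU(),
                        nn.Linear(width, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(net(x), y).backward()
        opt.step()
    return net

x, y = torch.randn(512, 16), torch.randn(512, 1)
x_test = torch.randn(256, 16)

wide = train_mlp(width=2048, x=x, y=y, seed=0)
narrow_ensemble = [train_mlp(width=128, x=x, y=y, seed=s) for s in range(8)]

with torch.no_grad():
    p_wide = wide(x_test)
    p_ens = torch.stack([n(x_test) for n in narrow_ensemble]).mean(0)
    # Pointwise disagreement between the wide model and the narrow ensemble:
    # ensembling removes initialization variance but not the width bias.
    print((p_wide - p_ens).abs().mean().item())
```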

    Deep Learning Based Segmentation and Classification for Improved Breast Cancer Detection

    Breast cancer is a leading killer of women globally. It is a serious health concern caused by calcifications or abnormal tissue growth in the breast. Screening and identifying the nature of a tumor as benign or malignant is important to facilitate early intervention, which drastically decreases the mortality rate. Screening here uses ultrasound images, since they are widely accessible and, unlike mammograms, the other most common screening technique, do not suffer from unclear scans in some cases. In this thesis, the approach to this problem is to build a stacked model that makes predictions on the basis of the shape, pattern, and spread of the tumor. The typical steps are pre-processing of the images, followed by segmentation and classification. For pre-processing, the proposed approach uses histogram equalization, which improves the contrast of the image, makes the tumor stand out from its surroundings, and eases the segmentation step. For segmentation, the approach uses a UNet architecture with a ResNet backbone; UNet is designed specifically for biomedical imaging. The aim of segmentation is to separate the tumor from the ultrasound image so that the classification model can make its predictions from this mask. The segmentation model achieved an F1-score of 97.30%. For classification, a CNN base model is used for feature extraction from the provided masks, and the extracted features are fed into a network that makes the predictions. The base CNN model is ResNet50, and the output network is a simple 8-layer network with ReLU activation in the hidden layers and softmax in the final decision-making layer. The ResNet weights are initialized from training on ImageNet. ResNet50 returns 2048 features from each mask, which are then fed into the network for decision-making. The hidden layers of the neural network have 1024, 512, 256, 128, 64, 32, and 10 neurons respectively. The classification accuracy achieved for the proposed model was 98.61%, with an F1-score of 98.41%. Detailed experimental results are presented along with comparative data.
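
    A minimal PyTorch sketch of the classification stage described above, under stated assumptions: the thesis does not name a framework, the number of output classes is assumed to be 2 (benign/malignant), and the segmented masks are assumed to be resized to 224x224 three-channel inputs for ResNet50.

```python
# Sketch of: histogram-equalization pre-processing + ImageNet-pretrained
# ResNet50 as a 2048-d feature extractor + the 8-layer decision network.
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

def preprocess(gray_ultrasound):
    """Histogram equalization to raise tumor/background contrast (uint8 HxW)."""
    return cv2.equalizeHist(gray_ultrasound)

img = (np.random.rand(256, 256) * 255).astype("uint8")   # placeholder scan
eq = preprocess(img)

# ImageNet-pretrained ResNet50 used purely as a 2048-feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()                               # expose the 2048 features

# 8-layer decision network: ReLU hidden layers of 1024, 512, 256, 128, 64,
# 32, and 10 units, softmax at the output (2 classes assumed).
widths = [2048, 1024, 512, 256, 128, 64, 32, 10]
layers = []
for m, n in zip(widths[:-1], widths[1:]):
    layers += [nn.Linear(m, n), nn.ReLU()]
layers += [nn.Linear(widths[-1], 2)]
classifier = nn.Sequential(*layers)

masks = torch.rand(4, 3, 224, 224)                        # placeholder segmented masks
with torch.no_grad():
    probs = torch.softmax(classifier(backbone(masks)), dim=1)
```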

    Provable Guarantees for Neural Networks via Gradient Feature Learning

    Neural networks have achieved remarkable empirical performance, yet the current theoretical analysis is not adequate for understanding their success: the Neural Tangent Kernel approach fails to capture their key feature-learning ability, and recent analyses of feature learning are typically problem-specific. This work proposes a unified analysis framework for two-layer networks trained by gradient descent. The framework is centered around the principle of feature learning from gradients, and its effectiveness is demonstrated by applications to several prototypical problems, such as mixtures of Gaussians and parity functions. The framework also sheds light on interesting network learning phenomena such as feature learning beyond kernels and the lottery ticket hypothesis. Comment: NeurIPS 2023, 71 pages
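
    The central principle, learning features from gradients, can be illustrated with a short PyTorch sketch (not the paper's formal framework): for a two-layer network on a simple Gaussian-mixture task, the first gradient step on the input-layer weights already concentrates on the task-relevant direction.

```python
# A minimal sketch: gradient features from one backward pass at initialization.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, m, n = 64, 128, 4096

# Mixture of Gaussians: the label decides the sign of the mean along e_1.
y = torch.randint(0, 2, (n,)).float() * 2 - 1
x = torch.randn(n, d)
x[:, 0] += 3.0 * y                       # informative direction is e_1

net = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))
loss = nn.functional.mse_loss(net(x).squeeze(1), y)
loss.backward()

# Gradient features: rows of the first-layer weight gradient.
g = net[0].weight.grad                   # shape (m, d)
# The gradient is typically much larger along the informative coordinate.
print((g[:, 0].abs().mean() / g.abs().mean()).item())
```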