7 research outputs found
Universality of underlying mechanism for successful deep learning
An underlying mechanism for successful deep learning (DL) with a limited deep
architecture and dataset, namely VGG-16 on CIFAR-10, was recently presented
based on a quantitative method to measure the quality of a single filter in
each layer. In this method, each filter identifies small clusters of possible
output labels, with additional noise selected as labels outside the clusters.
This feature is progressively sharpened with the layers, resulting in an
enhanced signal-to-noise ratio (SNR) and higher accuracy. In this study, the
suggested universal mechanism is verified for VGG-16 and EfficientNet-B0
trained on the CIFAR-100 and ImageNet datasets with the following main results.
First, the accuracy progressively increases with the layers, whereas the noise
per filter typically progressively decreases. Second, for a given deep
architecture, the maximal error rate increases approximately linearly with the
number of output labels. Third, the average filter cluster size and the number
of clusters per filter at the last convolutional layer adjacent to the output
layer are almost independent of the number of dataset labels in the range [3,
1,000], while a high SNR is preserved. The presented DL mechanism suggests
several techniques, such as applying the filter's cluster connections (AFCC), to
improve the computational complexity and accuracy of deep architectures, and it
further points to simplifying pre-existing structures while maintaining their accuracy.
Comment: 27 pages, 5 figures, 6 tables. arXiv admin note: text overlap with arXiv:2305.1807
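To make the per-filter measurement concrete, the following minimal NumPy sketch estimates a filter's label cluster size and SNR from its averaged output fields; the matrix layout, the relative threshold, and the SNR definition are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def filter_cluster_stats(field, rel_threshold=0.5):
    """Estimate the label cluster size and SNR of a single filter.

    field[i, j] is assumed to be the output field of label j, averaged over
    validation inputs of true label i, measured with only this filter's
    outgoing FC weights left active.
    """
    per_label = field.mean(axis=0)                 # average response per output label
    cluster = per_label >= rel_threshold * per_label.max()
    signal = per_label[cluster].mean()
    noise = per_label[~cluster].mean() if (~cluster).any() else 0.0
    return int(cluster.sum()), signal / (abs(noise) + 1e-12)

# Toy usage with a random 10x10 field standing in for a measured one.
rng = np.random.default_rng(0)
cluster_size, snr = filter_cluster_stats(rng.random((10, 10)))
print(f"cluster size: {cluster_size}, SNR: {snr:.2f}")
```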
Learning on tree architectures outperforms a convolutional feedforward network
Advanced deep learning architectures consist of tens of fully connected and
convolutional hidden layers, which are already extended to hundreds, and are
far from their biological realization. Their biologically implausible dynamics is
based on the backpropagation technique, which changes a weight in a non-local
manner, as the number of routes between an output unit and a weight is
typically large. Here, offline and online CIFAR-10 database learning
on 3-layer tree architectures, inspired by experimental-based dendritic tree
adaptations, outperforms the achievable success rates of the 5-layer
convolutional LeNet. Its highly pruned tree backpropagation procedure, where a
single route connects an output unit and a weight, represents an efficient
dendritic deep learning.
Comment: 20 pages, 4 figures, 1 table (improved figures resolution)
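The single-route property that distinguishes tree backpropagation can be illustrated with a small NumPy sketch; the layer sizes, tanh nonlinearity, and squared loss below are assumptions chosen for brevity, not the architecture compared against LeNet.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 64 inputs -> 16 first-layer units -> 4 second-layer
# units -> 1 output.  Every unit feeds exactly one parent, so each weight
# lies on a single input-output route (the tree property).
D, H1, H2 = 64, 16, 4
w1 = rng.normal(size=(H1, D // H1))   # each row sees a disjoint input patch
w2 = rng.normal(size=(H2, H1 // H2))  # each row sees a disjoint group of units
w3 = rng.normal(size=H2)

def forward(x):
    a1 = np.tanh((w1 * x.reshape(H1, -1)).sum(axis=1))
    a2 = np.tanh((w2 * a1.reshape(H2, -1)).sum(axis=1))
    return w3 @ a2, (a1, a2)

def tree_backprop(x, y, lr=0.01):
    """Each weight's gradient flows through the single route connecting it
    to the output unit; there is no summation over multiple routes."""
    out, (a1, a2) = forward(x)
    d_out = out - y                               # squared-loss derivative
    d2 = d_out * w3 * (1 - a2 ** 2)               # error at second-layer units
    d1 = np.repeat(d2, H1 // H2) * w2.ravel() * (1 - a1 ** 2)
    w3[:] -= lr * d_out * a2
    w2[:] -= lr * d2[:, None] * a1.reshape(H2, -1)
    w1[:] -= lr * d1[:, None] * x.reshape(H1, -1)

# Toy usage: one gradient step on a random input-target pair.
x, y = rng.normal(size=D), 1.0
tree_backprop(x, y)
print(forward(x)[0])
```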
Efficient shallow learning as an alternative to deep learning
The realization of complex classification tasks requires training of deep
learning (DL) architectures consisting of tens or even hundreds of
convolutional and fully connected hidden layers, which is far from the reality
of the human brain. According to the DL rationale, the first convolutional
layer reveals localized patterns in the input, and the following layers reveal
progressively larger-scale patterns, until a class of inputs is reliably
characterized. Here, we
demonstrate that with a fixed ratio between the depths of the first and second
convolutional layers, the error rates of the generalized shallow LeNet
architecture, consisting of only five layers, decay as a power law with the
number of filters in the first convolutional layer. The extrapolation of this
power law indicates that the generalized LeNet can achieve small error rates
that were previously obtained for the CIFAR-10 database using DL architectures.
A power law with a similar exponent also characterizes the generalized VGG-16
architecture. However, this results in a significantly increased number of
operations required to achieve a given error rate with respect to LeNet. This
power law phenomenon governs various generalized LeNet and VGG-16
architectures, hinting at its universal behavior and suggesting a quantitative
hierarchical time-space complexity among machine learning architectures.
Additionally, error rates are found to be asymptotically minimized by a
conservation law along the convolutional layers, namely, that the square root of
their size times their depth is constant. The efficient shallow learning that is demonstrated in
this study calls for further quantitative examination using various databases
and architectures and its accelerated implementation using future dedicated
hardware developments.
Comment: 26 pages, 4 figures (improved figures resolution)
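The reported power-law extrapolation can be reproduced schematically as follows; the filter counts and error rates in this NumPy sketch are placeholder values, not measurements from the study.

```python
import numpy as np

# Hypothetical (first-layer filters, error rate) pairs; placeholder values,
# not measurements from the study.
filters = np.array([16, 32, 64, 128, 256])
errors = np.array([0.30, 0.24, 0.19, 0.15, 0.12])

# Fit error ~ A * filters**rho (rho < 0) by linear regression in log-log space.
rho, log_a = np.polyfit(np.log(filters), np.log(errors), 1)
a = np.exp(log_a)
print(f"fitted exponent: {rho:.3f}")

# Extrapolate the power law to a much wider first convolutional layer.
k = 4096
print(f"extrapolated error at {k} filters: {a * k ** rho:.4f}")
```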
Enhancing the success rates by performing pooling decisions adjacent to the output layer
Learning classification tasks of (2^n x 2^n) inputs typically involves at most n
(2x2) max-pooling (MP) operators along the entire feedforward deep
architecture. Here we show, using the CIFAR-10 database, that pooling decisions
adjacent to the last convolutional layer significantly enhance accuracy success
rates (SRs). In particular, average SRs of the advanced VGG with m layers
(A-VGGm) architectures are 0.936, 0.940, 0.954, 0.955, and 0.955 for m=6, 8,
14, 13, and 16, respectively. The results indicate that the SR of A-VGG8 is
superior to that of VGG16, and that the SRs of A-VGG13 and A-VGG16 are equal
and comparable to that of Wide-ResNet16. In addition, replacing the three fully
connected (FC) layers with one FC layer, as in A-VGG6 and A-VGG14, or with
several linear-activation FC layers, yielded similar SRs. These significantly
enhanced SRs stem from
training the most influential input-output routes, in comparison to the
inferior routes selected following multiple MP decisions along the deep
architecture. In addition, SRs are sensitive to the order of the
non-commutative MP and average pooling operators adjacent to the output layer,
varying the number and location of training routes. The results call for the
reexamination of previously proposed deep architectures and their SRs by
utilizing the proposed pooling strategy adjacent to the output layer.
Comment: 27 pages, 3 figures, 1 table and Supplementary Information
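A hedged PyTorch sketch of the general idea: the convolutional stack keeps full spatial resolution, and the non-commutative max/average pooling decisions are taken adjacent to the output layer. The channel counts, kernel sizes, and single-FC classifier are assumptions, not the exact A-VGG configuration.

```python
import torch
import torch.nn as nn

class PoolLastNet(nn.Module):
    """Convolutional stack without intermediate pooling; the non-commutative
    max-pool / average-pool pair is applied adjacent to the output layer."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
        )
        # Swapping the two pooling operators changes which input-output
        # routes dominate training, hence the sensitivity to their order.
        self.pool = nn.Sequential(nn.MaxPool2d(2), nn.AvgPool2d(4))
        self.classifier = nn.Linear(256 * 4 * 4, num_classes)

    def forward(self, x):
        x = self.pool(self.features(x))
        return self.classifier(torch.flatten(x, 1))

# Toy usage on a CIFAR-10-shaped batch.
logits = PoolLastNet()(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```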
The mechanism underlying successful deep learning
Deep architectures consist of tens or hundreds of convolutional layers (CLs)
that terminate with a few fully connected (FC) layers and an output layer
representing the possible labels of a complex classification task. According to
the existing deep learning (DL) rationale, the first CL reveals localized
features from the raw data, whereas the subsequent layers progressively extract
higher-level features required for refined classification. This article
presents an efficient three-phase procedure for quantifying the mechanism
underlying successful DL. First, a deep architecture is trained to maximize the
success rate (SR). Next, the weights of the first several CLs are fixed and
only the concatenated new FC layer connected to the output is trained,
resulting in SRs that progress with the layers. Finally, the trained FC weights
are silenced, except for those emerging from a single filter, enabling the
quantification of the functionality of this filter using a correlation matrix
between input labels and averaged output fields; hence, a well-defined set of
quantifiable features is obtained. Each filter essentially selects a single
output label independent of the input label, which seems to prevent high SRs;
however, it counterintuitively identifies a small subset of possible output
labels. This feature is an essential part of the underlying DL mechanism and is
progressively sharpened with layers, resulting in enhanced signal-to-noise
ratios and SRs. Quantitatively, this mechanism is exemplified by the VGG-16,
VGG-6, and AVGG-16. The proposed mechanism underlying DL provides an accurate
tool for identifying each filter's quality and is expected to direct additional
procedures to improve the SR, computational complexity, and latency of DL.
Comment: 33 pages, 8 figures
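The third phase of the procedure can be sketched as follows, assuming one FC weight per filter (e.g., after global average pooling); the function name, shapes, and random stand-ins are hypothetical.

```python
import numpy as np

def single_filter_fields(fc_weights, features, labels, filt, n_labels):
    """Silence all FC weights except those emerging from one filter, then
    average the output fields per true input label.

    fc_weights : (n_labels, n_filters) trained FC weights, assuming one
                 weight per filter (e.g. after global average pooling).
    features   : (n_samples, n_filters) filter activations on a validation set.
    labels     : (n_samples,) true labels.
    Returns an (n_labels, n_labels) matrix: rows index the true label,
    columns the averaged output field.
    """
    silenced = np.zeros_like(fc_weights)
    silenced[:, filt] = fc_weights[:, filt]      # keep a single filter's weights
    fields = features @ silenced.T               # (n_samples, n_labels)
    return np.vstack([fields[labels == c].mean(axis=0) for c in range(n_labels)])

# Toy usage with random stand-ins for a trained layer and a validation set.
rng = np.random.default_rng(0)
avg_fields = single_filter_fields(rng.normal(size=(10, 512)),
                                  rng.normal(size=(1000, 512)),
                                  rng.integers(0, 10, 1000),
                                  filt=7, n_labels=10)
print(avg_fields.shape)  # (10, 10)
```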
Learning on tree architectures outperforms a convolutional feedforward network
Advanced deep learning architectures consist of tens of fully connected and convolutional hidden layers, currently extended to hundreds, and are far from their biological realization. Their biologically implausible dynamics relies on the backpropagation technique, which changes a weight in a non-local manner, as the number of routes between an output unit and a weight is typically large. Here, a 3-layer tree architecture inspired by experimental-based dendritic tree adaptations is developed and applied to the offline and online learning of the CIFAR-10 database. The proposed architecture outperforms the achievable success rates of the 5-layer convolutional LeNet. Moreover, the highly pruned tree backpropagation approach of the proposed architecture, where a single route connects an output unit and a weight, represents an efficient dendritic deep learning.
Towards a universal mechanism for successful deep learning
Recently, the underlying mechanism for successful deep learning (DL) was presented based on a quantitative method that measures the quality of a single filter in each layer of a DL model, particularly VGG-16 trained on CIFAR-10. This method exemplifies that each filter identifies small clusters of possible output labels, with additional noise selected as labels outside the clusters. This feature is progressively sharpened with each layer, resulting in an enhanced signal-to-noise ratio (SNR), which leads to an increase in the accuracy of the DL network. In this study, this mechanism is verified for VGG-16 and EfficientNet-B0 trained on the CIFAR-100 and ImageNet datasets, and the main results are as follows. First, the accuracy and SNR progressively increase with the layers. Second, for a given deep architecture, the maximal error rate increases approximately linearly with the number of output labels. Third, similar trends were obtained for dataset labels in the range [3, 1000], thus supporting the universality of this mechanism. Understanding the performance of a single filter and its dominating features paves the way to highly diluting the deep architecture without affecting its overall accuracy, and this can be achieved by applying the filter's cluster connections (AFCC).
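A minimal sketch of how such a dilution could be applied to a final FC layer; the cluster assignment and the masking scheme are assumptions, since the abstract does not specify the exact AFCC procedure.

```python
import numpy as np

def apply_filter_cluster_connections(fc_weights, clusters):
    """Dilute an FC layer using the filters' cluster connections: keep only
    the weights from each filter to the output labels in that filter's
    cluster, zeroing the rest (hedged sketch, not the paper's algorithm)."""
    mask = np.zeros_like(fc_weights)
    for filt, labels in clusters.items():
        mask[labels, filt] = 1.0
    return fc_weights * mask

# Toy usage: 10 labels, 4 filters, each filter keeping a 2-3 label cluster.
rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 4))
clusters = {0: [1, 5], 1: [0, 2, 9], 2: [3, 4], 3: [6, 7, 8]}
diluted = apply_filter_cluster_connections(weights, clusters)
print(np.count_nonzero(diluted), "of", weights.size, "weights kept")
```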