Batch Normalization Orthogonalizes Representations in Deep Random Networks
This paper underlines a subtle property of batch normalization (BN): successive batch normalizations interleaved with random linear transformations make hidden representations increasingly orthogonal across the layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality rapidly decays with depth, up to a term inversely proportional to the network width. This result has two main implications: 1) Theoretically, as the depth grows, the distribution of the representations -- after the linear layers -- contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution, and the radius of this ball shrinks with the width of the network. 2) In practice, the orthogonality of the representations directly influences the performance of stochastic gradient descent (SGD): when representations are initially aligned, SGD wastes many iterations orthogonalizing them before classification begins. Nevertheless, we experimentally show that starting optimization from orthogonal representations is sufficient to accelerate SGD, with no need for BN.
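The orthogonalization effect is easy to observe numerically. The sketch below is my own construction, not the paper's exact experimental setup: it passes a nearly aligned batch through random linear layers followed by batch normalization (without learned scale and shift) and tracks the deviation of the normalized Gram matrix from the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 8, 512, 30          # batch size, width, depth

def batch_norm(H):
    # standardize each coordinate over the batch (no learned scale/shift)
    return (H - H.mean(0)) / (H.std(0) + 1e-8)

def ortho_gap(H):
    # deviation of the scaled Gram matrix from the identity
    G = H @ H.T / d
    G = G / np.abs(np.diag(G)).mean()   # scale-invariant comparison
    return np.linalg.norm(G - np.eye(n)) / n

# low-orthogonality start: small perturbations of a single direction
u = rng.normal(size=d)
H = np.tile(u, (n, 1)) + 0.1 * rng.normal(size=(n, d))
gaps = [ortho_gap(H)]
for _ in range(depth):
    W = rng.normal(size=(d, d)) / np.sqrt(d)   # random linear layer
    H = batch_norm(H @ W)
    gaps.append(ortho_gap(H))

print(f"orthogonality gap at input: {gaps[0]:.3f}, after {depth} layers: {gaps[-1]:.3f}")
```

The gap should shrink from near 1 (fully aligned rows) towards a small residual, consistent with the depth-versus-width trade-off described above.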
On the impact of activation and normalization in obtaining isometric embeddings at initialization
In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of the outputs corresponding to a batch of inputs. In several architectures, this Gram matrix has been observed to degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, play a pivotal role in preventing this rank collapse. Despite promising advances, existing theoretical results (i) do not extend to layer normalization, which is widely used in transformers, and (ii) cannot quantitatively characterize the bias of normalization at finite depth. To bridge this gap, we prove that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards isometry at an exponential rate in depth at initialization. We quantify this rate using the Hermite expansion of the activation function, highlighting the importance of its higher-order Hermite coefficients in the bias towards isometry.
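A rough illustration of this bias towards isometry (an illustrative sketch under simplified assumptions -- ReLU activation and layer normalization without learned affine parameters, not the paper's proof setup) tracks the cosine similarity of two strongly correlated inputs through a shared random MLP; centering plus the nonlinearity drives the correlation towards zero.

```python
import numpy as np

rng = np.random.default_rng(1)
d, depth = 2048, 30

def layer_norm(h):
    # per-sample: subtract the feature mean, rescale to unit norm
    h = h - h.mean()
    return h / np.linalg.norm(h)

# two strongly correlated inputs
u = rng.normal(size=d)
x1 = layer_norm(u + 0.1 * rng.normal(size=d))
x2 = layer_norm(u + 0.1 * rng.normal(size=d))

cos = [float(x1 @ x2)]
for _ in range(depth):
    W = rng.normal(size=(d, d)) / np.sqrt(d)   # same weights for both inputs
    x1 = layer_norm(np.maximum(W @ x1, 0.0))   # ReLU + layer norm
    x2 = layer_norm(np.maximum(W @ x2, 0.0))
    cos.append(float(x1 @ x2))

print(f"cosine at input: {cos[0]:.3f}, after {depth} layers: {cos[-1]:.3f}")
```

Pairwise inner products decaying towards zero is exactly the Gram matrix approaching a multiple of the identity, i.e. isometry.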
Identification of medicinal plants effective on sinusitis native to Shiraz province in Iran
Sinusitis is one of the most common infectious diseases; it affects the cavities around the nose, namely the frontal, ethmoid, maxillary, and sphenoid sinuses. Symptoms usually include nasal congestion and obstruction, a feeling of pressure or fullness in the face, anterior or posterior nasal discharge, headache, fever, swelling and erythema of the forehead or cheek, and cough. Signs may include edema and mucosal congestion, nasal drainage, posterior nasal discharge, nasal septum deviation, and polyps. The medicinal plants identified for the treatment of sinusitis in Shiraz include Amygdalus scoparia Spach, Echinophora platyloba DC., Haplophyllum perforatum L., Lavandula stoechas L., Borago officinalis, Matricaria recutita, and Descurainia sophia (L.) Schr. Many of these plants have antioxidant activity and contain bioactive compounds such as flavonoids, polyphenols, anthocyanins, tannins, and many other pharmaceutically active ingredients with effects on sinusitis. This paper reviews recently published papers on this topic.
EEG-Based Functional Brain Networks: Does the Network Size Matter?
Functional connectivity in the human brain can be represented as a network using electroencephalography (EEG) signals. These networks -- whose node counts can vary from tens to hundreds -- are characterized by neurobiologically meaningful graph-theory metrics. This study investigates the degree to which various graph metrics depend on network size. To this end, EEGs from 32 normal subjects were recorded and functional networks of three different sizes were extracted. A state-space-based method was used to calculate cross-correlation matrices between different brain regions. These correlation matrices were used to construct binary adjacency connectomes, which were assessed with regard to a number of graph metrics, such as clustering coefficient, modularity, efficiency, economic efficiency, and assortativity. We showed that the estimates of these metrics differ significantly depending on the network size: larger networks had higher efficiency, higher assortativity, and lower modularity than smaller networks of the same density. These findings indicate that network size should be considered in any comparison of networks across studies.
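The size dependence of such metrics is easy to probe on synthetic graphs. The sketch below is illustrative only -- it uses small-world Watts-Strogatz graphs generated with networkx rather than EEG-derived connectomes -- and computes several of the metrics above for networks of three sizes at roughly matched edge density.

```python
import networkx as nx
from networkx.algorithms import community

density = 0.2
results = {}
for n in (32, 64, 128):
    # even nearest-neighbor degree k chosen so k/(n-1) ~ density for every size
    k = max(2, 2 * round(density * (n - 1) / 2))
    G = nx.watts_strogatz_graph(n, k, p=0.1, seed=0)
    comms = community.greedy_modularity_communities(G)
    results[n] = {
        "clustering": nx.average_clustering(G),
        "efficiency": nx.global_efficiency(G),
        "modularity": community.modularity(G, comms),
        "assortativity": nx.degree_assortativity_coefficient(G),
    }

for n, metrics in results.items():
    print(n, {name: round(v, 3) for name, v in metrics.items()})
```

Comparing the rows shows how the same metric shifts with node count even at fixed density, which is the confound the study warns about.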
On bridging the gap between mean field and finite width deep random multilayer perceptron with batch normalization
ISSN:2640-349
Entropy Maximization with Depth: A Variational Principle for Random Neural Networks
To understand the essential role of depth in neural networks, we investigate a variational principle for depth: does increasing depth perform an implicit optimization of the representations in a neural network? We prove that random neural networks equipped with batch normalization maximize the differential entropy of their representations with depth, up to constant factors, assuming that the representations are contractive. Thus, at initialization, in the absence of information about the learning task, representations inherently obey the principle of maximum entropy. Our variational formulation characterizes the interplay between representation entropy and architectural components, including depth, width, and non-linear activations, and may thereby inspire the design of neural architectures.