24 research outputs found

    Batch Normalization Orthogonalizes Representations in Deep Random Networks

    This paper underlines a subtle property of batch normalization (BN): successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality rapidly decays with depth, up to a term inversely proportional to the network width. This result has two main implications: 1) Theoretically, as the depth grows, the distribution of the representations after the linear layers contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution. Furthermore, the radius of this Wasserstein ball shrinks with the width of the network. 2) In practice, the orthogonality of the representations directly influences the performance of stochastic gradient descent (SGD). When representations are initially aligned, we observe that SGD wastes many iterations orthogonalizing the representations before performing classification. Conversely, we show experimentally that starting optimization from orthogonal representations is sufficient to accelerate SGD, with no need for BN.
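
    Below is a minimal NumPy sketch of the effect described above, assuming a plain fully connected network with random Gaussian weights and batch normalization over the batch dimension at initialization. The orthogonality_gap measure (Frobenius distance of the rescaled Gram matrix from the identity) and all parameter values are illustrative choices, not the exact quantities analyzed in the paper.

```python
import numpy as np

def batch_norm(h, eps=1e-5):
    # Normalize each feature (column) to zero mean and unit variance over the batch.
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

def orthogonality_gap(h):
    # Frobenius distance between the rescaled batch Gram matrix and the identity;
    # it vanishes exactly when the representations are orthogonal with equal norms.
    g = h @ h.T
    g = g * h.shape[0] / np.trace(g)
    return np.linalg.norm(g - np.eye(h.shape[0]))

rng = np.random.default_rng(0)
batch, width, depth = 32, 512, 50

# Start from a strongly aligned batch: every input is close to one direction.
base = rng.standard_normal(width)
x = base + 0.01 * rng.standard_normal((batch, width))

h = x
for layer in range(depth):
    w = rng.standard_normal((width, width)) / np.sqrt(width)  # random linear layer
    h = batch_norm(h @ w)
    if layer % 10 == 0:
        print(f"layer {layer:3d}  orthogonality gap = {orthogonality_gap(h):.4f}")
```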

    On the impact of activation and normalization in obtaining isometric embeddings at initialization

    In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of the outputs corresponding to a batch of inputs. In several architectures, this Gram matrix has been observed to become degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, play a pivotal role in preventing this rank-collapse issue. Despite promising advances, the existing theoretical results (i) do not extend to layer normalization, which is widely used in transformers, and (ii) cannot quantitatively characterize the bias of normalization at finite depth. To bridge this gap, we prove that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards isometry at an exponential rate with depth at initialization. We quantify this rate using the Hermite expansion of the activation function, highlighting the importance of higher-order (degree ≥ 2) Hermite coefficients in the bias towards isometry.
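
    The following sketch, under the same kind of assumptions as above, tracks how close the batch Gram matrix of a random multilayer perceptron with layer normalization and a tanh activation gets to the identity as depth grows. The isometry_gap measure and the nearly rank-one starting batch are hypothetical illustrations, not the paper's construction.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    # Normalize each sample (row) to zero mean and unit variance across features.
    mu = h.mean(axis=1, keepdims=True)
    var = h.var(axis=1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def isometry_gap(h):
    # Distance of the rescaled Gram matrix from the identity: small values mean
    # the batch of representations is close to an orthonormal (isometric) configuration.
    g = h @ h.T
    g = g * h.shape[0] / np.trace(g)
    return np.linalg.norm(g - np.eye(h.shape[0]))

rng = np.random.default_rng(1)
batch, width, depth = 16, 256, 40

# A nearly rank-one input batch, i.e. a strongly degenerate Gram matrix to start from.
x = np.outer(np.ones(batch), rng.standard_normal(width))
x += 0.05 * rng.standard_normal((batch, width))

h = x
for layer in range(depth):
    w = rng.standard_normal((width, width)) / np.sqrt(width)
    h = layer_norm(np.tanh(h @ w))        # activation followed by layer normalization
    if layer % 5 == 0:
        print(f"layer {layer:3d}  isometry gap = {isometry_gap(h):.4f}")
```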

    Identification of medicinal plants effective on sinusitis native to Shiraz province in Iran

    Sinusitis is one of the most common infectious diseases, affecting the cavities around the nose such as the frontal, ethmoid, maxillary and sphenoid sinuses. Symptoms usually include nasal congestion and obstruction, a feeling of pressure or fullness in the face, anterior or posterior nasal discharge, headaches, fever, swelling and erythema of the forehead or cheek, and cough. Clinical signs may include edema and mucosal congestion, nasal drainage, posterior nasal discharge, nasal septum deviation and polyps. Medicinal plants identified for the treatment of sinusitis in Shiraz include, for instance, Amygdalus scoparia Spach, Echinophora platyloba DC., Haplophyllum perforatum L., Lavandula stoechas L., Borago officinalis, Matricaria recutita and Descurainia sophia (L.) Schr. Many of these plants have antioxidant activity and contain bioactive compounds such as flavonoids, polyphenols, anthocyanins, tannins and many other pharmaceutically active ingredients that have effects on sinusitis. This paper aims to review the recently published papers on this topic.

    EEG-Based Functional Brain Networks: Does the Network Size Matter?

    Functional connectivity in the human brain can be represented as a network using electroencephalography (EEG) signals. These networks, whose number of nodes can range from tens to hundreds, are characterized by neurobiologically meaningful graph-theory metrics. This study investigates the degree to which various graph metrics depend upon the network size. To this end, EEGs from 32 normal subjects were recorded and functional networks of three different sizes were extracted. A state-space based method was used to calculate cross-correlation matrices between different brain regions. These correlation matrices were used to construct binary adjacency connectomes, which were assessed with regard to a number of graph metrics such as clustering coefficient, modularity, efficiency, economic efficiency, and assortativity. We showed that the estimates of these metrics differ significantly depending on the network size. Larger networks had higher efficiency, higher assortativity, and lower modularity than smaller networks of the same density. These findings indicate that network size should be considered in any comparison of networks across studies.
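
    As a rough illustration of the recipe described above, the sketch below thresholds a correlation matrix to a fixed edge density, builds a binary graph with networkx, and computes several of the listed metrics for three network sizes. The synthetic random time series stand in for the EEG-derived cross-correlation matrices, and the 20% density threshold is an arbitrary choice.

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

def binary_network(corr, density=0.2):
    # Keep a fixed fraction of the strongest absolute off-diagonal correlations
    # as edges, so that networks of different sizes share the same density.
    n = corr.shape[0]
    c = np.abs(corr.copy())
    np.fill_diagonal(c, 0.0)
    weights = c[np.triu_indices(n, k=1)]
    k = int(density * len(weights))
    thresh = np.sort(weights)[-k]
    adj = (c >= thresh).astype(int)
    np.fill_diagonal(adj, 0)
    return nx.from_numpy_array(adj)

rng = np.random.default_rng(2)
for n_nodes in (32, 64, 128):                     # three "network sizes"
    data = rng.standard_normal((200, n_nodes))    # stand-in for regional EEG time series
    corr = np.corrcoef(data, rowvar=False)
    g = binary_network(corr, density=0.2)
    comms = community.greedy_modularity_communities(g)
    print(f"nodes={n_nodes:3d}",
          f"clustering={nx.average_clustering(g):.3f}",
          f"efficiency={nx.global_efficiency(g):.3f}",
          f"modularity={community.modularity(g, comms):.3f}",
          f"assortativity={nx.degree_assortativity_coefficient(g):.3f}")
```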

    Nonlinear Dimensionality Reduction via Path-Based Isometric Mapping

    No full text

    Entropy Maximization with Depth: A Variational Principle for Random Neural Networks

    No full text
    To understand the essential role of depth in neural networks, we investigate a variational principle for depth: does increasing depth perform an implicit optimization of the representations in neural networks? We prove that random neural networks equipped with batch normalization maximize the differential entropy of the representations with depth up to constant factors, assuming that the representations are contractive. Thus, representations inherently obey the principle of maximum entropy at initialization, in the absence of information about the learning task. Our variational formulation of neural representations characterizes the interplay between representation entropy and architectural components, including depth, width, and non-linear activations, thereby potentially inspiring the design of neural architectures.
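
    A coarse numerical illustration of the claim, assuming a Gaussian approximation to the representation distribution: the differential entropy is proxied by the log-determinant of the empirical covariance of the representations after each batch-normalized random layer. This proxy and the low-entropy starting distribution are illustrative assumptions, not the estimator or setting used in the paper.

```python
import numpy as np

def batch_norm(h, eps=1e-5):
    # Normalize each feature (column) to zero mean and unit variance over the samples.
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

def gaussian_entropy_proxy(h):
    # Differential entropy of a Gaussian with the empirical covariance of the
    # representations, up to an additive constant: 0.5 * log det(cov).
    cov = np.cov(h, rowvar=False) + 1e-6 * np.eye(h.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * logdet

rng = np.random.default_rng(3)
samples, width, depth = 4096, 64, 30

# A deliberately low-entropy input distribution: samples concentrated near one direction.
x = np.outer(rng.standard_normal(samples), rng.standard_normal(width)) / width
x += 0.01 * rng.standard_normal((samples, width))

h = x
for layer in range(depth):
    w = rng.standard_normal((width, width)) / np.sqrt(width)  # random linear layer
    h = batch_norm(h @ w)
    if layer % 5 == 0:
        print(f"layer {layer:3d}  entropy proxy = {gaussian_entropy_proxy(h):.2f}")
```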