
    The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence

    Turbulence is a complex spatial and temporal structure created by the strong non-linear dynamics of fluid flows at high Reynolds numbers. Despite being a ubiquitous phenomenon that has been studied for centuries, a full understanding of turbulence remains a formidable challenge. Here, we introduce tools from the fields of quantum chaos and Random Matrix Theory (RMT) and present a detailed analysis of image datasets generated from turbulence simulations of incompressible and compressible fluid flows. Focusing on two observables, the data Gram matrix and the single-image distribution, we study both the local and global eigenvalue statistics and compare them to classical chaos, uncorrelated noise, and natural images. We show that, from the RMT perspective, the turbulence Gram matrices lie in the same universality class as quantum chaotic rather than integrable systems, and that the data exhibit power-law scalings in the bulk of their eigenvalues which are vastly different from those of uncorrelated classical chaos, random data, and natural images. Interestingly, we find that the single-sample distribution appears fully RMT chaotic only at short correlation lengths, deviating from chaos at larger ones and exhibiting different scaling properties. Comment: 9 pages, 4 figures
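    Not the paper's code, but a minimal numpy sketch of the kind of diagnostic the abstract describes: build a Gram matrix from samples (random data here, as a stand-in for turbulence snapshots) and compute nearest-neighbour eigenvalue spacing ratios, a standard unfolding-free probe of local RMT statistics. The mean ratio is approximately 0.536 for the chaotic (GOE) universality class versus approximately 0.386 for Poisson (integrable) statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in dataset: n "samples" of d flattened features (the study
# uses turbulence simulation images; random data suffices to demo the tool).
n, d = 200, 1000
X = rng.standard_normal((n, d))

# Data Gram matrix and its (ascending) eigenvalue spectrum.
G = X @ X.T / d
eigs = np.linalg.eigvalsh(G)

# Nearest-neighbour spacing ratios: an unfolding-free probe of local
# RMT statistics. Mean ratio ~0.536 for the chaotic (GOE) class,
# ~0.386 for Poisson (integrable) level statistics.
s = np.diff(eigs)
r = np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])
print(f"mean spacing ratio <r> = {r.mean():.3f}")
```

    A Wishart-type Gram matrix like this one shares local GOE statistics, so the printed ratio lands near the chaotic value.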

    The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets

    We study universal traits which emerge both in real-world complex datasets and in artificially generated ones. Our approach is to analogize data to a physical system and employ tools from statistical physics and Random Matrix Theory (RMT) to reveal their underlying structure. We focus on the feature-feature covariance matrix, analyzing both its local and global eigenvalue statistics. Our main observations are: (i) The power-law scalings that the bulk of its eigenvalues exhibit are vastly different for uncorrelated random data compared to real-world data, (ii) this scaling behavior can be completely recovered by introducing long-range correlations in a simple way to the synthetic data, (iii) both generated and real-world datasets lie in the same universality class from the RMT perspective, as chaotic rather than integrable systems, (iv) the expected RMT statistical behavior already manifests for empirical covariance matrices at dataset sizes significantly smaller than those conventionally used for real-world training, and can be related to the number of samples required to approximate the population power-law scaling behavior, (v) the Shannon entropy is correlated with local RMT structure and eigenvalue scaling, is substantially smaller in strongly correlated datasets than in uncorrelated synthetic data, and requires fewer samples to reach the distribution entropy. These findings can have numerous implications for characterizing the complexity of datasets, including differentiating synthetically generated from natural data, quantifying noise, developing better data-pruning methods, and classifying effective learning models utilizing these scaling laws. Comment: 16 pages, 7 figures
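    As an illustration of observations (i) and (ii), here is a small numpy sketch (my construction, not the paper's code): plant a power-law population spectrum, estimate the empirical feature-feature covariance, and recover the bulk scaling exponent from a log-log fit. The exponent alpha and the sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Plant a power-law population spectrum lambda_k ~ k^(-alpha), a simple
# stand-in for the long-range correlations of real-world data.
d, n, alpha = 300, 10_000, 1.2
pop_eigs = np.arange(1, d + 1, dtype=float) ** -alpha
X = rng.standard_normal((n, d)) * np.sqrt(pop_eigs)

# Empirical feature-feature covariance and its spectrum (descending).
C = X.T @ X / n
emp = np.linalg.eigvalsh(C)[::-1]

# Recover the bulk scaling exponent from a log-log fit, dropping the
# edges of the spectrum where edge effects dominate.
k = np.arange(1, d + 1)
bulk = slice(10, 200)
slope, _ = np.polyfit(np.log(k[bulk]), np.log(emp[bulk]), 1)
print(f"fitted bulk exponent ~ {-slope:.2f} (population: {alpha})")
```

    With many more samples than features, the empirical bulk tracks the planted power law; shrinking n shows how few samples are needed before the fit degrades.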

    Decoupled Weight Decay for Any p Norm

    With the success of deep neural networks (NNs) in a variety of domains, the computational and storage requirements for training and deploying large NNs have become a bottleneck for further improvements. Sparsification has consequently emerged as a leading approach to tackle these issues. In this work, we consider a simple yet effective approach to sparsification, based on Bridge, or L_p, regularization during training. We introduce a novel weight decay scheme which generalizes the standard L_2 weight decay to any p norm. We show that this scheme is compatible with adaptive optimizers and avoids the gradient divergence associated with 0 < p < 1 norms. We empirically demonstrate that it leads to highly sparse networks while maintaining generalization performance comparable to standard L_2 regularization. Comment: GitHub link: https://github.com/Nadav-out/PAda
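    A minimal numpy sketch of a decoupled p-norm decay step applied after the gradient update. The clamped shrinkage used here to sidestep the divergence of the naive decay term at w = 0 is an illustrative choice, not necessarily the repository's exact scheme.

```python
import numpy as np

def decoupled_lp_decay(w, lr, wd, p):
    """One decoupled L_p decay step, applied after the gradient update.

    For p = 1 this reduces to soft-thresholding. For p < 1 the naive
    decay term p*|w|^(p-1) blows up as w -> 0; here we simply clamp the
    magnitude and stop shrinking at zero (an illustrative stabilization,
    not necessarily the paper's exact scheme).
    """
    mag = np.abs(w)
    shrink = lr * wd * p * np.maximum(mag, 1e-12) ** (p - 1)
    return np.sign(w) * np.maximum(mag - shrink, 0.0)

w = np.array([0.5, -0.2, 1e-4, 0.0])
w_new = decoupled_lp_decay(w, lr=0.1, wd=0.5, p=0.75)
print(w_new)  # small weights are driven exactly to zero -> sparsity
```

    Because the decay is decoupled, the same step composes with any base optimizer update, which is what makes it compatible with adaptive methods.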

    Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets

    We derive closed-form expressions for the Bayes-optimal decision boundaries in binary classification of high-dimensional overlapping Gaussian mixture model (GMM) data, and show how they depend on the eigenstructure of the class covariances for particularly interesting structured data. We empirically demonstrate, through experiments on synthetic GMMs inspired by real-world data, that deep neural networks trained for classification learn predictors which approximate the derived optimal classifiers. We further extend our study to networks trained on authentic data, observing that decision thresholds correlate with the covariance eigenvectors rather than the eigenvalues, mirroring our GMM analysis. This provides theoretical insight into neural networks' ability to perform probabilistic inference and distill statistical patterns from intricate distributions. Comment: 19 pages, 14 figures
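    A toy numpy sanity check of the simplest such closed form (my construction, not the paper's structured-covariance analysis): for equal priors and a shared covariance Sigma, the Bayes-optimal boundary is the linear rule sign(x . Sigma^{-1} mu), with accuracy Phi(sqrt(mu^T Sigma^{-1} mu)). All parameter values below are illustrative.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)

# Two overlapping Gaussian classes: means +/- mu, shared diagonal
# covariance Sigma.
d, n = 50, 20_000
mu = np.full(d, 0.15)
sigma2 = np.linspace(0.5, 1.5, d)            # diagonal of Sigma

y = rng.integers(0, 2, n) * 2 - 1            # labels in {-1, +1}
X = y[:, None] * mu + rng.standard_normal((n, d)) * np.sqrt(sigma2)

# Bayes-optimal rule for equal priors and shared covariance:
# predict sign(x . Sigma^{-1} mu), a linear boundary through the origin.
w = mu / sigma2
acc = np.mean(np.sign(X @ w) == y)

# Its accuracy in closed form: Phi(sqrt(mu^T Sigma^{-1} mu)).
snr = sqrt(np.sum(mu ** 2 / sigma2))
theory = 0.5 * (1 + erf(snr / sqrt(2)))
print(f"empirical {acc:.3f} vs theoretical {theory:.3f}")
```

    Note that the optimal weights depend on the covariance through Sigma^{-1} mu, i.e. on its eigenstructure and not merely its eigenvalues.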

    The Supercooling Window at Weak and Strong Coupling

    Supercooled first-order phase transitions are typical of theories where conformal symmetry is predominantly spontaneously broken. In these theories the fate of the flat scalar direction is highly sensitive to the size and the scaling dimension of the explicit breaking deformations. For a given deformation, the coupling must lie in a particular region to realize a supercooled first-order phase transition. We identify the supercooling window in weakly coupled theories and derive a fully analytical understanding of its boundaries. Mapping these boundaries allows us to identify the deformations enlarging the supercooling window and to characterize their dynamics analytically. For completeness we also discuss strongly coupled conformal field theories with a holographic dual, where the complete characterization of the supercooling window is challenged by calculability issues. Comment: 16 pages + appendices, 12 figures; v2: minor typo corrected

    Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding

    Grokking is the intriguing phenomenon where a model learns to generalize long after it has fit the training data. We show both analytically and numerically that grokking can surprisingly occur in linear networks performing linear tasks, in a simple teacher-student setup with Gaussian inputs. In this setting, the full training dynamics is derived in terms of the training and generalization data covariance matrices. We present exact predictions on how the grokking time depends on input and output dimensionality, train sample size, regularization, and network initialization. We demonstrate that the sharp increase in generalization accuracy may not imply a transition from "memorization" to "understanding", but can simply be an artifact of the accuracy measure. We provide empirical verification for our calculations, along with preliminary results indicating that some predictions also hold for deeper networks with non-linear activations.Comment: 17 pages, 6 figures
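    One simple way such delayed generalization can arise is easy to reproduce in a few lines of numpy (a sketch of the teacher-student setting with illustrative parameters, not the paper's exact derivation): with fewer samples than dimensions, gradient descent fits the training rows quickly, while the component of the weights in the null space of the training data, which only the test loss sees, shrinks slowly at the weight-decay rate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Overparameterized linear teacher-student: fewer samples than dimensions.
d, n_train, n_test = 100, 60, 2000
w_star = rng.standard_normal(d) / np.sqrt(d)
X_tr = rng.standard_normal((n_train, d)); y_tr = X_tr @ w_star
X_te = rng.standard_normal((n_test, d));  y_te = X_te @ w_star

w = rng.standard_normal(d)                   # large init: far from teacher
lr, wd, steps = 0.05, 0.02, 5000
train_hist, test_hist = [], []
for _ in range(steps):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n_train + wd * w
    w -= lr * grad
    train_hist.append(np.mean((X_tr @ w - y_tr) ** 2))
    test_hist.append(np.mean((X_te @ w - y_te) ** 2))

# Training loss collapses early; test loss follows only once weight decay
# has shrunk the null-space component of w, which training never sees.
print(f"step 1000: train {train_hist[999]:.3f}, test {test_hist[999]:.3f}")
print(f"step {steps}: train {train_hist[-1]:.3f}, test {test_hist[-1]:.3f}")
```

    Plotting the two histories shows the characteristic grokking gap between the training and test loss drops.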

    Noise Injection as a Probe of Deep Learning Dynamics

    We propose a new method to probe the learning mechanism of Deep Neural Networks (DNNs) by perturbing the system using Noise Injection Nodes (NINs). These nodes inject uncorrelated noise via additional optimizable weights into existing feed-forward network architectures, without changing the optimization algorithm. We find that the system displays distinct phases during training, dictated by the scale of the injected noise. We first derive expressions for the dynamics of the network and utilize a simple linear model as a test case. We find that in some cases the evolution of the noise nodes is similar to that of the unperturbed loss, indicating the possibility of using NINs to learn more about the full system in the future. Comment: 11 pages, 3 figures
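    A toy numpy sketch of the idea (an illustrative single-NIN linear model, not the paper's architecture): the input is augmented with one pure-noise coordinate whose trainable weight v is updated by the same SGD rule as the signal weights, and its magnitude decays as training proceeds.

```python
import numpy as np

rng = np.random.default_rng(4)

# Linear student with a single Noise Injection Node: the input is
# augmented by one pure-noise coordinate with its own trainable weight v.
d, steps, lr, sigma = 20, 500, 0.05, 1.0
w_star = rng.standard_normal(d)
w = np.zeros(d)
v = 1.0                                      # initial noise-node weight
v_hist = []
for _ in range(steps):
    x = rng.standard_normal(d)
    xi = sigma * rng.standard_normal()       # fresh injected noise each step
    err = w @ x + v * xi - w_star @ x        # prediction error
    w -= lr * err * x                        # same SGD rule for all weights
    v -= lr * err * xi
    v_hist.append(abs(v))

# Since the injected noise is uncorrelated with the target, v decays
# toward zero, and its trajectory mirrors the shrinking training error.
print(f"|v|: {v_hist[0]:.3f} -> {v_hist[-1]:.5f}")
```

    The scale sigma plays the role of the injected-noise scale that separates the training phases described in the abstract.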

    Oral White Lesions Associated with Chewing Khat

    Khat is a cultivated plant whose leaves, when chewed, elevate mood. Unlike the chewing of betel nut, no association between the white oral mucosal lesions in khat users and oral malignancies has been reported. Chewing of khat has been documented in many countries and has increased with worldwide migration. The impact of chewing khat upon the oral mucosa is essentially unknown. The purpose of this study was to assess the occurrence of oral white changes in chronic khat chewers. Oral mucosal changes in a group of 47 Yemenite Israeli men over 30 years of age, who had chewed khat for more than 3 years, were compared to those of 55 Yemenite men who did not chew. White lesions were significantly more prevalent in the khat chewers (83%) than in the non-chewing individuals (16%) (P < 0.001). White oral lesions were identified primarily on the lower buccal attached gingival mucosa, the alveolar mucosa, and the lower mucobuccal fold on the chewing side (p < 0.001). There was no significant association between the occurrence of the white lesions and smoking. Even though the majority of the white lesions (85.4%) were homogeneous, 71.4% of the non-homogeneous lesions were identified in khat chewers. Vital staining with toluidine blue and exfoliative cytology was conducted on a subset of patients with homogeneous and non-homogeneous oral lesions, and there were no findings suspicious for pre-malignant or malignant changes. This study demonstrated a relationship between khat chewing and oral white lesions, which we attribute to chronic local mechanical and chemical irritation of the mucosa. Our findings also suggest that the mucosal changes associated with khat are benign; however, this initial study requires follow-up of khat users to confirm the current findings, including the likely benign nature of changes associated with chronic use and the histologic findings of clinical lesions.

    Noise Injection Node Regularization for Robust Learning

    We introduce Noise Injection Node Regularization (NINR), a method of injecting structured noise into Deep Neural Networks (DNNs) during the training stage, resulting in an emergent regularizing effect. We present theoretical and empirical evidence for substantial improvement in robustness against various test-data perturbations for feed-forward DNNs trained under NINR. The novelty of our approach comes from the interplay of adaptive noise injection and initialization conditions such that noise is the dominant driver of dynamics at the start of training. As it simply requires the addition of external nodes, without altering the existing network structure or optimization algorithms, this method can be easily incorporated into many standard problem specifications. We find improved stability against a number of data perturbations, including domain shifts, with the most dramatic improvement obtained for unstructured noise, where our technique in some cases outperforms existing methods such as Dropout or L_2 regularization. We further show that desirable generalization properties on clean data are generally maintained. Comment: 16 pages, 9 figures