The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence
Turbulence is a complex spatial and temporal structure created by the strong
non-linear dynamics of fluid flows at high Reynolds numbers. Despite being a
ubiquitous phenomenon that has been studied for centuries, a full understanding
of turbulence remains a formidable challenge. Here, we introduce tools from
the fields of quantum chaos and Random Matrix Theory (RMT) and present a
detailed analysis of image datasets generated from turbulence simulations of
incompressible and compressible fluid flows. Focusing on two observables: the
data Gram matrix and the single image distribution, we study both the local and
global eigenvalue statistics and compare them to classical chaos, uncorrelated
noise and natural images. We show that from the RMT perspective, the turbulence
Gram matrices lie in the same universality class as quantum chaotic rather than
integrable systems, and the data exhibits power-law scalings in the bulk of its
eigenvalues, which are vastly different from those of uncorrelated classical chaos, random data, and natural images. Interestingly, we find that the single-sample distribution appears fully RMT chaotic only superficially: it deviates from chaos at larger correlation lengths and exhibits different scaling properties.
Comment: 9 pages, 4 figures
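As a rough illustration of the local eigenvalue statistics referred to above, the sketch below computes the unfolding-free consecutive-spacing-ratio diagnostic for the Gram matrix of a toy dataset; the data, sizes, and normalization are arbitrary stand-ins rather than the paper's setup.

```python
import numpy as np

def spacing_ratios(eigs):
    """Consecutive spacing ratios r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1})."""
    s = np.diff(np.sort(eigs))
    s = s[s > 0]
    return np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])

# Toy "dataset": N flattened images with d pixels each (random stand-in).
rng = np.random.default_rng(0)
N, d = 500, 1024
X = rng.standard_normal((N, d))

# Data Gram matrix (sample-sample overlaps) and its spectrum.
G = X @ X.T / d
eigs = np.linalg.eigvalsh(G)

r = spacing_ratios(eigs)
print(f"mean spacing ratio: {r.mean():.3f}")
# Roughly 0.53 signals GOE-like (chaotic) local statistics,
# roughly 0.39 signals Poisson-like (integrable) statistics.
```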
The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets
We study universal traits which emerge both in real-world complex datasets,
as well as in artificially generated ones. Our approach is to analogize data to
a physical system and employ tools from statistical physics and Random Matrix
Theory (RMT) to reveal their underlying structure. We focus on the
feature-feature covariance matrix, analyzing both its local and global
eigenvalue statistics. Our main observations are: (i) The power-law scalings
that the bulk of its eigenvalues exhibit are vastly different for uncorrelated
random data compared to real-world data, (ii) this scaling behavior can be
completely recovered by introducing long range correlations in a simple way to
the synthetic data, (iii) both generated and real-world datasets lie in the
same universality class from the RMT perspective, as chaotic rather than
integrable systems, (iv) the expected RMT statistical behavior already
manifests for empirical covariance matrices at dataset sizes significantly
smaller than those conventionally used for real-world training, and can be
related to the number of samples required to approximate the population
power-law scaling behavior, (v) the Shannon entropy is correlated with local
RMT structure and eigenvalue scaling, and is substantially smaller in strongly correlated datasets than in uncorrelated synthetic data, requiring fewer samples to reach the distribution entropy. These findings have numerous implications for characterizing the complexity of datasets, including differentiating synthetically generated from natural data, quantifying noise, developing better data pruning methods, and classifying effective learning models that utilize these scaling laws.
Comment: 16 pages, 7 figures
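A minimal sketch of the bulk power-law diagnostic described in (i)-(ii), using an assumed lambda_k ~ k^(-alpha) fit and one simple way of injecting long-range correlations into synthetic data; the construction and parameter choices are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 2000, 512                                   # samples, features

def bulk_powerlaw_exponent(X, k_min=10, k_max=200):
    """Fit lambda_k ~ k^(-alpha) over the bulk of the feature-feature covariance spectrum."""
    C = np.cov(X, rowvar=False)                    # feature-feature covariance
    lam = np.sort(np.linalg.eigvalsh(C))[::-1]
    k = np.arange(1, len(lam) + 1)
    sel = (k >= k_min) & (k <= k_max) & (lam > 0)
    slope, _ = np.polyfit(np.log(k[sel]), np.log(lam[sel]), 1)
    return -slope

# Uncorrelated synthetic data: the bulk exponent comes out small.
X_white = rng.standard_normal((N, d))

# Long-range correlations injected via a 1/k feature variance profile (one simple choice).
spectrum = 1.0 / np.arange(1, d + 1)
X_corr = rng.standard_normal((N, d)) * np.sqrt(spectrum)

print("alpha (uncorrelated):", round(bulk_powerlaw_exponent(X_white), 2))
print("alpha (long-range):  ", round(bulk_powerlaw_exponent(X_corr), 2))
```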
Decoupled Weight Decay for Any Norm
With the success of deep neural networks (NNs) in a variety of domains, the
computational and storage requirements for training and deploying large NNs
have become a bottleneck for further improvements. Sparsification has
consequently emerged as a leading approach to tackle these issues. In this
work, we consider a simple yet effective approach to sparsification, based on
the Bridge, or $L_p$, regularization during training. We introduce a novel
weight decay scheme, which generalizes the standard $L_2$ weight decay to any
$p$ norm. We show that this scheme is compatible with adaptive optimizers, and
avoids the gradient divergence associated with $p < 1$ norms. We empirically
demonstrate that it leads to highly sparse networks, while maintaining
generalization performance comparable to standard $L_2$ regularization.
Comment: GitHub link: https://github.com/Nadav-out/PAda
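The paper's scheme is not reproduced here, but as a naive sketch of the idea of decoupling an $L_p$ (Bridge) penalty from the gradient step, AdamW-style: take the optimizer step with its own weight decay disabled, then apply the $p$-norm decay separately. The function name and hyperparameters are illustrative, and this naive update does not include whatever the paper does to avoid the $p < 1$ divergence near zero.

```python
import torch

def decoupled_lp_decay_(params, lr, weight_decay, p=1.0, eps=1e-12):
    """Naive decoupled L_p decay: w <- w - lr * wd * p * sign(w) * |w|^(p-1).

    Applied after the ordinary optimizer step, in the spirit of AdamW's
    decoupling of L_2 decay from the adaptive update. For p < 1 the factor
    |w|^(p-1) blows up near zero; this sketch only clamps it, nothing more.
    """
    with torch.no_grad():
        for w in params:
            w.sub_(lr * weight_decay * p * w.sign() * w.abs().clamp_min(eps).pow(p - 1))

# Usage sketch: optimizer step with weight_decay=0, then the separate decay step.
model = torch.nn.Linear(128, 10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
decoupled_lp_decay_(model.parameters(), lr=1e-3, weight_decay=1e-2, p=1.0)
```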
Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets
We derive closed-form expressions for the Bayes optimal decision boundaries
in binary classification of high dimensional overlapping Gaussian mixture model
(GMM) data, and show how they depend on the eigenstructure of the class
covariances, for particularly interesting structured data. We empirically
demonstrate, through experiments on synthetic GMMs inspired by real-world data,
that deep neural networks trained for classification learn predictors that
approximate the derived optimal classifiers. We further extend our study to
networks trained on authentic data, observing that decision thresholds
correlate with the covariance eigenvectors rather than the eigenvalues,
mirroring our GMM analysis. This provides theoretical insights regarding neural
networks' ability to perform probabilistic inference and distill statistical
patterns from intricate distributions.
Comment: 19 pages, 14 figures
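The closed-form boundaries derived in the paper are not reproduced here; as a reminder of the textbook special case they generalize, the sketch below evaluates the Bayes-optimal rule for two overlapping Gaussian classes with means at +/- mu and a shared structured covariance Sigma, where the optimal boundary is linear with normal Sigma^{-1} mu and hence depends on Sigma's eigenstructure. Dimensions and the covariance construction are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 50, 20000

# Structured shared covariance: power-law spectrum with random eigenvectors.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
Sigma = Q @ np.diag(1.0 / np.arange(1, d + 1)) @ Q.T
mu = rng.standard_normal(d) / np.sqrt(d)            # class means at +/- mu

# Sample an overlapping binary GMM with labels in {-1, +1}.
y = rng.integers(0, 2, n) * 2 - 1
X = y[:, None] * mu + rng.multivariate_normal(np.zeros(d), Sigma, n)

# For equal priors and a shared covariance, the Bayes-optimal classifier is
# linear: predict sign(w . x) with w = Sigma^{-1} mu.
w = np.linalg.solve(Sigma, mu)
accuracy = np.mean(np.sign(X @ w) == y)
print(f"Bayes-optimal accuracy on this draw: {accuracy:.3f}")
```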
The Supercooling Window at Weak and Strong Coupling
Supercooled first order phase transitions are typical of theories where
conformal symmetry is predominantly spontaneously broken. In these theories the
fate of the flat scalar direction is highly sensitive to the size and the
scaling dimension of the explicit breaking deformations. For a given
deformation, the coupling must lie in a particular region to realize a
supercooled first order phase transition. We identify the supercooling window
in weakly coupled theories and derive a fully analytical understanding of its
boundaries. Mapping these boundaries allows us to identify the deformations
enlarging the supercooling window and to characterize their dynamics
analytically. For completeness we also discuss strongly coupled conformal field
theories with a holographic dual, where the complete characterization of the
supercooling window is challenged by calculability issues.
Comment: 16 pages + appendices, 12 figures; v2: minor typo corrected
Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding
Grokking is the intriguing phenomenon where a model learns to generalize long
after it has fit the training data. We show both analytically and numerically
that grokking can surprisingly occur in linear networks performing linear tasks
in a simple teacher-student setup with Gaussian inputs. In this setting, the
full training dynamics is derived in terms of the training and generalization
data covariance matrix. We present exact predictions on how the grokking time
depends on input and output dimensionality, training sample size, regularization,
and network initialization. We demonstrate that the sharp increase in
generalization accuracy may not imply a transition from "memorization" to
"understanding", but can simply be an artifact of the accuracy measure. We
provide empirical verification for our calculations, along with preliminary
results indicating that some predictions also hold for deeper networks with
non-linear activations.
Comment: 17 pages, 6 figures
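A minimal numerical sketch of a linear teacher-student setup of the kind described above, in which the training loss collapses long before the test loss improves; the dimensions, initialization scale, learning rate, and weight decay below are illustrative choices rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_train, n_test = 100, 80, 2000

w_teacher = rng.standard_normal(d) / np.sqrt(d)
X_tr = rng.standard_normal((n_train, d)); y_tr = X_tr @ w_teacher
X_te = rng.standard_normal((n_test, d));  y_te = X_te @ w_teacher

# Gradient descent on the MSE with weight decay, from a large initialization.
w = 3.0 * rng.standard_normal(d) / np.sqrt(d)
lr, wd = 1e-2, 3e-3
for step in range(1, 100001):
    grad = X_tr.T @ (X_tr @ w - y_tr) / n_train + wd * w
    w -= lr * grad
    if step in (100, 1000, 10000, 100000):
        tr = np.mean((X_tr @ w - y_tr) ** 2)
        te = np.mean((X_te @ w - y_te) ** 2)
        print(f"step {step:>6}: train MSE {tr:.2e}, test MSE {te:.2e}")

# The train loss drops quickly, while the test loss improves only on the much
# slower ~1/(lr * wd) timescale over which weight decay shrinks the student's
# components outside the span of the training inputs.
```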
Noise Injection as a Probe of Deep Learning Dynamics
We propose a new method to probe the learning mechanism of Deep Neural
Networks (DNN) by perturbing the system using Noise Injection Nodes (NINs).
These nodes inject uncorrelated noise via additional optimizable weights to
existing feed-forward network architectures, without changing the optimization
algorithm. We find that the system displays distinct phases during training,
dictated by the scale of injected noise. We first derive expressions for the
dynamics of the network and utilize a simple linear model as a test case. We
find that in some cases, the evolution of the noise nodes is similar to that of
the unperturbed loss, thus indicating the possibility of using NINs to learn
more about the full system in the future.
Comment: 11 pages, 3 figures
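A rough sketch of what a noise-injection node could look like in a standard feed-forward network: fresh uncorrelated noise is drawn every forward pass and enters through an additional trainable weight, while the surrounding architecture and optimizer are left unchanged. The module name, placement, and initial scale are assumptions for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class NoiseInjectionNode(nn.Module):
    """Adds w_noise * eps to its input, with eps ~ N(0, 1) redrawn on every
    forward pass and w_noise a trainable per-unit weight (the only new parameter)."""
    def __init__(self, width, init_scale=1e-2):
        super().__init__()
        self.w_noise = nn.Parameter(init_scale * torch.ones(width))

    def forward(self, x):
        eps = torch.randn_like(x)          # uncorrelated noise, new each pass
        return x + self.w_noise * eps

# A plain feed-forward net with a noise node after the first hidden layer.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    NoiseInjectionNode(256),
    nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # optimization algorithm unchanged
```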
Oral White Lesions Associated with Chewing Khat
Introduction: Khat is a cultivated plant whose leaves, when chewed, elevate mood. Unlike the chewing of betel nut, no association between the white oral mucosal lesions in khat users and oral malignancies has been reported. Chewing of khat has been documented in many countries and has increased with worldwide migration. The impact of chewing khat upon the oral mucosa is essentially unknown.

Purpose: The purpose of this study was to assess the occurrence of oral white changes in chronic khat chewers. Oral mucosal changes in a group of 47 Yemenite Israeli men over 30 years of age, who had chewed khat for more than 3 years, were compared to those of 55 Yemenite men who did not chew.

Results: White lesions were significantly more prevalent in the khat chewers (83%) than in the non-chewing individuals (16%) (p < 0.001). White oral lesions were identified primarily on the lower buccal attached gingival mucosa, the alveolar mucosa, and the lower mucobuccal fold on the chewing side (p < 0.001). There was no significant association between the occurrence of the white lesions and smoking. Even though the majority of the white lesions (85.4%) were homogeneous, 71.4% of the non-homogeneous lesions were identified in khat chewers. Vital staining with toluidine blue and exfoliative cytology was conducted on a subset of patients with homogeneous and non-homogeneous oral lesions, and there were no findings suspicious for pre-malignant or malignant changes.

Discussion: This study demonstrated a relationship between khat chewing and oral white lesions, which we attribute to chronic local mechanical and chemical irritation of the mucosa. Our findings also suggest that the mucosal changes associated with khat are benign; however, this initial study requires further work, including follow-up of khat users, to confirm the current findings, the likely benign nature of changes associated with chronic use, and the histologic findings of clinical lesions.
Noise Injection Node Regularization for Robust Learning
We introduce Noise Injection Node Regularization (NINR), a method of
injecting structured noise into Deep Neural Networks (DNN) during the training
stage, resulting in an emergent regularizing effect. We present theoretical and
empirical evidence for substantial improvement in robustness against various
test data perturbations for feed-forward DNNs when trained under NINR. The
novelty in our approach comes from the interplay of adaptive noise injection
and initialization conditions such that noise is the dominant driver of
dynamics at the start of training. As it simply requires the addition of
external nodes without altering the existing network structure or optimization
algorithms, this method can be easily incorporated into many standard problem
specifications. We find improved stability against a number of data
perturbations, including domain shifts, with the most dramatic improvement
obtained for unstructured noise, where our technique outperforms other existing
methods such as Dropout or $L_2$ regularization, in some cases. We further show
that desirable generalization properties on clean data are generally
maintained.
Comment: 16 pages, 9 figures
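As a complement, the sketch below shows the kind of robustness probe such claims are usually checked against: compare test accuracy on clean inputs with accuracy under unstructured input noise. The model, data, and noise level are arbitrary stand-ins; the paper's benchmarks and perturbations are not reproduced here.

```python
import torch
import torch.nn as nn

def accuracy_under_noise(model, x, y, noise_std):
    """Clean vs. noise-perturbed test accuracy for a classifier."""
    model.eval()
    with torch.no_grad():
        clean = (model(x).argmax(-1) == y).float().mean().item()
        noisy = (model(x + noise_std * torch.randn_like(x)).argmax(-1) == y).float().mean().item()
    return clean, noisy

# Illustration with an untrained stand-in model and random data.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x, y = torch.randn(256, 784), torch.randint(0, 10, (256,))
print(accuracy_under_noise(model, x, y, noise_std=0.5))
```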
