The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets
We study universal traits which emerge both in real-world complex datasets,
as well as in artificially generated ones. Our approach is to analogize data to
a physical system and employ tools from statistical physics and Random Matrix
Theory (RMT) to reveal their underlying structure. We focus on the
feature-feature covariance matrix, analyzing both its local and global
eigenvalue statistics. Our main observations are: (i) The power-law scalings
that the bulk of its eigenvalues exhibit are vastly different for uncorrelated
random data compared to real-world data, (ii) this scaling behavior can be
completely recovered by introducing long range correlations in a simple way to
the synthetic data, (iii) both generated and real-world datasets lie in the
same universality class from the RMT perspective, as chaotic rather than
integrable systems, (iv) the expected RMT statistical behavior already
manifests for empirical covariance matrices at dataset sizes significantly
smaller than those conventionally used for real-world training, and can be
related to the number of samples required to approximate the population
power-law scaling behavior, (v) the Shannon entropy is correlated with local
RMT structure and eigenvalue scaling, is substantially smaller in strongly
correlated datasets than in uncorrelated synthetic data, and requires fewer
samples to reach the distribution entropy. These findings can have numerous
implications for characterizing the complexity of datasets, including
differentiating synthetically generated from natural data, quantifying noise,
developing better data pruning methods, and classifying effective learning
models utilizing these scaling laws.
Comment: 16 pages, 7 figures
The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence
Turbulence is a complex spatial and temporal structure created by the strong
non-linear dynamics of fluid flows at high Reynolds numbers. Although it is a
ubiquitous phenomenon that has been studied for centuries, a full understanding
of turbulence remains a formidable challenge. Here, we introduce tools from
the fields of quantum chaos and Random Matrix Theory (RMT) and present a
detailed analysis of image datasets generated from turbulence simulations of
incompressible and compressible fluid flows. Focusing on two observables: the
data Gram matrix and the single image distribution, we study both the local and
global eigenvalue statistics and compare them to classical chaos, uncorrelated
noise and natural images. We show that from the RMT perspective, the turbulence
Gram matrices lie in the same universality class as quantum chaotic rather than
integrable systems, and the data exhibit power-law scalings in the bulk of
their eigenvalues which are vastly different from those of uncorrelated
classical chaos, random data, and natural images. Interestingly, we find that
the single sample distribution only appears as fully RMT chaotic, but deviates
from chaos at larger correlation lengths and exhibits different scaling
properties.
Comment: 9 pages, 4 figures
Growth factors in human disease: The realities, pitfalls, and promise
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/25599/1/0000146.pd
Scheduling with Testing
We study a new class of scheduling problems that capture common settings in service environments, in which one has to serve a collection of jobs that have a priori uncertain attributes (e.g., processing times and priorities) and the service provider has to decide how to dynamically allocate resources (e.g., people, equipment, and time) between testing (diagnosing) jobs to learn more about their uncertain attributes and processing jobs. Testing could inform future decisions but could delay service for other jobs, while processing directly advances the jobs but requires making decisions under uncertainty. Through novel analysis we obtain surprising structural results about optimal policies that provide operational and managerial insights, efficient optimal and near-optimal algorithms, and a quantification of the value of testing. We believe that our approach will lead to further research exploring this important practical trade-off.
Mucin gene expression in bile of patients with and without gallstone disease, collected by endoscopic retrograde cholangiography
AIM: To investigate the pattern of mucin expression and concentration in bile obtained during endoscopic retrograde cholangiography (ERC) in relation to gallstone disease.
Decision Fatigue and Heuristic Analyst Forecasts
Psychological evidence indicates that decision quality declines after an extensive session of decision-making, a phenomenon known as decision fatigue. We study whether decision fatigue affects analysts’ judgments. Analysts cover multiple firms and often issue several forecasts in a single day. We find that forecast accuracy declines over the course of a day as the number of forecasts the analyst has already issued increases. Also consistent with decision fatigue, we find that the more forecasts an analyst issues, the more likely the analyst is to resort to heuristic decisions: herding more closely with the consensus forecast, self-herding (i.e., reissuing their own previous outstanding forecasts), and issuing a rounded forecast. Finally, we find that the stock market understands these effects and discounts for analyst decision fatigue.