94 research outputs found

The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets

Author: Levi Noam
Oz Yaron
Publication date: 26/06/2023

We study universal traits which emerge in both real-world complex datasets and artificially generated ones. Our approach is to analogize data to a physical system and employ tools from statistical physics and Random Matrix Theory (RMT) to reveal their underlying structure. We focus on the feature-feature covariance matrix, analyzing both its local and global eigenvalue statistics. Our main observations are: (i) the power-law scalings that the bulk of its eigenvalues exhibit are vastly different for uncorrelated random data compared to real-world data, (ii) this scaling behavior can be completely recovered by introducing long-range correlations in a simple way to the synthetic data, (iii) both generated and real-world datasets lie in the same universality class from the RMT perspective, as chaotic rather than integrable systems, (iv) the expected RMT statistical behavior already manifests for empirical covariance matrices at dataset sizes significantly smaller than those conventionally used for real-world training, and can be related to the number of samples required to approximate the population power-law scaling behavior, (v) the Shannon entropy is correlated with local RMT structure and eigenvalue scaling, is substantially smaller in strongly correlated datasets than in uncorrelated synthetic data, and requires fewer samples to reach the distribution entropy. These findings can have numerous implications for the characterization of the complexity of datasets, including differentiating synthetically generated from natural data, quantifying noise, developing better data pruning methods and classifying effective learning models utilizing these scaling laws.
Comment: 16 pages, 7 figures
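The two diagnostics named in this abstract, the power-law decay of covariance eigenvalues and the local level statistics, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`sample_data`, `covariance_eigenvalues`, `powerlaw_slope`, `mean_spacing_ratio`) are hypothetical, and the synthetic power-law covariance is a simple stand-in for the long-range correlations the abstract mentions. The spacing-ratio statistic is a standard RMT diagnostic whose mean is roughly 0.53 for chaotic (GOE-like) spectra and roughly 0.39 for uncorrelated (Poisson) levels.

```python
import numpy as np


def sample_data(n, d, exponent, rng):
    """Draw n Gaussian samples in d dimensions.

    Assumption for illustration: the population covariance has eigenvalues
    rank**(-exponent), rotated by a random orthogonal matrix so that the
    features are genuinely correlated. exponent=0 gives uncorrelated data.
    """
    lam = np.arange(1, d + 1, dtype=float) ** (-exponent)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return (rng.standard_normal((n, d)) * np.sqrt(lam)) @ q.T


def covariance_eigenvalues(X):
    """Eigenvalues, in descending order, of the empirical feature-feature covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    return np.linalg.eigvalsh(cov)[::-1]


def powerlaw_slope(eigs, max_rank):
    """Least-squares slope of log(eigenvalue) against log(rank) for the top max_rank values."""
    ranks = np.arange(1, max_rank + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(eigs[:max_rank]), 1)
    return slope


def mean_spacing_ratio(levels):
    """Mean of min(s_i, s_{i+1}) / max(s_i, s_{i+1}) over consecutive level spacings."""
    s = np.diff(np.sort(levels))
    return (np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])).mean()


rng = np.random.default_rng(0)
for name, exponent in [("uncorrelated", 0.0), ("power-law", 1.5)]:
    eigs = covariance_eigenvalues(sample_data(5000, 100, exponent, rng))
    print(name, powerlaw_slope(eigs, 50), mean_spacing_ratio(eigs))
```

The spacing ratio is used here because it does not require unfolding the spectrum, which keeps the sketch short. The sample sizes and the exponent 1.5 are arbitrary choices for the demonstration.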

arXiv.org e-Print Archive

The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence

Author: Levi Noam
Oz Yaron
Publication date: 02/11/2023

Turbulence is a complex spatial and temporal structure created by the strong non-linear dynamics of fluid flows at high Reynolds numbers. Despite being a ubiquitous phenomenon that has been studied for centuries, a full understanding of turbulence remains a formidable challenge. Here, we introduce tools from the fields of quantum chaos and Random Matrix Theory (RMT) and present a detailed analysis of image datasets generated from turbulence simulations of incompressible and compressible fluid flows. Focusing on two observables, the data Gram matrix and the single image distribution, we study both the local and global eigenvalue statistics and compare them to classical chaos, uncorrelated noise and natural images. We show that from the RMT perspective, the turbulence Gram matrices lie in the same universality class as quantum chaotic rather than integrable systems, and the data exhibits power-law scalings in the bulk of its eigenvalues which are vastly different from uncorrelated classical chaos, random data, and natural images. Interestingly, we find that the single sample distribution only appears as fully RMT chaotic, but deviates from chaos at larger correlation lengths, as well as exhibiting different scaling properties.
Comment: 9 pages, 4 figures

arXiv.org e-Print Archive

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/25599/1/0000146.pd

Crossref

Deep Blue Documents at the University of Michigan

Information Architecture and Intertemporal Choice: A Randomized Field Experiment in the United States

Author: Levi Yaron
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

In a randomized field experiment, I show that information architecture significantly affects individuals' spending and savings behavior. I present users of a large online account aggregation provider with a personalized financial index. This index represents the inflation-protected, lifetime monthly cash flow that they can obtain, given their personal financial and demographic information and current market prices. Users receiving this information tool reduce their spending by 10.7% relative to a control group. This effect is sensitive to the description of the index using a consumption frame rather than an investment frame and to the presentation of an explicit comparison between the index and historical spending levels. Further, spending reductions are primarily in large, infrequent transactions. This experiment is the first to directly affect overall spending behavior and to demonstrate the importance of information architecture in that context. It demonstrates the potential of low-cost digital information tools to impact financial behavior on a large scale.

EZID

eScholarship - University of California

Mind the App: Mobile Access to Financial Information and Consumer Behavior

Author: Shlomo Benartzi
Yaron Levi
Publication venue: Elsevier BV
Publication date: 01/01/2020

Crossref

Scheduling with Testing

Author: Levi Retsef
Magnanti Thomas L
Shaposhnik Yaron
Publication venue: Institute for Operations Research and the Management Sciences (INFORMS)
Publication date: 01/02/2019

We study a new class of scheduling problems that capture common settings in service environments, in which one has to serve a collection of jobs that have a priori uncertain attributes (e.g., processing times and priorities) and the service provider has to decide how to dynamically allocate resources (e.g., people, equipment, and time) between testing (diagnosing) jobs to learn more about their respective uncertain attributes and processing jobs. The former could inform future decisions, but could delay the service time for other jobs, while the latter directly advances the processing of the jobs but requires making decisions under uncertainty. Through novel analysis, we obtain surprising structural results of optimal policies that provide operational managerial insights, efficient optimal and near-optimal algorithms, and quantification of the value of testing. We believe that our approach will lead to further research to explore this important practical trade-off.

DSpace@MIT

Crossref

An Adaptive SPT Rule for Scheduling and Testing Heterogeneous Jobs

Author: Retsef Levi
Thomas L. Magnanti
Yaron Shaposhnik
Publication venue: Elsevier BV
Publication date: 01/01/2019

Crossref