Search CORE

1,612 research outputs found

More "normal" than normal: scaling distributions and complex systems

Author: Alderson David
Doyle John C.
Li Lun
Willinger Walter
Publication venue: IEEE Press
Publication date: 01/01/2004

One feature of many naturally occurring or engineered complex systems is tremendous variability in event sizes. To account for it, the behavior of these systems is often described using power law relationships or scaling distributions, which tend to be viewed as "exotic" because of their unusual properties (e.g., infinite moments). An alternative view is based on mathematical, statistical, and data-analytic arguments and suggests that scaling distributions should be viewed as "more normal than normal". In support of this latter view, which Mandelbrot has advocated for the last 40 years, we review in this paper some relevant results from probability theory and illustrate a powerful statistical approach for deciding whether the variability associated with observed event sizes is consistent with an underlying Gaussian-type (finite variance) or scaling-type (infinite variance) distribution. We contrast this approach with traditional model fitting techniques and discuss its implications for future modeling of complex systems.

Caltech Authors

Two Universality Properties Associated with the Monkey Model of Zipf's Law

Author: Perline Richard
Perline Ronald
Publication venue: 'MDPI AG'
Publication date: 30/11/2015

The distribution of word probabilities in the monkey model of Zipf's law is associated with two universality properties: (1) the power law exponent converges strongly to -1 as the alphabet size increases and the letter probabilities are specified as the spacings from a random division of the unit interval for any distribution with a bounded density function on [0,1]; and (2), on a logarithmic scale the version of the model with a finite word length cutoff and unequal letter probabilities is approximately normally distributed in the part of the distribution away from the tails. The first property is proved using a remarkably general limit theorem for the logarithm of sample spacings from Shao and Hahn, and the second property follows from Anscombe's central limit theorem for a random number of i.i.d. random variables. The finite word length model leads to a hybrid Zipf-lognormal mixture distribution closely related to work in other areas. Comment: 14 pages, 3 figures

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Pareto versus lognormal: a maximum entropy test

Author: Marco Bee
Massimo Riccaboni
Stefano Schiavo
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2011

It is commonly found that distributions that seem to be lognormal over a broad range change to a power-law (Pareto) distribution for the last few percentiles. The distributions of many physical, natural, and social events (earthquake size, species abundance, income and wealth, as well as file, city, and firm sizes) display this structure. We present a test for the occurrence of power-law tails in statistical distributions based on maximum entropy. This methodology allows one to identify the true data-generating process even when it is neither lognormal nor Pareto. The maximum entropy approach is then compared with other widely used methods and applied to different levels of aggregation of complex systems. Our results provide support for the theory that distributions with a lognormal body and a Pareto tail can be generated as mixtures of lognormally distributed units.

Lirias

Crossref

Archivio della ricerca della Scuola IMT Alti Studi Lucca

IMT Institutional Repository

Inference for double Pareto lognormal queues with applications

Author: Lillo Rodríguez Rosa Elvira
Ramírez Cobo Josefa
Wilson Simon P.
Wiper Michael Peter
Publication date: 01/02/2008

In this article we describe a method for carrying out Bayesian inference for the double Pareto lognormal (dPlN) distribution, which has recently been proposed as a model for heavy-tailed phenomena. We apply our approach to inference for the dPlN/M/1 and M/dPlN/1 queueing systems. These systems cannot be analyzed using standard techniques because the dPlN distribution does not possess a Laplace transform in closed form. This difficulty is overcome using some recent approximations for the Laplace transform for the Pareto/M/1 system. Our procedure is illustrated with applications in internet traffic analysis and risk theory.

Universidad Carlos III de Madrid e-Archivo

Sales Distribution of Consumer Electronics

Author: Ryohei Hisano
Takayuki Mizuno
Publication venue: 'Elsevier BV'
Publication date: 04/09/2010

Using the uniformly most powerful unbiased test, we observed the sales distribution of consumer electronics in Japan on a daily basis and report that it follows either a lognormal distribution or a power-law distribution, depending on the state of the market. We show that switches between the two occur quite often. The underlying sales dynamics in both periods closely matched a multiplicative process. However, whereas the multiplicative term in the process displays a size-dependent relationship when a steady lognormal distribution holds, it shows a size-independent relationship when the power-law distribution holds. This difference in the underlying dynamics is responsible for the difference in the two observed distributions.

arXiv.org e-Print Archive

Crossref

The skewness of computer science

Author: Franceschet Massimo
Publication date: 15/02/2010

Computer science is a relatively young discipline combining science, engineering, and mathematics. The main flavors of computer science research involve the theoretical development of conceptual models for the different aspects of computing and the more applicative building of software artifacts and assessment of their properties. In the computer science publication culture, conferences are an important vehicle to quickly move ideas, and journals often publish deeper versions of papers already presented at conferences. These peculiarities of the discipline make computer science an original research field within the sciences, and, therefore, the assessment of classical bibliometric laws is particularly important for this field. In this paper, we study the skewness of the distribution of citations to papers published in computer science publication venues (journals and conferences). We find that the skewness in the distribution of mean citedness of different venues combines with the asymmetry in citedness of articles in each venue, resulting in a highly asymmetric citation distribution with a power law tail. Furthermore, the skewness of conference publications is more pronounced than the asymmetry of journal papers. Finally, the impact of journal papers, as measured with bibliometric indicators, largely dominates that of proceedings papers. Comment: I applied the goodness-of-fit methodology proposed in: A. Clauset, C. R. Shalizi, M. E. J. Newman. Power-law distributions in empirical data. SIAM Review 51, 661-703 (2009)

arXiv.org e-Print Archive

CiteSeerX

Archivio istituzionale della ricerca - Università degli Studi di Udine

Inference for double Pareto lognormal queues with applications

Author: Michael P. Wiper
Pepa Ramirez
Rosa E. Lillo
Simon P. Wilson

Research Papers in Economics

What is the best spatial distribution to model base station density? A deep dive into two european mobile networks

Author: CHIARAVIGLIO LUCA
CUOMO Francesca
Gigli Andrea
Lorincz Josip
Maisto Maurizio
Qi Chen
Zhang Honggang
Zhao Zhifeng
Zhou Yifan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

This paper studies the base station (BS) spatial distributions across different scenarios in urban, rural, and coastal zones, based on real BS deployment data sets obtained from two European countries (i.e., Italy and Croatia). It considers several representative statistical distributions to characterize the probability density function of the BS spatial density, including Poisson, generalized Pareto, Weibull, lognormal, and α-stable. Based on a thorough comparison with real data sets, our results show that the α-stable distribution is the most accurate of the candidates in urban scenarios. This finding is confirmed across different sample area sizes, operators, and cellular technologies (GSM/UMTS/LTE). On the other hand, the lognormal and Weibull distributions tend to fit the real data better in rural and coastal scenarios. We believe that the results of this paper can be used to derive guidelines for BS deployment in cellular network design and to evaluate network performance metrics such as coverage probability, transmission success probability, throughput, and delay.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

The size distribution of innovations revisited: an application of extreme value statistics to citation and value measures of patent significance

Author: Silverberg Gerald
Verspagen Bart

This paper focuses on the analysis of size distributions of innovations, which are known to be highly skewed. We use patent citations as one indicator of innovation significance, constructing two large datasets from the European and US Patent Offices at a high level of aggregation, and the Trajtenberg (1990) dataset on CT scanners at a very low one. We also study self-assessed reports of patented innovation values using two very recent patent valuation datasets from the Netherlands and the UK, as well as a small dataset of patent license revenues of Harvard University. Statistical methods are applied to analyse the properties of the empirical size distributions, where we put special emphasis on testing for the existence of ‘heavy tails’, i.e., whether or not the probability of very large innovations declines more slowly than exponentially. While overall the distributions appear to resemble a lognormal, we argue that the tails are indeed fat. We invoke some recent results from extreme value statistics and apply the Hill (1975) estimator with data-driven cut-offs to determine the tail index for the right tails of all datasets except the NL and UK patent valuations. On these latter datasets we use a maximum likelihood estimator for grouped data to estimate the Pareto exponent for varying definitions of the right tail. We find significantly and consistently lower tail estimates for the returns data than the citation data (around 0.7 vs. 3-5). The EPO and US patent citation tail indices are roughly constant over time (although the US one does grow somewhat in the last periods) but the latter estimates are significantly lower than the former. The heaviness of the tails, particularly as measured by financial indices, we argue, has significant implications for technology policy and growth theory, since the second and possibly even the first moments of these distributions may not exist. 
(JEL Codes: C16, O31, O33. Keywords: returns to invention, patent citations, extreme-value statistics, skewed distributions, heavy tails.)
mathematical economics and econometrics
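The Hill (1975) estimator named in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `hill_tail_index`, the fixed cut-off `k`, and the synthetic Pareto sample are assumptions made for illustration, and the data-driven choice of cut-off that the abstract describes is not reproduced here.

```python
import math
import random


def hill_tail_index(data, k):
    """Hill estimate of the tail index alpha from the k largest observations.

    Hypothetical helper: k is supplied by the caller rather than chosen
    by a data-driven rule.
    """
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    ordered = sorted(data, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value acts as the cut-off
    if threshold <= 0:
        raise ValueError("the cut-off observation must be positive")
    # Average log-excess of the k largest values over the cut-off.
    mean_log_excess = sum(math.log(v / threshold) for v in ordered[:k]) / k
    if mean_log_excess == 0:
        raise ValueError("top observations are all equal to the cut-off")
    return 1.0 / mean_log_excess


# Synthetic check on an exact Pareto sample with a known tail index.
rng = random.Random(0)
true_alpha = 2.0
sample = [rng.paretovariate(true_alpha) for _ in range(20000)]
estimate = hill_tail_index(sample, k=2000)
print(f"true alpha = {true_alpha}, Hill estimate = {estimate:.3f}")
```

An estimate below 2 corresponds to an infinite second moment and an estimate below 1 to an infinite first moment, which is the sense in which the abstract says these moments "may not exist". In practice the estimate should be inspected over a range of `k`, since it is sensitive to where the tail is taken to begin.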

Research Papers in Economics