1,612 research outputs found
More "normal" than normal: scaling distributions and complex systems
One feature of many naturally occurring or engineered complex systems is tremendous variability in event sizes. To account for it, the behavior of these systems is often described using power law relationships or scaling distributions, which tend to be viewed as "exotic" because of their unusual properties (e.g., infinite moments). An alternate view is based on mathematical, statistical, and data-analytic arguments and suggests that scaling distributions should be viewed as "more normal than normal". In support of this latter view that has been advocated by Mandelbrot for the last 40 years, we review in this paper some relevant results from probability theory and illustrate a powerful statistical approach for deciding whether the variability associated with observed event sizes is consistent with an underlying Gaussian-type (finite variance) or scaling-type (infinite variance) distribution. We contrast this approach with traditional model fitting techniques and discuss its implications for future modeling of complex systems
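One simple way to see the difference between finite-variance and infinite-variance behavior is the max-to-sum ratio of squared observations. This diagnostic is chosen here only for illustration and is not necessarily the approach the paper uses. The sketch below assumes synthetic data: standard Gaussian samples and Pareto samples with shape 1.2 (infinite variance). The function name `max_to_sum_ratio` is hypothetical.

```python
import random


def max_to_sum_ratio(samples, power=2):
    """Return max(|x|^power) / sum(|x|^power) for the given samples.

    If the moment of order `power` is finite, this ratio tends to 0 as
    the sample grows; if it is infinite, single extreme observations
    keep contributing a non-negligible share of the sum.
    """
    powered = [abs(x) ** power for x in samples]
    return max(powered) / sum(powered)


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 100_000

# Assumed synthetic inputs, not data from the paper.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
heavy = [rng.paretovariate(1.2) for _ in range(n)]  # shape < 2: infinite variance

light_ratio = max_to_sum_ratio(gaussian)
heavy_ratio = max_to_sum_ratio(heavy)

print(f"Gaussian max/sum of squares: {light_ratio:.6f}")
print(f"Pareto   max/sum of squares: {heavy_ratio:.6f}")
```

In practice the ratio would be tracked over increasing sample sizes rather than read off at a single value of `n`.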
Two Universality Properties Associated with the Monkey Model of Zipf's Law
The distribution of word probabilities in the monkey model of Zipf's law is
associated with two universality properties: (1) the power law exponent
converges strongly to $-1$ as the alphabet size increases and the letter
probabilities are specified as the spacings from a random division of the unit
interval for any distribution with a bounded density function on $[0,1]$; and
(2) on a logarithmic scale, the version of the model with a finite word length
cutoff and unequal letter probabilities is approximately normally distributed
in the part of the distribution away from the tails. The first property is
proved using a remarkably general limit theorem for the logarithm of sample
spacings from Shao and Hahn, and the second property follows from Anscombe's
central limit theorem for a random number of i.i.d. random variables. The
finite word length model leads to a hybrid Zipf-lognormal mixture distribution
closely related to work in other areas. Comment: 14 pages, 3 figures
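The finite word length version of the monkey model can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' procedure: the alphabet size (5), the space probability (0.2), and the word length cutoff (6) are arbitrary choices, the cut points are uniform, and all function names are hypothetical. The sketch enumerates every word up to the cutoff, ranks the word probabilities, and fits a least-squares slope on the log-log rank plot.

```python
import math
import random
from itertools import product


def spacing_probabilities(k, rng):
    """Letter probabilities as the spacings of k-1 uniform cuts of [0, 1]."""
    cuts = sorted(rng.random() for _ in range(k - 1))
    edges = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]


def word_probabilities(letter_probs, p_space, max_len):
    """Probabilities of all words of length 1..max_len, sorted descending.

    A word is a run of letters terminated by a space, so its probability
    is p_space times the product of its (rescaled) letter probabilities.
    """
    scaled = [(1.0 - p_space) * p for p in letter_probs]
    probs = []
    for length in range(1, max_len + 1):
        for combo in product(scaled, repeat=length):
            probs.append(p_space * math.prod(combo))
    return sorted(probs, reverse=True)


def loglog_slope(sorted_probs):
    """Least-squares slope of log(probability) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(sorted_probs) + 1)]
    ys = [math.log(p) for p in sorted_probs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed for reproducibility
letters = spacing_probabilities(5, rng)   # assumed alphabet size
p_space = 0.2                             # assumed space probability
max_len = 6                               # assumed word length cutoff

ranked = word_probabilities(letters, p_space, max_len)
slope = loglog_slope(ranked)
print(f"{len(ranked)} words, fitted log-log slope: {slope:.3f}")
```

With a small alphabet and a short cutoff the fitted slope is only a rough finite-sample figure; the convergence result in the abstract concerns the limit of increasing alphabet size.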
Pareto versus lognormal: a maximum entropy test
It is commonly found that distributions that seem to be lognormal over a broad range change to a power-law (Pareto) distribution for the last few percentiles. The distributions of many physical, natural, and social events (earthquake size, species abundance, income and wealth, as well as file, city, and firm sizes) display this structure. We present a test for the occurrence of power-law tails in statistical distributions based on maximum entropy. This methodology allows one to identify the true data-generating process even when it is neither lognormal nor Pareto. The maximum entropy approach is then compared with other widely used methods and applied to different levels of aggregation of complex systems. Our results provide support for the theory that distributions with lognormal body and Pareto tail can be generated as mixtures of lognormally distributed units.
Inference for double Pareto lognormal queues with applications
In this article we describe a method for carrying out Bayesian inference for the double
Pareto lognormal (dPlN) distribution which has recently been proposed as a model for
heavy-tailed phenomena. We apply our approach to inference for the dPlN/M/1 and
M/dPlN/1 queueing systems. These systems cannot be analyzed using standard
techniques because the dPlN distribution does not possess a Laplace transform
in closed form. This difficulty is overcome using some recent approximations for the
Laplace transform for the Pareto/M/1 system. Our procedure is illustrated with
applications in internet traffic analysis and risk theory
Sales Distribution of Consumer Electronics
Using the uniformly most powerful unbiased test, we examined the daily sales
distribution of consumer electronics in Japan and report that it follows
either a lognormal distribution or a power-law distribution, depending on the
state of the market. We show that switches between the two occur quite often.
The underlying sales dynamics in both periods are well described by a
multiplicative process. However, the multiplicative term in the process
displays a size-dependent relationship when a steady lognormal distribution
holds, whereas it shows a size-independent relationship when the power-law
distribution holds. This difference in the underlying dynamics is responsible
for the difference in the two observed distributions.
The skewness of computer science
Computer science is a relatively young discipline combining science,
engineering, and mathematics. The main flavors of computer science research
involve the theoretical development of conceptual models for the different
aspects of computing and the more applicative building of software artifacts
and assessment of their properties. In the computer science publication
culture, conferences are an important vehicle to quickly move ideas, and
journals often publish deeper versions of papers already presented at
conferences. These peculiarities of the discipline make computer science an
original research field within the sciences, and, therefore, the assessment of
classical bibliometric laws is particularly important for this field. In this
paper, we study the skewness of the distribution of citations to papers
published in computer science publication venues (journals and conferences). We
find that the skewness in the distribution of mean citedness of different
venues combines with the asymmetry in citedness of articles in each venue,
resulting in a highly asymmetric citation distribution with a power law tail.
Furthermore, the skewness of conference publications is more pronounced than
the asymmetry of journal papers. Finally, the impact of journal papers, as
measured with bibliometric indicators, largely dominates that of proceedings
papers. Comment: I applied the goodness-of-fit methodology proposed in: A. Clauset, C.
R. Shalizi, M. E. J. Newman. Power-law distributions in empirical data. SIAM
Review 51, 661-703 (2009)
Inference for double Pareto lognormal queues with applications
In this article we describe a method for carrying out Bayesian inference for the double Pareto lognormal (dPlN) distribution, which has recently been proposed as a model for heavy-tailed phenomena. We apply our approach to inference for the dPlN/M/1 and M/dPlN/1 queueing systems. These systems cannot be analyzed using standard techniques because the dPlN distribution does not possess a Laplace transform in closed form. This difficulty is overcome using some recent approximations for the Laplace transform for the Pareto/M/1 system. Our procedure is illustrated with applications in internet traffic analysis and risk theory. Keywords: heavy tails, Bayesian inference, queueing theory
What is the best spatial distribution to model base station density? A deep dive into two European mobile networks
This paper studies base station (BS) spatial distributions across urban, rural, and coastal scenarios, based on real BS deployment data sets from two European countries (Italy and Croatia). It considers several representative statistical distributions to characterize the probability density function of the BS spatial density: Poisson, generalized Pareto, Weibull, lognormal, and $\alpha$-stable. Based on a thorough comparison with the real data sets, our results show that the $\alpha$-stable distribution is the most accurate of the candidates in urban scenarios. This finding is confirmed across different sample area sizes, operators, and cellular technologies (GSM/UMTS/LTE). In rural and coastal scenarios, by contrast, the lognormal and Weibull distributions tend to fit the real data better. We believe that the results of this paper can be used to derive guidelines for BS deployment in cellular network design and to provide various network performance metrics, such as coverage probability, transmission success probability, throughput, and delay.
The size distribution of innovations revisited: an application of extreme value statistics to citation and value measures of patent significance
This paper focuses on the analysis of size distributions of innovations, which are known to be highly skewed. We use patent citations as one indicator of innovation significance, constructing two large datasets from the European and US Patent Offices at a high level of aggregation, and the Trajtenberg (1990) dataset on CT scanners at a very low one. We also study self-assessed reports of patented innovation values using two very recent patent valuation datasets from the Netherlands and the UK, as well as a small dataset of patent license revenues of Harvard University. Statistical methods are applied to analyse the properties of the empirical size distributions, where we put special emphasis on testing for the existence of ‘heavy tails’, i.e., whether or not the probability of very large innovations declines more slowly than exponentially. While overall the distributions appear to resemble a lognormal, we argue that the tails are indeed fat. We invoke some recent results from extreme value statistics and apply the Hill (1975) estimator with data-driven cut-offs to determine the tail index for the right tails of all datasets except the NL and UK patent valuations. On these latter datasets we use a maximum likelihood estimator for grouped data to estimate the Pareto exponent for varying definitions of the right tail. We find significantly and consistently lower tail estimates for the returns data than the citation data (around 0.7 vs. 3-5). The EPO and US patent citation tail indices are roughly constant over time (although the US one does grow somewhat in the last periods) but the latter estimates are significantly lower than the former. The heaviness of the tails, particularly as measured by financial indices, we argue, has significant implications for technology policy and growth theory, since the second and possibly even the first moments of these distributions may not exist. 
(JEL Codes: C16, O31, O33. Keywords: returns to invention, patent citations, extreme-value statistics, skewed distributions, heavy tails.) Subject: mathematical economics and econometrics
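The Hill estimator mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses a fixed number of upper order statistics `k` rather than the data-driven cut-offs the paper describes, the input is synthetic Pareto data with a known tail index of 3, and the function name `hill_tail_index` is hypothetical.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index from the k largest observations.

    With the data sorted in descending order, X(1) >= ... >= X(n), the
    estimate is k / sum_{i=1..k} log(X(i) / X(k+1)). All samples must
    be strictly positive.
    """
    if not 0 < k < len(samples):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # X(k+1), using 0-based indexing
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


rng = random.Random(0)  # fixed seed for reproducibility
# Assumed synthetic input: Pareto samples with true tail index 3.
data = [rng.paretovariate(3.0) for _ in range(20_000)]

alpha_hat = hill_tail_index(data, k=2_000)  # k is an arbitrary fixed choice
print(f"Hill tail index estimate: {alpha_hat:.3f}")
```

The estimate is sensitive to `k`; on real data it is common to plot the estimate against a range of `k` values before settling on a cut-off.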