1,612 research outputs found

    More "normal" than normal: scaling distributions and complex systems

    One feature of many naturally occurring or engineered complex systems is tremendous variability in event sizes. To account for it, the behavior of these systems is often described using power law relationships or scaling distributions, which tend to be viewed as "exotic" because of their unusual properties (e.g., infinite moments). An alternate view is based on mathematical, statistical, and data-analytic arguments and suggests that scaling distributions should be viewed as "more normal than normal". In support of this latter view, which Mandelbrot has advocated for the last 40 years, we review in this paper some relevant results from probability theory and illustrate a powerful statistical approach for deciding whether the variability associated with observed event sizes is consistent with an underlying Gaussian-type (finite variance) or scaling-type (infinite variance) distribution. We contrast this approach with traditional model fitting techniques and discuss its implications for future modeling of complex systems.

    Two Universality Properties Associated with the Monkey Model of Zipf's Law

    The distribution of word probabilities in the monkey model of Zipf's law is associated with two universality properties: (1) the power law exponent converges strongly to -1 as the alphabet size increases and the letter probabilities are specified as the spacings from a random division of the unit interval for any distribution with a bounded density function on [0,1]; and (2) on a logarithmic scale, the version of the model with a finite word length cutoff and unequal letter probabilities is approximately normally distributed in the part of the distribution away from the tails. The first property is proved using a remarkably general limit theorem for the logarithm of sample spacings from Shao and Hahn, and the second property follows from Anscombe's central limit theorem for a random number of i.i.d. random variables. The finite word length model leads to a hybrid Zipf-lognormal mixture distribution closely related to work in other areas. Comment: 14 pages, 3 figures

    Pareto versus lognormal: a maximum entropy test

    It is commonly found that distributions that seem to be lognormal over a broad range change to a power-law (Pareto) distribution for the last few percentiles. The distributions of many physical, natural, and social events (earthquake size, species abundance, income and wealth, as well as file, city, and firm sizes) display this structure. We present a test for the occurrence of power-law tails in statistical distributions based on maximum entropy. This methodology allows one to identify the true data-generating process even when it is neither lognormal nor Pareto. The maximum entropy approach is then compared with other widely used methods and applied to different levels of aggregation of complex systems. Our results provide support for the theory that distributions with a lognormal body and a Pareto tail can be generated as mixtures of lognormally distributed units.

    Inference for double Pareto lognormal queues with applications

    In this article we describe a method for carrying out Bayesian inference for the double Pareto lognormal (dPlN) distribution, which has recently been proposed as a model for heavy-tailed phenomena. We apply our approach to inference for the dPlN/M/1 and M/dPlN/1 queueing systems. These systems cannot be analyzed using standard techniques because the dPlN distribution does not possess a Laplace transform in closed form. This difficulty is overcome using some recent approximations for the Laplace transform for the Pareto/M/1 system. Our procedure is illustrated with applications in internet traffic analysis and risk theory. Keywords: heavy tails, Bayesian inference, queueing theory

    Sales Distribution of Consumer Electronics

    Using the uniformly most powerful unbiased test, we examined the sales distribution of consumer electronics in Japan on a daily basis and report that it follows either a lognormal distribution or a power-law distribution, depending on the state of the market. We show that switches between the two occur quite often. The underlying sales dynamics in both periods are well described by a multiplicative process. However, while the multiplicative term in the process displays a size-dependent relationship when a steady lognormal distribution holds, it shows a size-independent relationship when the power-law distribution holds. This difference in the underlying dynamics is responsible for the difference in the two observed distributions.

    The skewness of computer science

    Computer science is a relatively young discipline combining science, engineering, and mathematics. The main flavors of computer science research involve the theoretical development of conceptual models for the different aspects of computing and the more applicative building of software artifacts and assessment of their properties. In the computer science publication culture, conferences are an important vehicle to quickly move ideas, and journals often publish deeper versions of papers already presented at conferences. These peculiarities of the discipline make computer science an original research field within the sciences, and, therefore, the assessment of classical bibliometric laws is particularly important for this field. In this paper, we study the skewness of the distribution of citations to papers published in computer science publication venues (journals and conferences). We find that the skewness in the distribution of mean citedness of different venues combines with the asymmetry in citedness of articles in each venue, resulting in a highly asymmetric citation distribution with a power law tail. Furthermore, the skewness of conference publications is more pronounced than the asymmetry of journal papers. Finally, the impact of journal papers, as measured with bibliometric indicators, largely dominates that of proceedings papers. Comment: I applied the goodness-of-fit methodology proposed in: A. Clauset, C. R. Shalizi, M. E. J. Newman. Power-law distributions in empirical data. SIAM Review 51, 661-703 (2009)

    What is the best spatial distribution to model base station density? A deep dive into two european mobile networks

    This paper studies the base station (BS) spatial distributions across different scenarios in urban, rural, and coastal zones, based on real BS deployment data sets obtained from two European countries (i.e., Italy and Croatia). It considers several representative statistical distributions to characterize the probability density function of the BS spatial density, including Poisson, generalized Pareto, Weibull, lognormal, and α-stable. Based on a thorough comparison with real data sets, our results show that the α-stable distribution is the most accurate of the candidates in urban scenarios. This finding is confirmed across different sample area sizes, operators, and cellular technologies (GSM/UMTS/LTE). On the other hand, the lognormal and Weibull distributions tend to fit the real data better in rural and coastal scenarios. We believe that the results of this paper can be used to derive guidelines for BS deployment in cellular network design and to evaluate various network performance metrics, such as coverage probability, transmission success probability, throughput, and delay.

    The size distribution of innovations revisited: an application of extreme value statistics to citation and value measures of patent significance

    This paper focuses on the analysis of size distributions of innovations, which are known to be highly skewed. We use patent citations as one indicator of innovation significance, constructing two large datasets from the European and US Patent Offices at a high level of aggregation, and the Trajtenberg (1990) dataset on CT scanners at a very low one. We also study self-assessed reports of patented innovation values using two very recent patent valuation datasets from the Netherlands and the UK, as well as a small dataset of patent license revenues of Harvard University. Statistical methods are applied to analyse the properties of the empirical size distributions, where we put special emphasis on testing for the existence of ‘heavy tails’, i.e., whether or not the probability of very large innovations declines more slowly than exponentially. While overall the distributions appear to resemble a lognormal, we argue that the tails are indeed fat. We invoke some recent results from extreme value statistics and apply the Hill (1975) estimator with data-driven cut-offs to determine the tail index for the right tails of all datasets except the NL and UK patent valuations. On these latter datasets we use a maximum likelihood estimator for grouped data to estimate the Pareto exponent for varying definitions of the right tail. We find significantly and consistently lower tail estimates for the returns data than the citation data (around 0.7 vs. 3-5). The EPO and US patent citation tail indices are roughly constant over time (although the US one does grow somewhat in the last periods) but the latter estimates are significantly lower than the former. The heaviness of the tails, particularly as measured by financial indices, we argue, has significant implications for technology policy and growth theory, since the second and possibly even the first moments of these distributions may not exist. 
    (JEL Codes: C16, O31, O33. Keywords: returns to invention, patent citations, extreme-value statistics, skewed distributions, heavy tails.) Mathematical economics and econometrics.
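The Hill estimator mentioned in the abstract above can be sketched in a few lines. This is only a minimal illustration, not the authors' procedure: the function name `hill_tail_index` is hypothetical, and the number of upper order statistics `k` is fixed by hand here, whereas the abstract describes data-driven cut-offs. The estimate is the reciprocal of the mean log-excess of the `k` largest observations over the `(k+1)`-th largest one.

```python
import math
import random


def hill_tail_index(sample, k):
    """Hill estimate of the Pareto tail index from the k largest observations.

    `k` is chosen by the caller (an assumption of this sketch); the paper
    summarized above selects the cut-off from the data instead.
    """
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value acts as the tail cut-off
    if threshold <= 0:
        raise ValueError("the cut-off value must be positive")
    # Average log-excess of the k largest values over the cut-off.
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    if mean_log_excess == 0:
        raise ValueError("the k largest values all equal the cut-off")
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic Pareto data with tail index 2, used only to exercise the sketch.
    data = [rng.paretovariate(2.0) for _ in range(20000)]
    print(hill_tail_index(data, 2000))
```

For a Pareto tail, an index at or below 2 implies an infinite variance and an index at or below 1 implies an infinite mean, which is why an estimate near 0.7 for the returns data bears on whether the first moment exists.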