5,673 research outputs found

Power-law Distributions in Information Science - Making the Case for Logarithmic Binning

Author: Bookstein
Bookstein
Bradford
Burrell
Burrell
Clauset
Csányi
de Bellis
Egghe
Egghe
Glänzel
Golder
Leimkuhler
Lotka
Milojević
Newman
Nicolaisen
Pao
Price
Redner
Rousseau
Rousseau
Zipf
Zucker
Publication venue: 'Wiley'
Publication date: 05/11/2010

We suggest partial logarithmic binning as the method of choice for uncovering the nature of many distributions encountered in information science (IS). Logarithmic binning retrieves information and trends "not visible" in noisy power-law tails. We also argue that obtaining the exponent from logarithmically binned data using a simple least-squares method is in some cases warranted in addition to methods such as maximum likelihood. We also show why the often-used cumulative distributions can make it difficult to distinguish noise from genuine features and to obtain an accurate power-law exponent of the underlying distribution. The treatment is non-technical, aimed at IS researchers with little or no background in mathematics. Comment: Accepted for publication in JASIS

arXiv.org e-Print Archive

Crossref
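
To make the binning idea in the abstract above concrete, here is a minimal sketch in Python. It assumes one reading of "partial" logarithmic binning: small values keep one bin per integer, and only the noisy tail is pooled into geometrically growing bins. The function names, the `linear_upto` cutoff and the `bins_per_decade` parameter are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def log_binned_density(values, bins_per_decade=5, linear_upto=10):
    """Return (center, density) pairs for positive integer data.

    Values <= linear_upto keep one bin per integer. Larger values are
    pooled into bins whose edges grow geometrically; each bin count is
    divided by the number of integers it covers and by the sample size.
    Both parameters are hypothetical defaults chosen for illustration.
    """
    counts = Counter(values)
    n = len(values)
    points = []
    # Unbinned head: plain relative frequencies.
    for k in range(1, linear_upto + 1):
        if counts.get(k):
            points.append((float(k), counts[k] / n))
    # Logarithmically binned tail.
    ratio = 10 ** (1 / bins_per_decade)
    lo = linear_upto + 1
    edge = float(lo)
    vmax = max(values)
    while lo <= vmax:
        edge *= ratio
        hi = max(lo, math.floor(edge))  # inclusive upper integer of the bin
        total = sum(counts.get(k, 0) for k in range(lo, hi + 1))
        width = hi - lo + 1
        if total:
            # Geometric mean of the bin limits serves as the plotting position.
            points.append((math.sqrt(lo * hi), total / (width * n)))
        lo = hi + 1
    return points


def loglog_slope(points):
    """Ordinary least-squares slope of log(density) against log(center)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Under these assumptions, the negative of `loglog_slope(log_binned_density(data))` is a rough estimate of the exponent. Empty bins are skipped rather than plotted as zeros, which is a design choice of this sketch.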

Power-law distributions in binned empirical data

Author: Clauset Aaron
Virkar Yogesh
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 14/04/2014

Many man-made and natural phenomena, including the intensity of earthquakes, population of cities and size of international wars, are believed to follow power-law distributions. The accurate identification of power-law patterns has significant consequences for correctly understanding and modeling complex systems. However, statistical evidence for or against the power-law hypothesis is complicated by large fluctuations in the empirical distribution's tail, and these are worsened when information is lost from binning the data. We adapt the statistically principled framework for testing the power-law hypothesis, developed by Clauset, Shalizi and Newman, to the case of binned data. This approach includes maximum-likelihood fitting, a hypothesis test based on the Kolmogorov-Smirnov goodness-of-fit statistic and likelihood ratio tests for comparing against alternative explanations. We evaluate the effectiveness of these methods on synthetic binned data with known structure, quantify the loss of statistical power due to binning, and apply the methods to twelve real-world binned data sets with heavy-tailed patterns. Comment: Published at http://dx.doi.org/10.1214/13-AOAS710 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref
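
The maximum-likelihood fitting step mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a continuous power law with density proportional to x^(-alpha), alpha > 1, a known lower cutoff equal to the first bin edge, and a simple golden-section search in place of whatever optimizer the paper uses. The names `binned_loglik` and `fit_alpha` and the search bounds are hypothetical. The goodness-of-fit and likelihood ratio tests are not covered.

```python
import math


def binned_loglik(alpha, edges, counts):
    """Log-likelihood of a continuous power law for binned counts.

    counts[i] observations fall in [edges[i], edges[i+1]); the last edge
    may be math.inf for an open-ended final bin. Assumes alpha > 1 and
    that the power law holds from edges[0] upward.
    """
    b_min = edges[0]
    n = sum(counts)
    ll = n * (alpha - 1) * math.log(b_min)
    for lo, hi, h in zip(edges, edges[1:], counts):
        if h:
            # Unnormalized probability mass of the bin [lo, hi).
            ll += h * math.log(lo ** (1 - alpha) - hi ** (1 - alpha))
    return ll


def fit_alpha(edges, counts, lo=1.01, hi=6.0, tol=1e-8):
    """Golden-section search for the alpha that maximizes the likelihood.

    The bounds lo and hi are illustrative assumptions; the search relies
    on the log-likelihood having a single maximum inside them.
    """
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if binned_loglik(c, edges, counts) < binned_loglik(d, edges, counts):
            a = c
        else:
            b = d
    return (a + b) / 2
```

For example, with `edges = [1, 2, 4, 8, 16, math.inf]` and counts generated from a known exponent, `fit_alpha(edges, counts)` should return a value close to that exponent.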

How people make friends in social networking sites - A microscopic perspective

Author: Ahn
Bainbridge
Barabási
Barabási
Chun
Dunbar
Haibo Hu
Holme
Hu
Hu
Jeong
Leskovec
Lewis
Licoppe
Lü
Mislove
Newman
Onnela
Shadbolt
Simon
Szell
Vázquez
Wellman
Xiaofan Wang
Publication venue: 'Elsevier BV'
Publication date: 23/11/2011

We study the detailed growth of a social networking site with full temporal information by examining the creation process of each friendship relation, which can collectively lead to the macroscopic properties of the network. We first study the reciprocal behavior of users, and find that link requests are quickly responded to and that the distribution of reciprocation intervals decays in an exponential form. The degrees of inviters/accepters are slightly negatively correlated with reciprocation time. In addition, the temporal feature of the online community shows that the distributions of intervals of user behaviors, such as sending or accepting link requests, follow a power law with a universal exponent, and peaks emerge for intervals of an integral day. We finally study the preferential selection and linking phenomena of the social networking site and find that, for the former, a linear preference holds for preferential sending and reception, and for the latter, a linear preference also holds for preferential acceptance, creation, and attachment. Based on the linearly preferential linking, we put forward an analyzable network model which can reproduce the degree distribution of the network. The research framework presented in the paper could provide a potential insight into how the micro-motives of users lead to the global structure of online social networks. Comment: 10 pages, 12 figures, 2 tables

arXiv.org e-Print Archive

CiteSeerX

Crossref

Fibonacci Binning

Author: Vigna Sebastiano
Publication date: 22/02/2014

This note argues that when dot-plotting distributions typically found in papers about web and social networks (degree distributions, component-size distributions, etc.), and more generally distributions that have high variability in their tail, an exponentially binned version should always be plotted, too, and suggests Fibonacci binning as a visually appealing, easy-to-use and practical choice.

arXiv.org e-Print Archive

CiteSeerX
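
As a rough illustration of the Fibonacci binning suggested in the abstract above, the sketch below pools positive integer data into consecutive bins whose widths are the Fibonacci numbers 1, 1, 2, 3, 5, and so on, and reports the mean count per integer in each bin. This is one plausible reading of the idea; the function name, the starting point at 1 and the choice of averaging within a bin are assumptions, not details confirmed by the note.

```python
from collections import Counter


def fibonacci_bins(values):
    """Pool positive integer data into bins of Fibonacci width.

    Returns (first, last, mean_count) triples, where the bin covers the
    integers first..last inclusive and mean_count is the number of
    observations in the bin divided by the bin width.
    """
    counts = Counter(values)
    vmax = max(values)
    out = []
    lo, hi = 1, 2  # half-open bin [lo, hi); widths follow 1, 1, 2, 3, 5, ...
    while lo <= vmax:
        total = sum(counts.get(k, 0) for k in range(lo, hi))
        out.append((lo, hi - 1, total / (hi - lo)))
        lo, hi = hi, lo + hi
    return out
```

Because the bin boundaries are integers, no rounding of bin edges is needed, which is arguably what makes this scheme convenient for integer-valued data such as degrees.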

Power laws, Pareto distributions and Zipf's law

Author: Adamic LA
Auerbach F
Gutenberg B
Lotka AJ
MEJ Newman
Shannon CE
Shannon CE
Simon HA
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2005

When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them. Comment: 28 pages, 16 figures, minor corrections and additions in this version

arXiv.org e-Print Archive

CiteSeerX

Crossref
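
The verbal definition in the abstract above ("varies inversely as a power of that value") can be written out as follows. The symbols alpha, C and x_min are notation assumed here for illustration, and the normalization shown applies only to a continuous variable with alpha > 1 and a lower cutoff x_min > 0.

```latex
% Power-law form: probability density inversely proportional to a power of x
p(x) = C\, x^{-\alpha}

% Taking logarithms gives a straight line of slope -\alpha on log-log axes
\ln p(x) = -\alpha \ln x + \ln C

% Normalization over x \ge x_{\min} (continuous case, \alpha > 1)
\int_{x_{\min}}^{\infty} C\, x^{-\alpha}\, dx = 1
\quad\Longrightarrow\quad
C = (\alpha - 1)\, x_{\min}^{\alpha - 1}
```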

Zipf law in the popularity distribution of chess openings

Author: Bernd Blasius
C. Anderson
C. E. Shannon
D. Sornette
G. K. Zipf
H. Simon
H. J. R. Murray
I. L. Janis
M. Mitzenmacher
Ralf Tönjes
V. Pareto
W. Feller
Publication venue: 'American Physical Society (APS)'
Publication date: 18/11/2009

We perform a quantitative analysis of extensive chess databases and show that the frequencies of opening moves are distributed according to a power law with an exponent that increases linearly with the game depth, whereas the pooled distribution of all opening weights follows Zipf's law with a universal exponent. We propose a simple stochastic process that is able to capture the observed playing statistics and show that the Zipf law arises from the self-similar nature of the game tree of chess. Thus, in the case of hierarchical fragmentation the scaling is truly universal and independent of a particular generating mechanism. Our findings are of relevance in general processes with composite decisions. Comment: 5 pages, 4 figures

arXiv.org e-Print Archive

Crossref

Universal features of correlated bursty behaviour

Author: A Bunde
A Helmstetter
A Kepecs
A Saichev
A Udias
A Vázquez
A-L Barabási
AA Grace
C Anteneodo
C Cattuto
E Lippiello
F Omori
GG Brunk
H-H Jo
J Eckmann
J Kleinberg
J Ratkiewicz
J Stehlé
JG Oliveira
JM Beggs
JR Busemeyer
K Zhao
K-I Goh
L de Arcangelis
L Turnbull
M Karsai
M Paczuski
M Pica Ciamarra
MS Wheatland
N Takahashi
P Bak
RD Malmgren
RF Smalley
RT Ramos
S Zapperi
T Kemuriyama
T Takaguchi
T Utsu
VN Livina
X Zhao
Y Ikegaya
Y Wu
Á Corral
Publication date: 30/11/2011

Inhomogeneous temporal processes, like those appearing in human communications, neuron spike trains, and seismic signals, consist of high-activity bursty intervals alternating with long low-activity periods. In recent studies such bursty behavior has been characterized by a fat-tailed inter-event time distribution, while temporal correlations were measured by the autocorrelation function. However, these characteristic functions cannot fully characterize temporally correlated heterogeneous behavior. Here we show that the distribution of the number of events in a bursty period serves as a good indicator of the dependencies, leading to the universal observation of power-law distributions in a broad class of phenomena. We find that the correlations in these quite different systems can be commonly interpreted by memory effects and described by a simple phenomenological model, which displays temporal behavior qualitatively similar to that in real systems.

arXiv.org e-Print Archive

Crossref

Harvard University - DASH

PubMed Central

Aaltodoc Publication Archive