Power-law Distributions in Information Science - Making the Case for Logarithmic Binning
We suggest partial logarithmic binning as the method of choice for uncovering
the nature of many distributions encountered in information science (IS).
Logarithmic binning retrieves information and trends "not visible" in noisy
power-law tails. We also argue that obtaining the exponent from logarithmically
binned data using a simple least-squares method is in some cases warranted in
addition to methods such as maximum likelihood. We also show why often-used
cumulative distributions can make it difficult to distinguish noise from
genuine features, and make it difficult to obtain an accurate power-law
exponent of the underlying distribution. The treatment is non-technical, aimed
at IS researchers with little or no background in mathematics.
Comment: Accepted for publication in JASIS
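As a rough illustration of the idea in this abstract, the sketch below bins positive values into logarithmically spaced bins, normalizes each count by bin width, and fits a straight line to the binned points on log-log axes. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names `log_bin` and `loglog_slope`, the `bins_per_decade` parameter, and the choice to bin the whole range (rather than only the tail, as "partial" binning suggests) are all hypothetical choices made here.

```python
import math


def log_bin(values, bins_per_decade=5):
    """Histogram positive values into logarithmically spaced bins.

    Returns (geometric_center, density) pairs for non-empty bins, where
    density = count / (sample_size * bin_width), so that bins of different
    widths are comparable. Hypothetical helper, not from the paper.
    """
    values = [v for v in values if v > 0]
    n = len(values)
    # Integer bin index of a value: floor(log10(v) * bins_per_decade).
    lo = math.floor(math.log10(min(values)) * bins_per_decade)
    hi = math.floor(math.log10(max(values)) * bins_per_decade) + 1
    counts = [0] * (hi - lo)
    for v in values:
        counts[math.floor(math.log10(v) * bins_per_decade) - lo] += 1

    points = []
    for i, count in enumerate(counts):
        if count == 0:
            continue  # empty bins cannot be shown on a log axis
        left = 10 ** ((lo + i) / bins_per_decade)
        right = 10 ** ((lo + i + 1) / bins_per_decade)
        points.append((math.sqrt(left * right), count / (n * (right - left))))
    return points


def loglog_slope(points):
    """Ordinary least-squares slope of log10(y) against log10(x)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

For data drawn from a density proportional to x^(-alpha), the slope returned by `loglog_slope(log_bin(data))` estimates -alpha; sparsely populated bins at the far end of the tail can bias it.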
Power-law distributions in binned empirical data
Many man-made and natural phenomena, including the intensity of earthquakes,
population of cities and size of international wars, are believed to follow
power-law distributions. The accurate identification of power-law patterns has
significant consequences for correctly understanding and modeling complex
systems. However, statistical evidence for or against the power-law hypothesis
is complicated by large fluctuations in the empirical distribution's tail, and
these are worsened when information is lost from binning the data. We adapt the
statistically principled framework for testing the power-law hypothesis,
developed by Clauset, Shalizi and Newman, to the case of binned data. This
approach includes maximum-likelihood fitting, a hypothesis test based on the
Kolmogorov--Smirnov goodness-of-fit statistic and likelihood ratio tests for
comparing against alternative explanations. We evaluate the effectiveness of
these methods on synthetic binned data with known structure, quantify the loss
of statistical power due to binning, and apply the methods to twelve real-world
binned data sets with heavy-tailed patterns.
Comment: Published in at http://dx.doi.org/10.1214/13-AOAS710 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
How people make friends in social networking sites - A microscopic perspective
We study the detailed growth of a social networking site with full temporal
information by examining the creation process of each friendship relation that
can collectively lead to the macroscopic properties of the network. We first
study the reciprocal behavior of users, and find that link requests are quickly
responded to and that the distribution of reciprocation intervals decays in an
exponential form. The degrees of inviters/accepters are slightly negatively
correlated with reciprocation time. In addition, the temporal feature of the
online community shows that the distributions of intervals of user behaviors,
such as sending or accepting link requests, follow a power law with a universal
exponent, and peaks emerge at intervals that are whole multiples of a day. We finally study
the preferential selection and linking phenomena of the social networking site
and find that, for the former, a linear preference holds for preferential
sending and reception, and for the latter, a linear preference also holds for
preferential acceptance, creation, and attachment. Based on the linearly
preferential linking, we put forward an analyzable network model which can
reproduce the degree distribution of the network. The research framework
presented in the paper could provide a potential insight into how the
micro-motives of users lead to the global structure of online social networks.
Comment: 10 pages, 12 figures, 2 table
Fibonacci Binning
This note argues that when dot-plotting distributions typically found in
papers about web and social networks (degree distributions, component-size
distributions, etc.), and more generally distributions that have high
variability in their tail, an exponentially binned version should always be
plotted, too, and suggests Fibonacci binning as a visually appealing,
easy-to-use and practical choice.
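A minimal sketch of what binning on Fibonacci boundaries could look like is given below. It assumes, as an interpretation rather than something stated in the abstract, that the bins are the integer intervals [1, 2), [2, 3), [3, 5), [5, 8), ... delimited by consecutive Fibonacci numbers, and that each bin reports its average count per integer value. The function names `fibonacci_edges` and `fibonacci_bin` are hypothetical; the note itself should be consulted for its exact plotting convention.

```python
from bisect import bisect_right


def fibonacci_edges(max_value):
    """Return Fibonacci numbers 1, 2, 3, 5, ... up to the first one > max_value."""
    edges = [1, 2]
    while edges[-1] <= max_value:
        edges.append(edges[-1] + edges[-2])
    return edges


def fibonacci_bin(freq):
    """Bin a frequency table on Fibonacci boundaries.

    freq maps a positive integer value (e.g. a node degree) to the number of
    times it occurs. Returns (left, right, mean_count) triples for the
    half-open bins [left, right), where mean_count is the total count in the
    bin divided by the bin width.
    """
    edges = fibonacci_edges(max(freq))
    totals = [0] * (len(edges) - 1)
    for value, count in freq.items():
        # Index of the bin whose left edge is the largest edge <= value.
        totals[bisect_right(edges, value) - 1] += count
    return [
        (left, right, total / (right - left))
        for left, right, total in zip(edges, edges[1:], totals)
    ]
```

For example, `fibonacci_bin({1: 10, 2: 4, 3: 2, 4: 2, 12: 5})` groups the values 3 and 4 into one bin and spreads the five occurrences of 12 over the five-wide bin [8, 13).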
Power laws, Pareto distributions and Zipf's law
When the probability of measuring a particular value of some quantity varies
inversely as a power of that value, the quantity is said to follow a power law,
also known variously as Zipf's law or the Pareto distribution. Power laws
appear widely in physics, biology, earth and planetary sciences, economics and
finance, computer science, demography and the social sciences. For instance,
the distributions of the sizes of cities, earthquakes, solar flares, moon
craters, wars and people's personal fortunes all appear to follow power laws.
The origin of power-law behaviour has been a topic of debate in the scientific
community for more than a century. Here we review some of the empirical
evidence for the existence of power-law forms and the theories proposed to
explain them.
Comment: 28 pages, 16 figures, minor corrections and additions in this versio
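To make the definition above concrete, the sketch below draws samples from a continuous power law p(x) proportional to x^(-alpha) for x >= xmin and recovers the exponent with the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / xmin)). The abstract does not give this formula; it is included here as a commonly used estimator, and the function names, the fixed seed, and the assumption that xmin is known in advance are illustrative choices only.

```python
import math
import random


def sample_power_law(n, alpha, xmin=1.0, seed=0):
    """Draw n samples with density proportional to x**(-alpha) for x >= xmin.

    Uses inverse-transform sampling; requires alpha > 1.
    """
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a
    # negative power.
    return [
        xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)
    ]


def mle_exponent(values, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent.

    Only values >= xmin are used. Returns (alpha_hat, standard_error), where
    the standard error is the large-sample approximation
    (alpha_hat - 1) / sqrt(n).
    """
    tail = [v for v in values if v >= xmin]
    n = len(tail)
    log_sum = sum(math.log(v / xmin) for v in tail)
    alpha_hat = 1.0 + n / log_sum
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n)


if __name__ == "__main__":
    data = sample_power_law(50_000, alpha=2.5)
    alpha_hat, stderr = mle_exponent(data, xmin=1.0)
    print(f"estimated exponent: {alpha_hat:.3f} +/- {stderr:.3f}")
```

In practice xmin is rarely known and has to be chosen or estimated, which this sketch does not attempt.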
Zipf law in the popularity distribution of chess openings
We perform a quantitative analysis of extensive chess databases and show that
the frequencies of opening moves are distributed according to a power-law with
an exponent that increases linearly with the game depth, whereas the pooled
distribution of all opening weights follows Zipf's law with universal exponent.
We propose a simple stochastic process that is able to capture the observed
playing statistics and show that the Zipf law arises from the self-similar
nature of the game tree of chess. Thus, in the case of hierarchical
fragmentation the scaling is truly universal and independent of a particular
generating mechanism. Our findings are of relevance in general processes with
composite decisions.
Comment: 5 pages, 4 figure
Universal features of correlated bursty behaviour
Inhomogeneous temporal processes, like those appearing in human
communications, neuron spike trains, and seismic signals, consist of
high-activity bursty intervals alternating with long low-activity periods. In
recent studies such bursty behavior has been characterized by a fat-tailed
inter-event time distribution, while temporal correlations were measured by the
autocorrelation function. However, these characteristic functions cannot
fully characterize temporally correlated heterogeneous behavior. Here
we show that the distribution of the number of events in a bursty period serves
as a good indicator of the dependencies, leading to the universal observation
of power-law distribution in a broad class of phenomena. We find that the
correlations in these quite different systems can be commonly interpreted by
memory effects and described by a simple phenomenological model, which displays
temporal behavior qualitatively similar to that in real systems.