Power-law distributions in empirical data
Power-law distributions occur in many situations of scientific interest and
have significant consequences for our understanding of natural and man-made
phenomena. Unfortunately, the detection and characterization of power laws is
complicated by the large fluctuations that occur in the tail of the
distribution -- the part of the distribution representing large but rare events
-- and by the difficulty of identifying the range over which power-law behavior
holds. Commonly used methods for analyzing power-law data, such as
least-squares fitting, can produce substantially inaccurate estimates of
parameters for power-law distributions, and even in cases where such methods
return accurate answers they are still unsatisfactory because they give no
indication of whether the data obey a power law at all. Here we present a
principled statistical framework for discerning and quantifying power-law
behavior in empirical data. Our approach combines maximum-likelihood fitting
methods with goodness-of-fit tests based on the Kolmogorov-Smirnov statistic
and likelihood ratios. We evaluate the effectiveness of the approach with tests
on synthetic data and give critical comparisons to previous approaches. We also
apply the proposed methods to twenty-four real-world data sets from a range of
different disciplines, each of which has been conjectured to follow a power-law
distribution. In some cases we find these conjectures to be consistent with the
data while in others the power law is ruled out.
Comment: 43 pages, 11 figures, 7 tables, 4 appendices; code available at
http://www.santafe.edu/~aaronc/powerlaws
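As a rough illustration of the two ingredients named in the abstract above (maximum-likelihood fitting and the Kolmogorov-Smirnov statistic), the sketch below fits a continuous power law p(x) ~ x^(-alpha) for x >= xmin. It is not the authors' code: the function names are hypothetical, xmin is assumed to be known in advance, and the step of estimating xmin and computing a p-value is left out.

```python
import math
import random


def fit_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law on x >= xmin.

    Uses the closed form alpha = 1 + n / sum(ln(x_i / xmin)).
    """
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0.0:
        raise ValueError("need at least one observation strictly above xmin")
    return 1.0 + len(tail) / log_sum


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    distance = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        distance = max(distance,
                       abs(model_cdf - i / n),
                       abs(model_cdf - (i + 1) / n))
    return distance


def sample_power_law(n, xmin, alpha, rng):
    """Draw n synthetic values by inverting the power-law CDF."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = sample_power_law(20000, xmin=1.0, alpha=2.5, rng=rng)
    alpha_hat = fit_alpha(synthetic, xmin=1.0)
    print("estimated alpha:", alpha_hat)
    print("KS distance:", ks_distance(synthetic, 1.0, alpha_hat))
```

On synthetic data drawn from a true power law, the estimated exponent should land close to the generating value and the KS distance should be small. A large KS distance on real data is a hint that the power-law hypothesis deserves a formal test.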
The skewness of computer science
Computer science is a relatively young discipline combining science,
engineering, and mathematics. The main flavors of computer science research
involve the theoretical development of conceptual models for the different
aspects of computing and the more applicative building of software artifacts
and assessment of their properties. In the computer science publication
culture, conferences are an important vehicle to quickly move ideas, and
journals often publish deeper versions of papers already presented at
conferences. These peculiarities of the discipline make computer science an
original research field within the sciences, and, therefore, the assessment of
classical bibliometric laws is particularly important for this field. In this
paper, we study the skewness of the distribution of citations to papers
published in computer science publication venues (journals and conferences). We
find that the skewness in the distribution of mean citedness of different
venues combines with the asymmetry in citedness of articles in each venue,
resulting in a highly asymmetric citation distribution with a power law tail.
Furthermore, the skewness of conference publications is more pronounced than
the asymmetry of journal papers. Finally, the impact of journal papers, as
measured with bibliometric indicators, largely dominates that of proceedings
papers.
Comment: I applied the goodness-of-fit methodology proposed in: A. Clauset, C.
R. Shalizi, M. E. J. Newman. Power-law distributions in empirical data. SIAM
Review 51, 661-703 (2009).
The Effect of Recency to Human Mobility
In recent years, we have seen scientists attempt to model and explain human
dynamics and, in particular, human movement. Many aspects of our complex lives
are affected by human movement, including disease spread and epidemic modeling,
city planning, wireless network development, and disaster relief, to name a
few. Given the myriad applications, it is clear that a complete understanding
of how people move in space can bring large benefits to society. Most recent
work has focused on the idea that people's movements are biased towards
frequently visited locations. According to this view, human
movement is based on an exploration/exploitation dichotomy in which individuals
choose new locations (exploration) or return to frequently-visited locations
(exploitation). In this work, we focus on the concept of recency. We propose a
model in which exploitation in human movement also considers recently-visited
locations and not solely frequently-visited locations. We test our hypothesis
against several empirical human mobility datasets and show that our proposed
model better explains the human trajectories in these datasets.
Universal fractal scaling of self-organized networks
There is an abundance of literature on complex networks describing a variety of relationships among units in social, biological, and technological systems. Such networks, consisting of interconnected nodes, are often self-organized, naturally emerging without any overarching designs on topological structure yet enabling efficient interactions among nodes. Here we show that the number of nodes and the density of connections in such self-organized networks exhibit a power law relationship. We examined the size and connection density of 46 self-organizing networks of various biological, social, and technological origins, and found that the size-density relationship follows a fractal relationship spanning over 6 orders of magnitude. This finding indicates that there is an optimal connection density in self-organized networks following fractal scaling regardless of their sizes.
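A power law relationship between two measured quantities, such as network size and connection density, appears as a straight line on log-log axes. The sketch below estimates the exponent by ordinary least squares on the logarithms. The abstract does not say how the authors fitted their 46 networks, so the method, the function name, and the numbers in the usage example are all assumptions made for illustration.

```python
import math


def fit_scaling_exponent(sizes, densities):
    """Fit density = prefactor * size**exponent by least squares in log-log space.

    Returns (exponent, prefactor). All inputs must be positive.
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(d) for d in densities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    prefactor = 10 ** (mean_y - exponent * mean_x)
    return exponent, prefactor


if __name__ == "__main__":
    # Made-up, noise-free data that follows density = 2 * size**-0.8 exactly.
    sizes = [10, 100, 1000, 10000]
    densities = [2.0 * s ** -0.8 for s in sizes]
    print(fit_scaling_exponent(sizes, densities))
```

This kind of regression suits a scaling relationship between two variables. It is a different task from fitting a power-law probability distribution, where least-squares fitting is known to be unreliable.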
Fibonacci Binning
This note argues that when dot-plotting distributions typically found in
papers about web and social networks (degree distributions, component-size
distributions, etc.), and more generally distributions that have high
variability in their tail, an exponentially binned version should always be
plotted, too, and suggests Fibonacci binning as a visually appealing,
easy-to-use and practical choice.
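The abstract above does not spell out the binning procedure, so the following is only a sketch of one plausible reading: bin boundaries are placed at the Fibonacci numbers 1, 2, 3, 5, 8, ..., and each bin [lo, hi) reports the mean count per integer value inside it. The function names and the choice of averaging are assumptions, not details taken from the note.

```python
def fibonacci_edges(max_value):
    """Return edges 1, 2, 3, 5, 8, ... ending at the first edge above max_value."""
    edges = [1, 2]
    while edges[-1] <= max_value:
        edges.append(edges[-1] + edges[-2])
    return edges


def fibonacci_bin(freq):
    """Bin a frequency table {positive integer value: count}.

    Returns a list of (lo, hi, mean_count) for half-open bins [lo, hi),
    where mean_count is the total count in the bin divided by its width.
    """
    if not freq:
        return []
    if min(freq) < 1:
        raise ValueError("values must be positive integers")
    edges = fibonacci_edges(max(freq))
    bins = []
    for lo, hi in zip(edges, edges[1:]):
        total = sum(freq.get(v, 0) for v in range(lo, hi))
        bins.append((lo, hi, total / (hi - lo)))
    return bins


if __name__ == "__main__":
    # Hypothetical degree distribution: degree -> number of nodes.
    degree_counts = {1: 10, 2: 6, 3: 4, 4: 2, 7: 1}
    for lo, hi, mean_count in fibonacci_bin(degree_counts):
        print(f"[{lo}, {hi}): {mean_count}")
```

Because bin widths grow roughly geometrically, sparse counts in the tail are averaged over progressively wider ranges, which is what smooths the noisy tail on a log-log plot.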