5,673 research outputs found

    Power-law Distributions in Information Science - Making the Case for Logarithmic Binning

    We suggest partial logarithmic binning as the method of choice for uncovering the nature of many distributions encountered in information science (IS). Logarithmic binning retrieves information and trends "not visible" in noisy power-law tails. We also argue that obtaining the exponent from logarithmically binned data using a simple least-squares method is in some cases warranted in addition to methods such as maximum likelihood. We also show why often-used cumulative distributions can make it difficult to distinguish noise from genuine features, and make it difficult to obtain an accurate power-law exponent of the underlying distribution. The treatment is non-technical, aimed at IS researchers with little or no background in mathematics. Comment: Accepted for publication in JASIS.
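
The idea of fitting an exponent to logarithmically binned data can be sketched roughly as follows. This is only an illustration, not the paper's procedure: it uses pure (not partial) logarithmic binning, an unweighted least-squares fit, and a synthetic sample built from evenly spaced quantiles of a continuous power law. The function names, the `bins_per_decade` parameter and the exponent 2.5 are all assumptions made for the example.

```python
import math

def log_binned_density(values, bins_per_decade=4):
    """Return (geometric bin centre, density) pairs for positive values.

    Bin widths grow by a constant ratio, so counts are divided by the
    bin width to obtain a density. Empty bins are skipped.
    """
    lo = min(values)
    ratio = 10 ** (1.0 / bins_per_decade)
    counts = {}
    for v in values:
        k = int(math.log(v / lo) / math.log(ratio))  # index of the bin holding v
        counts[k] = counts.get(k, 0) + 1
    n = len(values)
    points = []
    for k in sorted(counts):
        left = lo * ratio ** k
        right = left * ratio
        density = counts[k] / (n * (right - left))
        points.append((math.sqrt(left * right), density))
    return points

def loglog_slope(points):
    """Ordinary least-squares slope of log(density) against log(centre)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical test data: evenly spaced quantiles of p(x) ~ x^-2.5, x >= 1.
alpha = 2.5
n = 50_000
sample = [(1.0 - (i + 0.5) / n) ** (-1.0 / (alpha - 1.0)) for i in range(n)]

slope = loglog_slope(log_binned_density(sample))
print(f"estimated exponent: {-slope:.2f}")
```

On real, noisy data the sparsely populated tail bins can pull an unweighted fit away from the true exponent, which is one reason to compare the result with a maximum-likelihood estimate.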

    Power-law distributions in binned empirical data

    Many man-made and natural phenomena, including the intensity of earthquakes, population of cities and size of international wars, are believed to follow power-law distributions. The accurate identification of power-law patterns has significant consequences for correctly understanding and modeling complex systems. However, statistical evidence for or against the power-law hypothesis is complicated by large fluctuations in the empirical distribution's tail, and these are worsened when information is lost from binning the data. We adapt the statistically principled framework for testing the power-law hypothesis, developed by Clauset, Shalizi and Newman, to the case of binned data. This approach includes maximum-likelihood fitting, a hypothesis test based on the Kolmogorov-Smirnov goodness-of-fit statistic and likelihood ratio tests for comparing against alternative explanations. We evaluate the effectiveness of these methods on synthetic binned data with known structure, quantify the loss of statistical power due to binning, and apply the methods to twelve real-world binned data sets with heavy-tailed patterns. Comment: Published at http://dx.doi.org/10.1214/13-AOAS710 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
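
Maximum-likelihood fitting on binned data can be illustrated with a small sketch. It assumes a continuous power law above the first bin edge, treats the bin counts as multinomial, and maximises the log-likelihood with a golden-section search. The function names, search bounds and example bins are hypothetical. The lower cut-off selection, the Kolmogorov-Smirnov test and the likelihood ratio tests mentioned in the abstract are not reproduced here.

```python
import math

def bin_probability(alpha, left, right, b_min):
    """P(left <= x < right) for p(x) ~ x^-alpha on x >= b_min (alpha > 1)."""
    upper = 0.0 if math.isinf(right) else (right / b_min) ** (1.0 - alpha)
    return (left / b_min) ** (1.0 - alpha) - upper

def binned_log_likelihood(alpha, edges, counts):
    """Multinomial log-likelihood of the bin counts, up to a constant."""
    b_min = edges[0]
    total = 0.0
    for left, right, h in zip(edges[:-1], edges[1:], counts):
        if h:
            total += h * math.log(bin_probability(alpha, left, right, b_min))
    return total

def fit_alpha(edges, counts, lo=1.01, hi=6.0, tol=1e-6):
    """Golden-section search for the alpha that maximises the likelihood."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if binned_log_likelihood(c, edges, counts) > binned_log_likelihood(d, edges, counts):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2.0

# Hypothetical example: power-of-two bins with an open-ended last bin,
# filled with the rounded expected counts for alpha = 2.5.
edges = [2.0 ** k for k in range(11)] + [math.inf]
true_alpha, n = 2.5, 100_000
counts = [round(n * bin_probability(true_alpha, l, r, edges[0]))
          for l, r in zip(edges[:-1], edges[1:])]

alpha_hat = fit_alpha(edges, counts)
print(f"fitted exponent: {alpha_hat:.3f}")
```

Because only the bin edges and counts enter the likelihood, the same sketch applies to linear, logarithmic or irregular bins.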

    How people make friends in social networking sites - A microscopic perspective

    We study the detailed growth of a social networking site with full temporal information by examining the creation process of each friendship relation that can collectively lead to the macroscopic properties of the network. We first study the reciprocal behavior of users, and find that link requests are quickly responded to and that the distribution of reciprocation intervals decays exponentially. The degrees of inviters/accepters are slightly negatively correlated with reciprocation time. In addition, the temporal feature of the online community shows that the distributions of intervals of user behaviors, such as sending or accepting link requests, follow a power law with a universal exponent, and peaks emerge at intervals of a whole number of days. We finally study the preferential selection and linking phenomena of the social networking site and find that, for the former, a linear preference holds for preferential sending and reception, and for the latter, a linear preference also holds for preferential acceptance, creation, and attachment. Based on the linearly preferential linking, we put forward an analyzable network model which can reproduce the degree distribution of the network. The research framework presented in the paper could provide a potential insight into how the micro-motives of users lead to the global structure of online social networks. Comment: 10 pages, 12 figures, 2 tables.

    Fibonacci Binning

    This note argues that when dot-plotting distributions typically found in papers about web and social networks (degree distributions, component-size distributions, etc.), and more generally distributions that have high variability in their tail, an exponentially binned version should always be plotted, too, and suggests Fibonacci binning as a visually appealing, easy-to-use and practical choice.
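
A minimal sketch of what Fibonacci binning might look like for integer-valued data such as degrees. It assumes half-open bins between consecutive Fibonacci numbers (1, 2, 3, 5, 8, ...) and reports the mean count per integer in each bin. The note itself may place or normalise the plotted points differently, and the function names are hypothetical.

```python
def fibonacci_edges(max_value):
    """Fibonacci numbers 1, 2, 3, 5, ... up to the first one above max_value."""
    edges = [1, 2]
    while edges[-1] <= max_value:
        edges.append(edges[-1] + edges[-2])
    return edges

def fibonacci_bin(freq):
    """Bin a frequency table {positive integer value: count}.

    Returns (left, right, mean count per integer) for each bin [left, right).
    """
    edges = fibonacci_edges(max(freq))
    binned = []
    for left, right in zip(edges[:-1], edges[1:]):
        total = sum(freq.get(v, 0) for v in range(left, right))
        binned.append((left, right, total / (right - left)))
    return binned

# Hypothetical degree counts: value -> number of nodes with that degree.
degree_counts = {1: 10, 2: 6, 3: 4, 4: 2, 7: 1}
for left, right, mean_count in fibonacci_bin(degree_counts):
    print(f"[{left}, {right}): {mean_count:.3f}")
```

The bin widths grow by a factor approaching the golden ratio, so the bins are roughly evenly spaced on a logarithmic axis while keeping integer boundaries.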

    Power laws, Pareto distributions and Zipf's law

    When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them. Comment: 28 pages, 16 figures, minor corrections and additions in this version.
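
The verbal definition above can be written out for the continuous case. The symbols are assumptions introduced here rather than taken from the abstract: $\alpha > 1$ is the exponent, $x_{\min}$ the lower cut-off and $C$ the normalisation constant.

```latex
p(x) = C\,x^{-\alpha}, \qquad x \ge x_{\min}, \qquad
C = (\alpha - 1)\,x_{\min}^{\alpha - 1},
\qquad
P(X \ge x) = \int_{x}^{\infty} p(x')\,\mathrm{d}x'
           = \left(\frac{x}{x_{\min}}\right)^{-(\alpha - 1)} .
```

The cumulative distribution therefore also follows a power law, with exponent $\alpha - 1$ rather than $\alpha$.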

    Zipf law in the popularity distribution of chess openings

    We perform a quantitative analysis of extensive chess databases and show that the frequencies of opening moves are distributed according to a power law with an exponent that increases linearly with the game depth, whereas the pooled distribution of all opening weights follows Zipf's law with a universal exponent. We propose a simple stochastic process that is able to capture the observed playing statistics and show that the Zipf law arises from the self-similar nature of the game tree of chess. Thus, in the case of hierarchical fragmentation the scaling is truly universal and independent of a particular generating mechanism. Our findings are of relevance in general processes with composite decisions. Comment: 5 pages, 4 figures.

    Universal features of correlated bursty behaviour

    Inhomogeneous temporal processes, like those appearing in human communications, neuron spike trains, and seismic signals, consist of high-activity bursty intervals alternating with long low-activity periods. In recent studies such bursty behavior has been characterized by a fat-tailed inter-event time distribution, while temporal correlations were measured by the autocorrelation function. However, these characteristic functions are not capable of fully characterizing temporally correlated heterogeneous behavior. Here we show that the distribution of the number of events in a bursty period serves as a good indicator of the dependencies, leading to the universal observation of power-law distribution in a broad class of phenomena. We find that the correlations in these quite different systems can be commonly interpreted by memory effects and described by a simple phenomenological model, which displays temporal behavior qualitatively similar to that in real systems.
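
Counting the number of events in a bursty period could be sketched as below. The abstract does not say how a bursty period is delimited, so this example assumes that consecutive events belong to the same period when they are separated by at most a window `delta_t`. That window, the function name and the sample timestamps are all hypothetical.

```python
from collections import Counter

def burst_sizes(timestamps, delta_t):
    """Split sorted event times into bursty periods and return their sizes.

    Two consecutive events are placed in the same period when the gap
    between them is at most delta_t (an assumed definition).
    """
    if not timestamps:
        return []
    sizes = []
    current = 1
    for prev, nxt in zip(timestamps, timestamps[1:]):
        if nxt - prev <= delta_t:
            current += 1           # still inside the same bursty period
        else:
            sizes.append(current)  # gap too long: close the period
            current = 1
    sizes.append(current)
    return sizes

# Hypothetical event times, in arbitrary units.
events = [0, 1, 2, 10, 11, 30]
sizes = burst_sizes(events, delta_t=2)
size_distribution = Counter(sizes)  # number of periods observed for each size
print(sizes, dict(size_distribution))
```

For independent events the sizes would be expected to decay quickly, so a heavy tail in `size_distribution` is the kind of signature of correlation the abstract describes.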