179 research outputs found

    Power laws, Pareto distributions and Zipf's law

    When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them. Comment: 28 pages, 16 figures, minor corrections and additions in this version
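
    As a rough illustration of the definition above, the following sketch draws samples from a continuous power law p(x) ∝ x^(-alpha) for x >= x_min and recovers the exponent with the standard maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)). The function names, the chosen exponent 2.5, and the sample size are illustrative assumptions, not values taken from the paper.

```python
import math
import random


def sample_power_law(alpha, x_min, n, rng):
    """Draw n samples from p(x) ~ x^(-alpha), x >= x_min, alpha > 1.

    Uses inverse-transform sampling: x = x_min * (1 - u)^(-1 / (alpha - 1)).
    """
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_alpha(samples, x_min):
    """Maximum-likelihood estimate of the exponent for samples >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    data = sample_power_law(alpha=2.5, x_min=1.0, n=20000, rng=rng)
    print("estimated exponent:", estimate_alpha(data, x_min=1.0))
```

    The estimator assumes x_min is known; with real data, choosing x_min is itself part of the fitting problem.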

    Handling oversampling in dynamic networks using link prediction

    Oversampling is a common characteristic of data representing dynamic networks. It introduces noise into representations of dynamic networks, but there has been little work so far to compensate for it. Oversampling can affect the quality of many important algorithmic problems on dynamic networks, including link prediction. Link prediction seeks to predict edges that will be added to the network given previous snapshots. We show that not only does oversampling affect the quality of link prediction, but that we can use link prediction to recover from the effects of oversampling. We also introduce a novel generative model of noise in dynamic networks that represents oversampling. We demonstrate the results of our approach on both synthetic and real-world data. Comment: ECML/PKDD 201

    Evolutionary dynamics of the cryptocurrency market

    The cryptocurrency market surpassed the barrier of $100 billion market capitalization in June 2017, after months of steady growth. Despite its increasing relevance in the financial world, a comprehensive analysis of the whole system is still lacking, as most studies have focused exclusively on the behaviour of one (Bitcoin) or a few cryptocurrencies. Here, we consider the history of the entire market and analyse the behaviour of 1469 cryptocurrencies introduced between April 2013 and May 2017. We reveal that, while new cryptocurrencies appear and disappear continuously and their market capitalization is increasing (super-)exponentially, several statistical properties of the market have been stable for years. These include the number of active cryptocurrencies, market share distribution and the turnover of cryptocurrencies. Adopting an ecological perspective, we show that the so-called neutral model of evolution is able to reproduce a number of key empirical observations, despite its simplicity and the assumption of no selective advantage of one cryptocurrency over another. Our results shed light on the properties of the cryptocurrency market and establish a first formal link between ecological modelling and the study of this growing system. We anticipate they will spark further research in this direction.

    Measuring the evolution of contemporary western popular music

    Popular music is a key cultural expression that has captured listeners' attention for ages. Many of the structural regularities underlying musical discourse are yet to be discovered and, accordingly, their historical evolution remains formally unknown. Here we unveil a number of patterns and metrics characterizing the generic usage of primary musical facets such as pitch, timbre, and loudness in contemporary western popular music. Many of these patterns and metrics have been consistently stable for a period of more than fifty years, thus pointing towards a great degree of conventionalism. Nonetheless, we prove important changes or trends related to the restriction of pitch transitions, the homogenization of the timbral palette, and the growing loudness levels. This suggests that our perception of the new would be rooted in these changing characteristics. Hence, an old tune could perfectly sound novel and fashionable, provided that it consisted of common harmonic progressions, changed the instrumentation, and increased the average loudness. Comment: Supplementary materials not included. Please see the journal reference or contact the author

    Error and attack tolerance of complex networks

    Many complex systems, such as communication networks, display a surprising degree of robustness: while key components regularly malfunction, local failures rarely lead to the loss of the global information-carrying ability of the network. The stability of these complex systems is often attributed to the redundant wiring of the functional web defined by the systems' components. In this paper we demonstrate that error tolerance is not shared by all redundant systems, but it is displayed only by a class of inhomogeneously wired networks, called scale-free networks. We find that scale-free networks, describing a number of systems, such as the World Wide Web, Internet, social networks or a cell, display an unexpected degree of robustness, the ability of their nodes to communicate being unaffected by even unrealistically high failure rates. However, error tolerance comes at a high price: these networks are extremely vulnerable to attacks, i.e. to the selection and removal of a few nodes that play the most important role in assuring the network's connectivity. Comment: 14 pages, 4 figures, Late
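
    The contrast between random failure and targeted attack can be sketched in a few lines. The example below is a toy experiment under stated assumptions, not the paper's procedure: it grows a network by degree-proportional attachment (a common stand-in for a scale-free network), then removes 5% of the nodes either at random or in order of decreasing degree, and uses the size of the largest connected component as a simple proxy for how well the remaining nodes can still communicate. All function names and parameter values are hypothetical.

```python
import random
from collections import deque


def grow_scale_free(n, m, rng):
    """Adjacency dict of a graph grown by degree-proportional attachment."""
    # Seed graph: a clique on m + 1 nodes.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks a node with probability proportional to degree.
    endpoints = [i for i in adj for _ in adj[i]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            endpoints += [new, t]
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def compare_failure_and_attack(n=2000, m=2, fraction=0.05, seed=0):
    """Return (largest component after random failure, after degree attack)."""
    rng = random.Random(seed)
    adj = grow_scale_free(n, m, rng)
    k = int(fraction * n)
    random_nodes = rng.sample(sorted(adj), k)
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    return largest_component(adj, random_nodes), largest_component(adj, hubs)


if __name__ == "__main__":
    print(compare_failure_and_attack())
```

    In this toy setting, removing the highest-degree nodes should fragment the network more than removing the same number of random nodes, which is the qualitative effect the abstract describes.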

    Popularity versus Similarity in Growing Networks

    Popularity is attractive -- this is the formula underlying preferential attachment, a popular explanation for the emergence of scaling in growing networks. If new connections are made preferentially to more popular nodes, then the resulting distribution of the number of connections that nodes have follows power laws observed in many real networks. Preferential attachment has been directly validated for some real networks, including the Internet. Preferential attachment can also be a consequence of different underlying processes based on node fitness, ranking, optimization, random walks, or duplication. Here we show that popularity is just one dimension of attractiveness. Another dimension is similarity. We develop a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The framework admits a geometric interpretation, in which popularity preference emerges from local optimization. As opposed to preferential attachment, the optimization framework accurately describes large-scale evolution of technological (Internet), social (web of trust), and biological (E.coli metabolic) networks, predicting the probability of new links in them with remarkable precision. The developed framework can thus be used for predicting new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.
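
    For readers unfamiliar with the baseline mechanism, here is a minimal sketch of plain preferential attachment, the model the abstract contrasts its framework with. It does not implement the popularity-similarity optimization itself. Each new node links to m distinct existing nodes chosen with probability proportional to their current degree; the function names and parameters are illustrative assumptions.

```python
import random
from collections import Counter


def preferential_attachment_edges(n, m, rng):
    """Edge list of a graph on n nodes grown by degree-proportional attachment."""
    edges = []
    endpoints = []  # each node appears once per incident edge
    # Seed graph: a clique on m + 1 nodes so every node starts with degree m.
    for i in range(m + 1):
        for j in range(i):
            edges.append((i, j))
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            # Uniform draw from `endpoints` selects a node in proportion to degree.
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints += [new, t]
    return edges


def degree_counts(edges):
    """Map each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    rng = random.Random(1)
    deg = degree_counts(preferential_attachment_edges(2000, 2, rng))
    print("max degree:", max(deg.values()), "min degree:", min(deg.values()))
```

    Early nodes accumulate many links while most nodes keep a degree close to m, which is the heavy-tailed pattern the abstract attributes to this mechanism.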


    Persistent homology is a powerful tool in Topological Data Analysis (TDA) for succinctly capturing the topological properties of data at different spatial resolutions. For graphical data, the shape and structure of the neighborhood of individual data items (nodes) are an essential means of characterizing their properties. We propose the use of persistent homology methods to capture structural and topological properties of graphs and use them to address the problem of link prediction. We achieve encouraging results on nine different real-world datasets that attest to the potential of persistent homology-based methods for network analysis.

    Computational fact checking from knowledge networks

    Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem, this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.

    Network 'small-world-ness': a quantitative method for determining canonical network equivalence

    Background: Many technological, biological, social, and information networks fall into the broad class of 'small-world' networks: they have tightly interconnected clusters of nodes, and a mean shortest path length that is similar to a matched random graph (same number of nodes and edges). This semi-quantitative definition leads to a categorical distinction ('small/not-small') rather than a quantitative, continuous grading of networks, and can lead to uncertainty about a network's small-world status. Moreover, systems described by small-world networks are often studied using an equivalent canonical network model, the Watts-Strogatz (WS) model. However, the process of establishing an equivalent WS model is imprecise and there is a pressing need to discover ways in which this equivalence may be quantified. Methodology/Principal Findings: We defined a precise measure of 'small-world-ness' S based on the trade-off between high local clustering and short path length. A network is now deemed a 'small-world' if S > 1, an assertion which may be tested statistically. We then examined the behavior of S on a large data-set of real-world systems. We found that all these systems were linked by a linear relationship between their S values and the network size n. Moreover, we show a method for assigning a unique Watts-Strogatz (WS) model to any real-world network, and show analytically that the WS models associated with our sample of networks also show linearity between S and n. Linearity between S and n is not, however, inevitable, and neither is S maximal for an arbitrary network of given size. Linearity may, however, be explained by a common limiting growth process. Conclusions/Significance: We have shown how the notion of a small-world network may be quantified. Several key properties of the metric are described and the use of WS canonical models is placed on a more secure footing.
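
    The abstract does not spell out the formula, so the sketch below assumes the commonly cited form of the measure: S = (C / C_rand) / (L / L_rand), where C is the clustering coefficient and L the mean shortest path length of the network, and C_rand and L_rand are the same quantities for a matched random graph. The helper functions compute C (as mean local clustering, one of several possible definitions) and L for a graph stored as an adjacency dict; all names here are hypothetical.

```python
from collections import deque


def mean_clustering(adj):
    """Mean local clustering coefficient (Watts-Strogatz style definition)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        nbrs = list(nbrs)
        links = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:] if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs, via BFS."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_ness(c, path_len, c_rand, path_len_rand):
    """S = (C / C_rand) / (L / L_rand); S > 1 suggests a small-world network."""
    return (c / c_rand) / (path_len / path_len_rand)


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    return {i: {(i + d) % n for d in range(-k // 2, k // 2 + 1) if d != 0}
            for i in range(n)}


if __name__ == "__main__":
    lattice = ring_lattice(12, 4)
    print("C =", mean_clustering(lattice), "L =", mean_path_length(lattice))
```

    To evaluate S for a real network, C_rand and L_rand would be obtained from a random graph with the same number of nodes and edges, as the abstract's definition of a matched random graph suggests.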

    Comparison of contact patterns relevant for transmission of respiratory pathogens in Thailand and the Netherlands using respondent-driven sampling

    Understanding infection dynamics of respiratory diseases requires the identification and quantification of behavioural, social and environmental factors that permit the transmission of these infections between humans. Little empirical information is available about contact patterns within real-world social networks, let alone on differences in these contact networks between populations that differ considerably on a socio-cultural level. Here we compared contact network data that were collected in the Netherlands and Thailand using a similar online respondent-driven method. By asking participants to recruit contact persons we studied network links relevant for the transmission of respiratory infections. We studied correlations between recruiter and recruited contacts to investigate mixing patterns in the observed social network components. In both countries, mixing patterns were assortative by demographic variables and random by total numbers of contacts. However, in Thailand participants reported more contacts overall, which resulted in higher effective contact rates. Our findings provide new insights on numbers of contacts and mixing patterns in two different populations. These data could be used to improve parameterisation of mathematical models used to design control strategies. Although the spread of infections through populations depends on more factors, the similarities found suggest that spread may be similar in the Netherlands and Thailand.