426 research outputs found

    Navigability is a Robust Property

    Full text link
    The Small World phenomenon has inspired researchers across a number of fields. A breakthrough in its understanding was made by Kleinberg who introduced Rank Based Augmentation (RBA): add to each vertex independently an arc to a random destination selected from a carefully crafted probability distribution. Kleinberg proved that RBA makes many networks navigable, i.e., it allows greedy routing to successfully deliver messages between any two vertices in a polylogarithmic number of steps. We prove that navigability is an inherent property of many random networks, arising without coordination, or even independence assumptions

    The evolution of interdisciplinarity in physics research

    Get PDF
    Science, being a social enterprise, is subject to fragmentation into groups that focus on specialized areas or topics. Often new advances occur through cross-fertilization of ideas between sub-fields that otherwise have little overlap as they study dissimilar phenomena using different techniques. Thus to explore the nature and dynamics of scientific progress one needs to consider the large-scale organization and interactions between different subject areas. Here, we study the relationships between the sub-fields of Physics using the Physics and Astronomy Classification Scheme (PACS) codes employed for self-categorization of articles published over the past 25 years (1985-2009). We observe a clear trend towards increasing interactions between the different sub-fields. The network of sub-fields also exhibits core-periphery organization, the nucleus being dominated by Condensed Matter and General Physics. However, over time Interdisciplinary Physics is steadily increasing its share in the network core, reflecting a shift in the overall trend of Physics research.Comment: Published version, 10 pages, 8 figures + Supplementary Informatio

    Risk-Averse Matchings over Uncertain Graph Databases

    Full text link
    A large number of applications such as querying sensor networks, and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this work we study the following problem: given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward, and low risk? This problem naturally arises in the context of several important applications, such as online dating, kidney exchanges, and team formation. We introduce a novel formulation for finding matchings with maximum expected reward and bounded risk under a general model of uncertain weighted (hyper)graphs that we introduce in this work. Our model generalizes probabilistic models used in prior work, and captures both continuous and discrete probability distributions, thus allowing to handle privacy related applications that inject appropriately distributed noise to (hyper)edge weights. Given that our optimization problem is NP-hard, we turn our attention to designing efficient approximation algorithms. For the case of uncertain weighted graphs, we provide a 13\frac{1}{3}-approximation algorithm, and a 15\frac{1}{5}-approximation algorithm with near optimal run time. For the case of uncertain weighted hypergraphs, we provide a Ω(1k)\Omega(\frac{1}{k})-approximation algorithm, where kk is the rank of the hypergraph (i.e., any hyperedge includes at most kk nodes), that runs in almost (modulo log factors) linear time. We complement our theoretical results by testing our approximation algorithms on a wide variety of synthetic experiments, where we observe in a controlled setting interesting findings on the trade-off between reward, and risk. We also provide an application of our formulation for providing recommendations of teams that are likely to collaborate, and have high impact.Comment: 25 page

    From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles

    Full text link
    The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.Comment: 10 pages, 8 figures, accepted at SocInfo201

    Understanding the Session Durability in Peer-to-Peer Storage System

    Full text link
    This paper emphasizes that instead of long-term availability and reliability, the short-term session durability analysis will greatly impact the design of the real large-scale Peer-to-Peer storage system. In this paper, we use a Markov chain to model the session durability, and then derive the session durability probability distribution. Subsequently, we show the difference between our analysis and the traditional Mean Time to Failure (MTTF) analysis, from which we conclude that the misuse of MTTF analysis will greatly mislead our understanding of the session durability. We further show the impact of session durability analysis on the real system design. To our best knowledge, this is the first time ever to discuss the effects of session durability in large-scale Peer-to-Peer storage system.Computer Science, Theory & MethodsSCI(E)EICPCI-S(ISTP)

    World citation and collaboration networks: uncovering the role of geography in science

    Get PDF
    Modern information and communication technologies, especially the Internet, have diminished the role of spatial distances and territorial boundaries on the access and transmissibility of information. This has enabled scientists for closer collaboration and internationalization. Nevertheless, geography remains an important factor affecting the dynamics of science. Here we present a systematic analysis of citation and collaboration networks between cities and countries, by assigning papers to the geographic locations of their authors' affiliations. The citation flows as well as the collaboration strengths between cities decrease with the distance between them and follow gravity laws. In addition, the total research impact of a country grows linearly with the amount of national funding for research & development. However, the average impact reveals a peculiar threshold effect: the scientific output of a country may reach an impact larger than the world average only if the country invests more than about 100,000 USD per researcher annually.Comment: Published version. 9 pages, 5 figures + Appendix, The world citation and collaboration networks at both city and country level are available at http://becs.aalto.fi/~rajkp/datasets.htm

    Experimental evaluation of train and test split strategies in link prediction

    Get PDF
    In link prediction, the goal is to predict which links will appear in the future of an evolving network. To estimate the performance of these models in a supervised machine learning model, disjoint and independent train and test sets are needed. However, objects in a real-world network are inherently related to each other. Therefore, it is far from trivial to separate candidate links into these disjoint sets.Here we characterize and empirically investigate the two dominant approaches from the literature for creating separate train and test sets in link prediction, referred to as random and temporal splits. Comparing the performance of these two approaches on several large temporal network datasets, we find evidence that random splits may result in too optimistic results, whereas a temporal split may give a more fair and realistic indication of performance. Results appear robust to the selection of temporal intervals. These findings will be of interest to researchers that employ link prediction or other machine learning tasks in networks.Computer Systems, Imagery and Medi

    Individualization as driving force of clustering phenomena in humans

    Get PDF
    One of the most intriguing dynamics in biological systems is the emergence of clustering, the self-organization into separated agglomerations of individuals. Several theories have been developed to explain clustering in, for instance, multi-cellular organisms, ant colonies, bee hives, flocks of birds, schools of fish, and animal herds. A persistent puzzle, however, is clustering of opinions in human populations. The puzzle is particularly pressing if opinions vary continuously, such as the degree to which citizens are in favor of or against a vaccination program. Existing opinion formation models suggest that "monoculture" is unavoidable in the long run, unless subsets of the population are perfectly separated from each other. Yet, social diversity is a robust empirical phenomenon, although perfect separation is hardly possible in an increasingly connected world. Considering randomness did not overcome the theoretical shortcomings so far. Small perturbations of individual opinions trigger social influence cascades that inevitably lead to monoculture, while larger noise disrupts opinion clusters and results in rampant individualism without any social structure. Our solution of the puzzle builds on recent empirical research, combining the integrative tendencies of social influence with the disintegrative effects of individualization. A key element of the new computational model is an adaptive kind of noise. We conduct simulation experiments to demonstrate that with this kind of noise, a third phase besides individualism and monoculture becomes possible, characterized by the formation of metastable clusters with diversity between and consensus within clusters. When clusters are small, individualization tendencies are too weak to prohibit a fusion of clusters. When clusters grow too large, however, individualization increases in strength, which promotes their splitting.Comment: 12 pages, 4 figure

    The locus of legitimate interpretation in Big Data sciences : Lessons for computational social science from -omic biology and high-energy physics

    Get PDF
    This paper argues that analyses of the ways in which Big Data has been enacted in other academic disciplines can provide us with concepts that will help understand the application of Big Data to social questions. We use examples drawn from our Science and Technology Studies (STS) analyses of -omic biology and high energy physics to demonstrate the utility of three theoretical concepts: (i) primary and secondary inscriptions, (ii) crafted and found data, and (iii) the locus of legitimate interpretation. These help us to show how the histories, organisational forms, and power dynamics of a field lead to different enactments of big data. The paper suggests that these concepts can be used to help us to understand the ways in which Big Data is being enacted in the domain of the social sciences, and to outline in general terms the ways in which this enactment might be different to that which we have observed in the ‘hard’ sciences. We contend that the locus of legitimate interpretation of Big Data biology and physics is tightly delineated, found within the disciplinary institutions and cultures of these disciplines. We suggest that when using Big Data to make knowledge claims about ‘the social’ the locus of legitimate interpretation is more diffuse, with knowledge claims that are treated as being credible made from other disciplines, or even by those outside academia entirely

    Theories for influencer identification in complex networks

    Full text link
    In social and biological systems, the structural heterogeneity of interaction networks gives rise to the emergence of a small set of influential nodes, or influencers, in a series of dynamical processes. Although much smaller than the entire network, these influencers were observed to be able to shape the collective dynamics of large populations in different contexts. As such, the successful identification of influencers should have profound implications in various real-world spreading dynamics such as viral marketing, epidemic outbreaks and cascading failure. In this chapter, we first summarize the centrality-based approach in finding single influencers in complex networks, and then discuss the more complicated problem of locating multiple influencers from a collective point of view. Progress rooted in collective influence theory, belief-propagation and computer science will be presented. Finally, we present some applications of influencer identification in diverse real-world systems, including online social platforms, scientific publication, brain networks and socioeconomic systems.Comment: 24 pages, 6 figure
    corecore