Navigability is a Robust Property
The Small World phenomenon has inspired researchers across a number of
fields. A breakthrough in its understanding was made by Kleinberg who
introduced Rank Based Augmentation (RBA): add to each vertex independently an
arc to a random destination selected from a carefully crafted probability
distribution. Kleinberg proved that RBA makes many networks navigable, i.e., it
allows greedy routing to successfully deliver messages between any two vertices
in a polylogarithmic number of steps. We prove that navigability is an inherent
property of many random networks, arising without coordination or even
independence assumptions.
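The abstract describes RBA only in words. As a minimal sketch, assume the base network is a ring of n nodes (a simplifying assumption; the abstract speaks of many networks). Each node receives one extra arc whose destination v is chosen with probability proportional to 1/rank_u(v), the rank of v among all other nodes ordered by distance from u. Greedy routing then always forwards to the neighbour closest to the target. The function names are hypothetical, and this is not the authors' construction.

```python
import random


def ring_distance(a, b, n):
    """Shortest hop distance between positions a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def rank_based_augmentation(n, rng):
    """Give every node one extra arc. The destination v is drawn with probability
    proportional to 1 / rank_u(v), where rank_u(v) is v's position when all other
    nodes are sorted by ring distance from u (ties broken by node index)."""
    weights = [1.0 / rank for rank in range(1, n)]  # ranks 1 .. n-1
    long_links = []
    for u in range(n):
        others = sorted((v for v in range(n) if v != u),
                        key=lambda v: (ring_distance(u, v, n), v))
        long_links.append(rng.choices(others, weights=weights, k=1)[0])
    return long_links


def greedy_route(source, target, n, long_links):
    """Forward the message to whichever neighbour (the two ring neighbours or the
    long-range arc) is closest to the target; return the number of hops used."""
    current, hops = source, 0
    while current != target:
        candidates = [(current - 1) % n, (current + 1) % n, long_links[current]]
        current = min(candidates, key=lambda v: ring_distance(v, target, n))
        hops += 1
    return hops


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    links = rank_based_augmentation(n, rng)
    pairs = [(rng.randrange(n), rng.randrange(n)) for _ in range(200)]
    avg = sum(greedy_route(s, t, n, links) for s, t in pairs) / len(pairs)
    print(f"average greedy hops over {len(pairs)} random pairs: {avg:.1f}")
```

Because the ring neighbours alone guarantee that every hop reduces the distance to the target by at least one, greedy routing always succeeds in this sketch. The long-range arcs are what can shorten routes toward the polylogarithmic lengths that define navigability.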
The evolution of interdisciplinarity in physics research
Science, being a social enterprise, is subject to fragmentation into groups
that focus on specialized areas or topics. Often new advances occur through
cross-fertilization of ideas between sub-fields that otherwise have little
overlap as they study dissimilar phenomena using different techniques. Thus to
explore the nature and dynamics of scientific progress one needs to consider
the large-scale organization and interactions between different subject areas.
Here, we study the relationships between the sub-fields of Physics using the
Physics and Astronomy Classification Scheme (PACS) codes employed for
self-categorization of articles published over the past 25 years (1985-2009).
We observe a clear trend towards increasing interactions between the different
sub-fields. The network of sub-fields also exhibits core-periphery
organization, the nucleus being dominated by Condensed Matter and General
Physics. However, over time Interdisciplinary Physics is steadily increasing
its share in the network core, reflecting a shift in the overall trend of
Physics research.
Comment: Published version, 10 pages, 8 figures + Supplementary Information
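The abstract does not detail how the sub-field network is built. The sketch below shows one plausible approach, assuming that each article adds one unit of weight to every pair of top-level PACS sections it is tagged with. The first digit of a PACS code names one of ten top-level sections, e.g. 8 for Interdisciplinary Physics. The helper names and toy articles are hypothetical, and the study's actual weighting and level of aggregation may differ.

```python
from collections import Counter
from itertools import combinations


def top_level_section(pacs_code):
    """Map a PACS code such as '05.45.-a' to its top-level section digit, '0'."""
    return pacs_code.strip()[0]


def subfield_network(articles):
    """Weighted co-occurrence network of top-level PACS sections.

    `articles` is a list of PACS-code lists, one per article. The weight of the
    undirected edge (a, b) counts the articles tagged with both sections a and b.
    """
    weights = Counter()
    for codes in articles:
        sections = sorted({top_level_section(c) for c in codes})
        for a, b in combinations(sections, 2):
            weights[(a, b)] += 1
    return weights


def cross_section_share(articles):
    """Fraction of articles whose codes span at least two top-level sections."""
    spanning = sum(1 for codes in articles
                   if len({top_level_section(c) for c in codes}) > 1)
    return spanning / len(articles)


# Hypothetical toy input: three articles with illustrative code assignments.
articles = [
    ["05.45.-a", "87.18.Sn"],              # sections 0 and 8
    ["71.10.-w", "75.10.Jm"],              # section 7 only
    ["05.40.-a", "89.75.Hc", "87.10.-e"],  # sections 0 and 8
]
print(subfield_network(articles))  # Counter({('0', '8'): 2})
print(cross_section_share(articles))
```

Tracking `cross_section_share` year by year is one simple way to quantify the kind of increasing interaction between sub-fields that the abstract reports.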
Risk-Averse Matchings over Uncertain Graph Databases
A large number of applications, such as querying sensor networks and
analyzing protein-protein interaction (PPI) networks, rely on mining uncertain
graph and hypergraph databases. In this work we study the following problem:
given an uncertain, weighted (hyper)graph, how can we efficiently find a
(hyper)matching with high expected reward, and low risk?
This problem naturally arises in the context of several important
applications, such as online dating, kidney exchanges, and team formation. We
introduce a novel formulation for finding matchings with maximum expected
reward and bounded risk under a general model of uncertain weighted
(hyper)graphs that we introduce in this work. Our model generalizes
probabilistic models used in prior work, and captures both continuous and
discrete probability distributions, thus allowing it to handle privacy-related
applications that inject appropriately distributed noise into (hyper)edge
weights. Given that our optimization problem is NP-hard, we turn our attention
to designing efficient approximation algorithms. For the case of uncertain
weighted graphs, we provide a -approximation algorithm, and a
-approximation algorithm with near optimal run time. For the case
of uncertain weighted hypergraphs, we provide a
-approximation algorithm, where is the rank of the
hypergraph (i.e., any hyperedge includes at most nodes), that runs in
almost (modulo log factors) linear time.
We complement our theoretical results by testing our approximation algorithms
in a wide variety of synthetic experiments, which reveal, in a controlled
setting, interesting findings on the trade-off between reward and risk. We also
provide an application of our formulation to recommending teams that are
likely to collaborate and have high impact.
Comment: 25 pages
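The abstract does not describe the approximation algorithms themselves, so the sketch below only illustrates the problem setting on ordinary graphs. It uses a simple greedy heuristic that is not the authors' algorithm and carries no approximation guarantee. It assumes each edge has an expected reward and an additive risk score (for instance the variance of independent edge weights), and that a matching is feasible when its total risk stays within a budget. All names are hypothetical.

```python
def greedy_risk_bounded_matching(edges, risk_budget):
    """Greedy heuristic (NOT the paper's approximation algorithms).

    edges: list of (u, v, expected_reward, risk) tuples. Risk is assumed to be
    additive across edges, e.g. the variance of independent edge weights.
    Edges are scanned in order of decreasing expected reward; an edge is kept if
    both endpoints are still unmatched and the risk budget is not exceeded.
    """
    matched, chosen = set(), []
    total_reward = total_risk = 0.0
    for u, v, reward, risk in sorted(edges, key=lambda e: e[2], reverse=True):
        if u in matched or v in matched or total_risk + risk > risk_budget:
            continue
        matched.update((u, v))
        chosen.append((u, v))
        total_reward += reward
        total_risk += risk
    return chosen, total_reward, total_risk


edges = [
    ("a", "b", 5.0, 3.0),  # highest reward but too risky for the budget below
    ("b", "c", 4.0, 1.0),
    ("c", "d", 3.0, 1.0),
    ("a", "d", 2.0, 0.5),
]
print(greedy_risk_bounded_matching(edges, risk_budget=2.0))
# ([('b', 'c'), ('a', 'd')], 6.0, 1.5)
```

The example shows the reward/risk tension in miniature: the single most rewarding edge is skipped because its risk alone exceeds the budget.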
From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles
The inference of network topologies from relational data is an important
problem in data analysis. Exemplary applications include the reconstruction of
social ties from data on human interactions, the inference of gene
co-expression networks from DNA microarray data, or the learning of semantic
relationships based on co-occurrences of words in documents. Solving these
problems requires techniques to infer significant links in noisy relational
data. In this short paper, we propose a new statistical modeling framework to
address this challenge. It builds on generalized hypergeometric ensembles, a
class of generative stochastic models that give rise to analytically tractable
probability spaces of directed, multi-edge graphs. We show how this framework
can be used to assess the significance of links in noisy relational data. We
illustrate our method on two data sets capturing spatio-temporal proximity
relations between actors in a social system. The results show that our
analytical framework provides a new approach to infer significant links from
relational data, with interesting perspectives for the mining of data on social
systems.
Comment: 10 pages, 8 figures, accepted at SocInfo201
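Below is a minimal sketch of the significance test. It assumes the simplest member of the ensemble family, in which every ordered pair (i, j) gets k_out(i) · k_in(j) balls in an urn and the m observed edges are drawn without replacement. Under this assumption, the number of edges between i and j is hypergeometrically distributed, and a small upper-tail p-value flags a link that occurs more often than degrees alone would explain. The full framework also supports non-uniform edge propensities, which this sketch omits. Function names are hypothetical.

```python
from math import comb


def hypergeom_sf(observed, population, successes, draws):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(observed, upper + 1))
    return tail / comb(population, draws)


def link_pvalues(adjacency):
    """Significance of each observed edge count under a degree-based urn model.

    adjacency[i][j] is the number of observed directed (multi-)edges i -> j.
    Each ordered pair (i, j), self-loops included, receives
    xi_ij = k_out[i] * k_in[j] balls; the m observed edges are m draws without
    replacement, so A_ij is hypergeometric with M = sum(xi) = m * m balls.
    """
    n = len(adjacency)
    m = sum(map(sum, adjacency))
    k_out = [sum(row) for row in adjacency]
    k_in = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    big_m = m * m  # sum over i, j of k_out[i] * k_in[j]
    return [[hypergeom_sf(adjacency[i][j], big_m, k_out[i] * k_in[j], m)
             for j in range(n)] for i in range(n)]


observed = [[0, 3], [1, 0]]  # hypothetical counts of interactions i -> j
print(link_pvalues(observed))
```

Pairs whose p-value falls below a chosen threshold would be kept as significant links. With many pairs, a multiple-testing correction is advisable.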
Understanding the Session Durability in Peer-to-Peer Storage System
This paper emphasizes that short-term session durability analysis, rather than long-term availability and reliability analysis, will greatly impact the design of real large-scale Peer-to-Peer storage systems. We use a Markov chain to model session durability and then derive the probability distribution of session durability. We then show how our analysis differs from the traditional Mean Time to Failure (MTTF) analysis, and conclude that misusing MTTF analysis will greatly mislead our understanding of session durability. We further show the impact of session durability analysis on real system design. To the best of our knowledge, this is the first work to discuss the effects of session durability in large-scale Peer-to-Peer storage systems.
Computer Science, Theory & Methods; SCI(E); EI; CPCI-S(ISTP)
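The abstract does not state the chain's structure. As a hypothetical example, assume a session depends on r replicas of a data object. Each online replica leaves with probability p per time step, each offline one returns with probability q, and the session fails once all r are offline (an absorbing state). The sketch computes both the survival probability over a session of given length and the MTTF of the same chain, the two quantities the abstract contrasts. The values r, p and q and all names are assumptions.

```python
def step(dist, r, p, q):
    """Advance a birth-death chain over the number of online replicas by one step.

    From state i (1 <= i <= r) one replica goes offline with probability i*p and
    one comes back with probability (r-i)*q. State 0, all replicas offline, is
    absorbing: the session has lost access to the data.
    """
    new = [0.0] * (r + 1)
    new[0] = dist[0]
    for i in range(1, r + 1):
        down, up = i * p, (r - i) * q
        new[i - 1] += dist[i] * down
        if i < r:
            new[i + 1] += dist[i] * up
        new[i] += dist[i] * (1.0 - down - up)
    return new


def survival_curve(r, p, q, steps):
    """P(data still reachable after t steps) for t = 0..steps, starting with r online."""
    assert r * max(p, q) <= 1.0, "transition probabilities must stay in [0, 1]"
    dist = [0.0] * r + [1.0]
    curve = [1.0]
    for _ in range(steps):
        dist = step(dist, r, p, q)
        curve.append(sum(dist[1:]))
    return curve


def mean_time_to_failure(r, p, q, tol=1e-12, max_steps=10**6):
    """MTTF = sum over t >= 0 of P(T > t), truncated once the tail drops below tol."""
    assert r * max(p, q) <= 1.0, "transition probabilities must stay in [0, 1]"
    dist = [0.0] * r + [1.0]
    total, alive, t = 0.0, 1.0, 0
    while alive > tol and t < max_steps:
        total += alive
        dist = step(dist, r, p, q)
        alive = sum(dist[1:])
        t += 1
    return total


if __name__ == "__main__":
    r, p, q = 3, 0.01, 0.05  # hypothetical replica count and per-step churn rates
    session = 100            # hypothetical session length in steps
    print("P(session survives):", survival_curve(r, p, q, session)[-1])
    print("MTTF (steps):", mean_time_to_failure(r, p, q))
```

The survival curve is a full distribution over time, whereas MTTF compresses it into one mean. That is why the two can suggest different design choices for a session of a specific length.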
World citation and collaboration networks: uncovering the role of geography in science
Modern information and communication technologies, especially the Internet,
have diminished the role of spatial distances and territorial boundaries in the
access to and transmissibility of information. This has enabled closer
collaboration and internationalization among scientists. Nevertheless, geography remains
an important factor affecting the dynamics of science. Here we present a
systematic analysis of citation and collaboration networks between cities and
countries, by assigning papers to the geographic locations of their authors'
affiliations. The citation flows as well as the collaboration strengths between
cities decrease with the distance between them and follow gravity laws. In
addition, the total research impact of a country grows linearly with the amount
of national funding for research & development. However, the average impact
reveals a peculiar threshold effect: the scientific output of a country may
reach an impact larger than the world average only if the country invests more
than about 100,000 USD per researcher annually.
Comment: Published version. 9 pages, 5 figures + Appendix. The world citation
and collaboration networks at both city and country levels are available at
http://becs.aalto.fi/~rajkp/datasets.htm
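Gravity laws are commonly written as F_ij = K · m_i · m_j / d_ij^γ. Here m_i and m_j measure the size of two locations (for example their paper counts), d_ij is their distance and γ controls how quickly the flow decays. Assuming that form, which the abstract does not spell out, the exponent can be estimated by least squares in log space, as in this sketch. The function name and the noiseless synthetic data are hypothetical.

```python
import math


def fit_gravity_law(flows):
    """Least-squares fit of F_ij = K * m_i * m_j / d_ij**gamma.

    `flows` holds (m_i, m_j, d_ij, F_ij) tuples. Taking logs gives the linear
    model log(F_ij / (m_i * m_j)) = log K - gamma * log d_ij, fitted by ordinary
    least squares. Returns (K, gamma).
    """
    xs = [math.log(d) for _, _, d, _ in flows]
    ys = [math.log(f / (mi * mj)) for mi, mj, _, f in flows]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


# Synthetic, noiseless flows generated with K = 2 and gamma = 1.5.
pairs = [(120, 80, 50.0), (300, 45, 900.0), (60, 60, 2500.0), (500, 20, 7000.0)]
flows = [(mi, mj, d, 2.0 * mi * mj / d ** 1.5) for mi, mj, d in pairs]
K, gamma = fit_gravity_law(flows)
print(round(K, 6), round(gamma, 6))  # 2.0 1.5
```

On real citation or collaboration flows the fit would be noisy, and zero flows would need separate handling because their logarithm is undefined.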
Experimental evaluation of train and test split strategies in link prediction
In link prediction, the goal is to predict which links will appear in the future of an evolving network. To estimate the performance of link prediction models in a supervised machine learning setting, disjoint and independent train and test sets are needed. However, objects in a real-world network are inherently related to each other, so it is far from trivial to separate candidate links into such disjoint sets. Here we characterize and empirically investigate the two dominant approaches in the literature for creating separate train and test sets in link prediction, referred to as random and temporal splits. Comparing the performance of these two approaches on several large temporal network datasets, we find evidence that random splits may yield overly optimistic results, whereas a temporal split may give a fairer and more realistic indication of performance. Results appear robust to the choice of temporal intervals. These findings will be of interest to researchers who employ link prediction or other machine learning tasks in networks.
Computer Systems, Imagery and Medi
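The two split strategies can be sketched for a list of timestamped edges (u, v, t). This minimal version only partitions observed links. A full link prediction pipeline would also sample non-linked candidate pairs, which is omitted here. The function names and toy edges are hypothetical.

```python
import random


def random_split(edges, test_fraction, seed=0):
    """Shuffle timestamped edges (u, v, t) and hold out a random fraction.

    Test edges may be older than training edges, so a model can be trained on
    links that formed after the ones it is asked to predict.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(edges, test_fraction):
    """Sort edges by timestamp: the earliest edges train, the latest edges test,
    so every training edge is at least as old as every test edge."""
    ordered = sorted(edges, key=lambda e: e[2])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Hypothetical toy network: ten edges with timestamps 0..9.
edges = [(t % 5, (t * 3 + 1) % 7, t) for t in range(10)]
train_r, test_r = random_split(edges, 0.2, seed=3)
train_t, test_t = temporal_split(edges, 0.2)
print("random test timestamps:  ", sorted(e[2] for e in test_r))
print("temporal test timestamps:", sorted(e[2] for e in test_t))
```

Only the temporal split guarantees that the model never sees the future during training, which is the leakage the abstract associates with overly optimistic random-split results.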
Individualization as driving force of clustering phenomena in humans
One of the most intriguing dynamics in biological systems is the emergence of
clustering, the self-organization into separated agglomerations of individuals.
Several theories have been developed to explain clustering in, for instance,
multi-cellular organisms, ant colonies, bee hives, flocks of birds, schools of
fish, and animal herds. A persistent puzzle, however, is clustering of opinions
in human populations. The puzzle is particularly pressing if opinions vary
continuously, such as the degree to which citizens are in favor of or against a
vaccination program. Existing opinion formation models suggest that
"monoculture" is unavoidable in the long run, unless subsets of the population
are perfectly separated from each other. Yet, social diversity is a robust
empirical phenomenon, although perfect separation is hardly possible in an
increasingly connected world. So far, introducing randomness has not overcome
these theoretical shortcomings. Small perturbations of individual opinions
trigger social influence cascades that inevitably lead to monoculture, while
larger noise disrupts opinion clusters and results in rampant individualism
without any social structure. Our solution of the puzzle builds on recent
empirical research, combining the integrative tendencies of social influence
with the disintegrative effects of individualization. A key element of the new
computational model is an adaptive kind of noise. We conduct simulation
experiments to demonstrate that with this kind of noise, a third phase besides
individualism and monoculture becomes possible, characterized by the formation
of metastable clusters with diversity between and consensus within clusters.
When clusters are small, individualization tendencies are too weak to prevent
clusters from fusing. When clusters grow too large, however, individualization
increases in strength, which promotes their splitting.
Comment: 12 pages, 4 figures
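The abstract does not give the model's equations. The following is a hypothetical sketch of the mechanism it describes, under several assumptions: opinions lie in [-1, 1], social influence pulls an agent toward the similarity-weighted mean of the others' opinions, and adaptive noise has a standard deviation that grows with how similar the agent is to everyone else, so individualization strengthens inside large clusters. The weighting function, noise rule and parameter values are all assumptions, not the authors' specification.

```python
import math
import random


def similarity_weights(opinions, i, scale):
    """Influence weight of every other agent on agent i: exp(-|o_j - o_i| / scale)."""
    return [0.0 if j == i else math.exp(-abs(o - opinions[i]) / scale)
            for j, o in enumerate(opinions)]


def adaptive_noise_std(opinions, i, strength, scale):
    """Hypothetical rule: noise grows with the mean similarity of agent i to the
    others, so agents in large, tight clusters get the strongest push apart."""
    weights = similarity_weights(opinions, i, scale)
    return strength * sum(weights) / (len(opinions) - 1)


def update(opinions, rng, strength=0.1, scale=0.1):
    """One asynchronous update: a random agent adopts the similarity-weighted
    mean of the others' opinions (social influence) plus adaptive noise
    (individualization). Opinions are clipped to [-1, 1]."""
    i = rng.randrange(len(opinions))
    weights = similarity_weights(opinions, i, scale)
    target = sum(w * o for w, o in zip(weights, opinions)) / sum(weights)
    noisy = target + rng.gauss(0.0, adaptive_noise_std(opinions, i, strength, scale))
    opinions[i] = min(1.0, max(-1.0, noisy))


if __name__ == "__main__":
    rng = random.Random(42)
    opinions = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    for _ in range(5000):
        update(opinions, rng)
    print(sorted(round(o, 2) for o in opinions))
```

Whether such a sketch lands in the monoculture, individualism or clustered phase depends on `strength` and `scale`. Sweeping those two hypothetical parameters is the natural experiment to run.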
The locus of legitimate interpretation in Big Data sciences : Lessons for computational social science from -omic biology and high-energy physics
This paper argues that analyses of the ways in which Big Data has been enacted in other academic disciplines can provide us with concepts that will help us understand the application of Big Data to social questions. We use examples drawn from our Science and Technology Studies (STS) analyses of -omic biology and high-energy physics to demonstrate the utility of three theoretical concepts: (i) primary and secondary inscriptions, (ii) crafted and found data, and (iii) the locus of legitimate interpretation. These help us to show how the histories, organisational forms, and power dynamics of a field lead to different enactments of Big Data. The paper suggests that these concepts can help us understand the ways in which Big Data is being enacted in the domain of the social sciences, and outline in general terms how this enactment might differ from what we have observed in the ‘hard’ sciences. We contend that the locus of legitimate interpretation in Big Data biology and physics is tightly delineated, found within the disciplinary institutions and cultures of these disciplines. We suggest that when Big Data is used to make knowledge claims about ‘the social’, the locus of legitimate interpretation is more diffuse, with knowledge claims treated as credible being made from other disciplines, or even by those entirely outside academia.
Theories for influencer identification in complex networks
In social and biological systems, the structural heterogeneity of interaction
networks gives rise to the emergence of a small set of influential nodes, or
influencers, in a range of dynamical processes. Although they make up a small
fraction of the network, these influencers have been observed to shape the
collective dynamics of large populations in different contexts. As such, the
successful identification of influencers should have profound implications for
various real-world spreading processes such as viral marketing, epidemic
outbreaks and cascading failures. In this chapter, we first summarize the
centrality-based approach in finding single influencers in complex networks,
and then discuss the more complicated problem of locating multiple influencers
from a collective point of view. Progress rooted in collective influence
theory, belief-propagation and computer science will be presented. Finally, we
present some applications of influencer identification in diverse real-world
systems, including online social platforms, scientific publication, brain
networks and socioeconomic systems.
Comment: 24 pages, 6 figures
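The collective influence approach mentioned above can be illustrated with the CI score of Morone and Makse (2015): CI_ℓ(i) = (k_i − 1) · Σ_{j ∈ ∂Ball(i, ℓ)} (k_j − 1). Here k_i is the degree of node i and ∂Ball(i, ℓ) is the set of nodes at shortest-path distance exactly ℓ from i. The adaptive version repeatedly removes the highest-scoring node and recomputes the scores. The sketch below is a straightforward, unoptimized version of that scheme, not code from the chapter, and the function names are hypothetical.

```python
from collections import deque


def ball_boundary(adj, source, radius):
    """Nodes at shortest-path distance exactly `radius` from `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # no need to explore beyond the boundary
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [v for v, d in dist.items() if d == radius]


def collective_influence(adj, node, radius=2):
    """CI_l(i) = (k_i - 1) * sum of (k_j - 1) over the boundary of the ball."""
    k = len(adj[node])
    return (k - 1) * sum(len(adj[j]) - 1 for j in ball_boundary(adj, node, radius))


def top_influencers(adj, count, radius=2):
    """Greedy adaptive selection: pick the highest-CI node (ties broken by degree,
    then by dict order), remove it from the graph, recompute, and repeat."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    chosen = []
    for _ in range(min(count, len(adj))):
        best = max(adj, key=lambda u: (collective_influence(adj, u, radius), len(adj[u])))
        chosen.append(best)
        for v in adj.pop(best):
            adj[v].discard(best)
    return chosen


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(top_influencers(path, count=1, radius=1))  # [2]
```

Recomputing every score after each removal costs O(n) BFS runs per pick. Practical implementations update only the nodes near the removed one, typically with a priority queue.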
