8,333 research outputs found
Unbiased sampling of network ensembles
Sampling random graphs with given properties is a key step in the analysis of
networks, as random ensembles represent basic null models required to identify
patterns such as communities and motifs. An important requirement is that the
sampling process is unbiased and efficient. The main approaches are
microcanonical, i.e. they sample graphs that match the enforced constraints
exactly. Unfortunately, when applied to strongly heterogeneous networks (like
most real-world examples), the majority of these approaches become biased
and/or time-consuming. Moreover, the algorithms defined in the simplest cases,
such as binary graphs with given degrees, are not easily generalizable to more
complicated ensembles. Here we propose a solution to the problem via the
introduction of a "Maximize and Sample" ("Max & Sam" for short) method to
correctly sample ensembles of networks where the constraints are `soft', i.e.
realized as ensemble averages. Our method is based on exact maximum-entropy
distributions and is therefore unbiased by construction, even for strongly
heterogeneous networks. It is also more computationally efficient than most
microcanonical alternatives. Finally, it works for both binary and weighted
networks with a variety of constraints, including combined degree-strength
sequences and full reciprocity structure, for which no alternative method
exists. Our canonical approach can in principle be turned into an unbiased
microcanonical one, via a restriction to the relevant subset. Importantly, the
analysis of the fluctuations of the constraints suggests that the
microcanonical and canonical versions of all the ensembles considered here are
not equivalent. We show various real-world applications and provide a code
implementing all our algorithms.Comment: MatLab code available at
http://www.mathworks.it/matlabcentral/fileexchange/46912-max-sam-package-zi
From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles
The inference of network topologies from relational data is an important
problem in data analysis. Exemplary applications include the reconstruction of
social ties from data on human interactions, the inference of gene
co-expression networks from DNA microarray data, or the learning of semantic
relationships based on co-occurrences of words in documents. Solving these
problems requires techniques to infer significant links in noisy relational
data. In this short paper, we propose a new statistical modeling framework to
address this challenge. It builds on generalized hypergeometric ensembles, a
class of generative stochastic models that give rise to analytically tractable
probability spaces of directed, multi-edge graphs. We show how this framework
can be used to assess the significance of links in noisy relational data. We
illustrate our method in two data sets capturing spatio-temporal proximity
relations between actors in a social system. The results show that our
analytical framework provides a new approach to infer significant links from
relational data, with interesting perspectives for the mining of data on social
systems.Comment: 10 pages, 8 figures, accepted at SocInfo201
Efficient and exact sampling of simple graphs with given arbitrary degree sequence
Uniform sampling from graphical realizations of a given degree sequence is a
fundamental component in simulation-based measurements of network observables,
with applications ranging from epidemics, through social networks to Internet
modeling. Existing graph sampling methods are either link-swap based
(Markov-Chain Monte Carlo algorithms) or stub-matching based (the Configuration
Model). Both types are ill-controlled, with typically unknown mixing times for
link-swap methods and uncontrolled rejections for the Configuration Model. Here
we propose an efficient, polynomial time algorithm that generates statistically
independent graph samples with a given, arbitrary, degree sequence. The
algorithm provides a weight associated with each sample, allowing the
observable to be measured either uniformly over the graph ensemble, or,
alternatively, with a desired distribution. Unlike other algorithms, this
method always produces a sample, without back-tracking or rejections. Using a
central limit theorem-based reasoning, we argue, that for large N, and for
degree sequences admitting many realizations, the sample weights are expected
to have a lognormal distribution. As examples, we apply our algorithm to
generate networks with degree sequences drawn from power-law distributions and
from binomial distributions.Comment: 8 pages, 3 figure
Sampling motif-constrained ensembles of networks
The statistical significance of network properties is conditioned on null
models which satisfy spec- ified properties but that are otherwise random.
Exponential random graph models are a principled theoretical framework to
generate such constrained ensembles, but which often fail in practice, either
due to model inconsistency, or due to the impossibility to sample networks from
them. These problems affect the important case of networks with prescribed
clustering coefficient or number of small connected subgraphs (motifs). In this
paper we use the Wang-Landau method to obtain a multicanonical sampling that
overcomes both these problems. We sample, in polynomial time, net- works with
arbitrary degree sequences from ensembles with imposed motifs counts. Applying
this method to social networks, we investigate the relation between
transitivity and homophily, and we quantify the correlation between different
types of motifs, finding that single motifs can explain up to 60% of the
variation of motif profiles.Comment: Updated version, as published in the journal. 7 pages, 5 figures, one
Supplemental Materia
- …