The Geometric Block Model
To capture the inherent geometric features of many community detection
problems, we propose to use a new random graph model of communities that we
call a Geometric Block Model. The geometric block model generalizes the random
geometric graphs in the same way that the well-studied stochastic block model
generalizes the Erdős-Rényi random graphs. It is also a natural extension of
random community models inspired by the recent theoretical and practical
advances in community detection. While the model is of fundamental
theoretical interest in its own right, our main contribution is to show that many practical
community structures are better explained by the geometric block model. We also
show that a simple triangle-counting algorithm to detect communities in the
geometric block model is near-optimal. Indeed, even in the regime where the
average degree of the graph grows only logarithmically with the number of
vertices (sparse-graph), we show that this algorithm performs extremely well,
both theoretically and practically. In contrast, the triangle-counting
algorithm is far from optimal for the stochastic block model. We evaluate
our results on both real and synthetic datasets to show the superior performance of
both the new model and our algorithm.
Comment: A shorter version of this paper has appeared in the 32nd AAAI Conference
on Artificial Intelligence. The AAAI proceedings version as well as the
previous version on arXiv contained some errors that have been corrected in
this version.
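The signal that triangle counting exploits can be illustrated with a small simulation. The following is a minimal sketch, not the authors' algorithm: it assumes a one-dimensional geometric block model in which points are placed uniformly on a circle of circumference 1 and two vertices are joined when their distance is at most `r_same` (same community) or `r_diff` (different communities). The names `sample_gbm`, `mean_common_neighbors`, `r_same`, and `r_diff` are hypothetical, and the abstract does not state the decision thresholds the paper uses.

```python
import random
from itertools import combinations


def sample_gbm(n, r_same, r_diff, seed=0):
    """Sample an assumed 1-D geometric block model on a circle of length 1."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]          # two equal-sized communities
    pos = [rng.random() for _ in range(n)]      # uniform positions on the circle
    nbrs = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        gap = abs(pos[u] - pos[v])
        dist = min(gap, 1.0 - gap)              # circular distance
        radius = r_same if labels[u] == labels[v] else r_diff
        if dist <= radius:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return labels, nbrs


def mean_common_neighbors(labels, nbrs):
    """Average number of triangles through an edge, split by edge type."""
    totals = {True: [0, 0], False: [0, 0]}      # same-community? -> [sum, count]
    for u in range(len(nbrs)):
        for v in nbrs[u]:
            if u < v:
                same = labels[u] == labels[v]
                totals[same][0] += len(nbrs[u] & nbrs[v])
                totals[same][1] += 1
    return {same: s / c for same, (s, c) in totals.items() if c}


labels, nbrs = sample_gbm(200, r_same=0.2, r_diff=0.05, seed=1)
means = mean_common_neighbors(labels, nbrs)
print(means)  # keys: True = intra-community edges, False = inter-community edges
```

With these assumed radii, edges inside a community close noticeably more triangles on average than edges across communities, which is the kind of gap a threshold on the common-neighbor count can use.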
Distributed Community Detection in Dynamic Graphs
Inspired by the increasing interest in self-organizing social opportunistic
networks, we investigate the problem of distributed detection of unknown
communities in dynamic random graphs. As a formal framework, we consider the
dynamic version of the well-studied \emph{Planted Bisection Model}
$G(n,p,q)$, where the node set of the network is partitioned into two
unknown communities and, at every time step, each possible edge is
active with probability $p$ if both nodes belong to the same community, while
it is active with probability $q$ (with $q < p$) otherwise. We also consider a
time-Markovian generalization of this model.
We propose a distributed protocol based on the popular \emph{Label
Propagation Algorithm} and prove that, when the ratio $p/q$ is larger than
$n^b$ (for an arbitrarily small constant $b > 0$), the protocol finds the right
"planted" partition in $O(\log n)$ time even when the snapshots of the dynamic
graph are sparse and disconnected (i.e. in the case $p = \Theta(1/n)$).
Comment: Version I
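The dynamic planted bisection model described above can be sampled directly. The sketch below only generates snapshots and tallies edge types; it does not implement the paper's label-propagation protocol. The function names and the concrete values of `p`, `q`, and the number of steps are assumptions chosen for illustration.

```python
import random
from itertools import combinations


def sample_snapshot(community, p, q, rng):
    """Return the edges that are active in one time step."""
    active = []
    for u, v in combinations(range(len(community)), 2):
        prob = p if community[u] == community[v] else q
        if rng.random() < prob:
            active.append((u, v))
    return active


def count_edge_types(community, p, q, steps, seed=0):
    """Count intra- and inter-community edges over several independent snapshots."""
    rng = random.Random(seed)
    intra = inter = 0
    for _ in range(steps):
        for u, v in sample_snapshot(community, p, q, rng):
            if community[u] == community[v]:
                intra += 1
            else:
                inter += 1
    return intra, inter


# Two hidden communities of 50 nodes each; p is of order 1/n, so each
# snapshot is sparse, and q is much smaller than p (assumed values).
community = [0] * 50 + [1] * 50
intra, inter = count_edge_types(community, p=2 / 100, q=0.001, steps=20)
print(intra, inter)
```

Each snapshot here is redrawn independently; the time-Markovian generalization mentioned in the abstract would instead make an edge's state depend on its previous state.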
From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles
The inference of network topologies from relational data is an important
problem in data analysis. Exemplary applications include the reconstruction of
social ties from data on human interactions, the inference of gene
co-expression networks from DNA microarray data, or the learning of semantic
relationships based on co-occurrences of words in documents. Solving these
problems requires techniques to infer significant links in noisy relational
data. In this short paper, we propose a new statistical modeling framework to
address this challenge. It builds on generalized hypergeometric ensembles, a
class of generative stochastic models that give rise to analytically tractable
probability spaces of directed, multi-edge graphs. We show how this framework
can be used to assess the significance of links in noisy relational data. We
illustrate our method on two data sets capturing spatio-temporal proximity
relations between actors in a social system. The results show that our
analytical framework provides a new approach to infer significant links from
relational data, with interesting perspectives for the mining of data on social
systems.
Comment: 10 pages, 8 figures, accepted at SocInfo201
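One way to read "assessing the significance of a link" is as a tail probability under an urn model. The sketch below computes the upper tail of an ordinary hypergeometric distribution with the standard library; it is a simplified stand-in, not the generalized hypergeometric ensemble of the paper. How the urn is built from the data (for example, how many "possible" edges each node pair contributes) is not given in the abstract, so the parameters `population`, `successes`, and `draws` are assumptions.

```python
from math import comb


def hypergeom_upper_tail(observed, draws, successes, population):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of possible edges in the urn (assumed)
    successes:  possible edges belonging to the node pair of interest (assumed)
    draws:      number of observed interactions in the data set
    """
    upper = min(draws, successes)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# A pair that accounts for 3 of 10 possible edges and is seen in 3 of 5 draws.
p_value = hypergeom_upper_tail(observed=3, draws=5, successes=3, population=10)
print(p_value)
```

A small tail probability would suggest the pair interacts more often than the urn model predicts; the cutoff for calling a link significant is a modelling choice.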
Detecting change points in the large-scale structure of evolving networks
Interactions among people or objects are often dynamic in nature and can be
represented as a sequence of networks, each providing a snapshot of the
interactions over a brief period of time. An important task in analyzing such
evolving networks is change-point detection, in which we both identify the
times at which the large-scale pattern of interactions changes fundamentally
and quantify how large and what kind of change occurred. Here, we formalize for
the first time the network change-point detection problem within an online
probabilistic learning framework and introduce a method that can reliably solve
it. This method combines a generalized hierarchical random graph model with a
Bayesian hypothesis test to quantitatively determine if, when, and precisely
how a change point has occurred. We analyze the detectability of our method
using synthetic data with known change points of different types and
magnitudes, and show that this method is more accurate than several previously
used alternatives. Applied to two high-resolution evolving social networks,
this method identifies a sequence of change points that align with known
external "shocks" to these networks.
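The idea of deciding if and when a change occurred can be shown on a much simpler model. The sketch below swaps in a plain Bernoulli edge model (every node pair is an edge with one shared probability) and a maximum-likelihood comparison, in place of the generalized hierarchical random graph model and the Bayesian hypothesis test used in the paper. The function names and the example edge counts are hypothetical.

```python
from math import log


def bernoulli_loglik(edges, pairs):
    """Maximized log-likelihood of `edges` successes out of `pairs` trials."""
    if edges == 0 or edges == pairs:
        return 0.0
    rate = edges / pairs
    return edges * log(rate) + (pairs - edges) * log(1 - rate)


def best_change_point(edge_counts, pairs_per_snapshot):
    """Find the split that most improves the likelihood over a no-change model.

    Returns (t, gain): snapshots [0, t) and [t, end) get separate edge rates;
    t is None when no split improves on a single shared rate.
    """
    total = len(edge_counts)
    null = bernoulli_loglik(sum(edge_counts), total * pairs_per_snapshot)
    best_t, best_gain = None, 0.0
    for t in range(1, total):
        left = bernoulli_loglik(sum(edge_counts[:t]), t * pairs_per_snapshot)
        right = bernoulli_loglik(
            sum(edge_counts[t:]), (total - t) * pairs_per_snapshot
        )
        gain = left + right - null
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Edge counts per snapshot, each snapshot having 100 node pairs (made-up data).
counts = [10, 11, 9, 10, 30, 31, 29, 30]
print(best_change_point(counts, pairs_per_snapshot=100))
```

The gain is a log-likelihood ratio; turning it into a decision requires a threshold or a proper test, which is the role the Bayesian hypothesis test plays in the paper.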
Measuring the effect of node aggregation on community detection
The nodes of a complex network are often aggregated, whether deliberately or not,
because of technical, ethical, or legal limitations or for privacy reasons. A
common example is the geographic position: one may uncover communities in a
network of places, or of individuals identified with their typical geographical
position, and then aggregate these places into larger entities, such as
municipalities, thus obtaining another network. The communities found in the
networks obtained at various levels of aggregation may exhibit various degrees
of similarity, from full alignment to perfect independence. This is akin to the
problem of ecological and atomic fallacies in statistics, or to the Modifiable
Areal Unit Problem in geography. We identify the class of community detection
algorithms most suitable to cope with node aggregation, and develop an index
for aggregability, capturing the extent to which the aggregation preserves the
community structure. We illustrate its relevance on real-world examples (mobile
phone and Twitter reply-to networks). Our main message is that any
node-partitioning analysis performed on aggregated networks should be
interpreted with caution, as the outcome may be strongly influenced by the
level of aggregation.
Comment: 12 pages, 5 figures
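The aggregation step itself, collapsing places into larger entities to obtain another network, can be sketched in a few lines. This only builds the aggregated network; it does not compute the paper's aggregability index, whose definition is not given in the abstract. The function name, the choice to drop edges that fall inside a single group, and the example places and municipalities are assumptions.

```python
from collections import Counter


def aggregate_network(edges, group_of):
    """Collapse a node-level edge list into a weighted group-level network.

    edges:    iterable of (u, v) pairs between fine-grained nodes
    group_of: mapping from each node to its aggregate (e.g. a municipality)
    Edges inside one group are dropped here (an assumption); the weight of a
    group pair is the number of node-level edges between the two groups.
    """
    weights = Counter()
    for u, v in edges:
        a, b = group_of[u], group_of[v]
        if a != b:
            weights[tuple(sorted((a, b)))] += 1
    return weights


# Four places grouped into two municipalities (made-up example).
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p4"), ("p3", "p4")]
group_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
print(aggregate_network(edges, group_of))
```

Running community detection on both the original and the aggregated network, and comparing the resulting partitions, is the kind of comparison the abstract describes.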