Assortative-Constrained Stochastic Block Models
Stochastic block models (SBMs) are often used to find assortative community
structures in networks, such that the probability of connections within
communities is higher than in between communities. However, classic SBMs are
not limited to assortative structures. In this study, we discuss the
implications of this model-inherent indifference towards assortativity or
disassortativity, and show that this characteristic can lead to undesirable
outcomes for networks that are presupposed to be assortative but contain a
reduced amount of information. To circumvent this issue, we introduce a
constrained SBM that imposes strong assortativity constraints, along with
efficient algorithmic approaches to solve it. These constraints significantly
boost community recovery capabilities in regimes that are close to the
information-theoretic threshold. They also make it possible to identify
structurally different communities in networks representing cerebral-cortex
activity regions.
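The abstract does not spell out the constraint, so the sketch below assumes a convention common in the SBM literature. A symmetric block matrix p is weakly assortative if each group's internal probability exceeds that group's between-group probabilities. It is strongly assortative if the smallest within-group probability exceeds the largest between-group probability. The function names and example matrices are hypothetical, and the paper's exact constraint may differ.

```python
import numpy as np


def is_weakly_assortative(p):
    """p[k, k] > max over l != k of p[k, l], for every group k."""
    k = p.shape[0]
    off_diagonal = np.where(np.eye(k, dtype=bool), -np.inf, p)  # hide the diagonal
    return bool(np.all(np.diag(p) > off_diagonal.max(axis=1)))


def is_strongly_assortative(p):
    """min over k of p[k, k] > max over k != l of p[k, l]."""
    k = p.shape[0]
    off_diagonal = p[~np.eye(k, dtype=bool)]
    return bool(np.diag(p).min() > off_diagonal.max())


# Hypothetical block matrices (entries are connection probabilities).
p_two = np.array([[0.5, 0.1],
                  [0.1, 0.4]])
p_mixed = np.array([[0.5, 0.3, 0.05],
                    [0.3, 0.6, 0.05],
                    [0.05, 0.05, 0.2]])   # weak holds, strong fails (0.2 < 0.3)
p_dis = np.array([[0.1, 0.5],
                  [0.5, 0.1]])           # disassortative: a classic SBM allows this

for name, p in [("two", p_two), ("mixed", p_mixed), ("dis", p_dis)]:
    print(name, is_weakly_assortative(p), is_strongly_assortative(p))
```

A classic SBM fit places no such restriction on p, which is why it can return the disassortative p_dis when the signal is weak.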
Spectral Clustering of Graphs with the Bethe Hessian
Spectral clustering is a standard approach to label nodes on a graph by
studying the eigenvectors associated with the largest or smallest eigenvalues of a
symmetric real matrix, such as the adjacency matrix or the Laplacian. Recently, it has been argued that using
instead a more complicated, non-symmetric, higher-dimensional operator,
related to the non-backtracking walk on the graph, leads to improved
performance in detecting clusters, and even to optimal performance for the
stochastic block model. Here, we propose to use instead a simpler object, a
symmetric real matrix known as the Bethe Hessian operator, or deformed
Laplacian. We show that this approach combines the performance of the
non-backtracking operator, thus detecting clusters all the way down to the
theoretical limit in the stochastic block model, with the computational,
theoretical and memory advantages of real symmetric matrices. Comment: 8 pages, 2 figures
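A minimal sketch of the idea, using the Bethe Hessian H(r) = (r^2 - 1) I - r A + D, where A is the adjacency matrix and D the diagonal degree matrix. It assumes r equals the square root of the average degree, which is one common choice. It also assumes a two-group SBM, so the eigenvector of the second-smallest eigenvalue is split by sign. The helper names and parameters (n = 400, c_in = 24, c_out = 4) are hypothetical. Counting negative eigenvalues is only a rough estimate of the number of groups.

```python
import numpy as np


def sample_two_block_sbm(n, c_in, c_out, rng):
    """Hypothetical helper: two equal groups, edge probability c_in/n inside, c_out/n across."""
    labels = np.repeat([0, 1], n // 2)
    same_group = labels[:, None] == labels[None, :]
    probs = np.where(same_group, c_in / n, c_out / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)   # draw each node pair once
    adj = (upper | upper.T).astype(float)               # symmetric, no self-loops
    return adj, labels


def bethe_hessian(adj, r):
    """Bethe Hessian (deformed Laplacian) H(r) = (r^2 - 1) I - r A + D."""
    n = adj.shape[0]
    return (r ** 2 - 1) * np.eye(n) - r * adj + np.diag(adj.sum(axis=1))


def bethe_hessian_bisection(adj):
    """Split nodes in two by the sign of the second eigenvector of H(r)."""
    r = np.sqrt(adj.sum(axis=1).mean())                # assumed choice: sqrt(mean degree)
    eigvals, eigvecs = np.linalg.eigh(bethe_hessian(adj, r))  # ascending eigenvalues
    n_negative = int((eigvals < 0).sum())              # rough estimate of the group count
    # eigvecs[:, 0] mirrors the leading eigenvector of A (roughly one sign everywhere);
    # eigvecs[:, 1] is the one that separates the two planted groups.
    labels_hat = (eigvecs[:, 1] > 0).astype(int)
    return labels_hat, n_negative


rng = np.random.default_rng(0)
adj, labels = sample_two_block_sbm(400, c_in=24.0, c_out=4.0, rng=rng)
labels_hat, n_negative = bethe_hessian_bisection(adj)
# Group names are arbitrary, so score both ways of matching them.
accuracy = max((labels_hat == labels).mean(), (labels_hat != labels).mean())
print(n_negative, accuracy)
```

Because H(r) is real and symmetric, a dense solver such as `numpy.linalg.eigh` (or a sparse Lanczos solver for large graphs) applies directly. The non-backtracking operator, in contrast, needs a non-symmetric eigensolver on a matrix of size twice the number of edges.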
Partial recovery bounds for clustering with the relaxed K-means
We investigate the clustering performance of the relaxed K-means in the
setting of the sub-Gaussian Mixture Model (sGMM) and the Stochastic Block Model (SBM).
After identifying the appropriate signal-to-noise ratio (SNR), we prove that
the misclassification error decays exponentially fast with respect to this SNR.
These partial recovery bounds for the relaxed K-means improve upon results
currently known in the sGMM setting. In the SBM setting, the relaxed
K-means SDP can handle general connection probabilities, whereas other
SDPs investigated in the literature are restricted to the assortative case
(where within-group probabilities are larger than between-group probabilities).
Again, this partial recovery bound complements the state-of-the-art results.
Altogether, these results highlight the versatility of the relaxed
K-means. Comment: 39 pages
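To see why an SDP appears, consider the usual formulation of the relaxed K-means (assumed here, not quoted from the paper). It maximizes <XX^T, B> over matrices B that are positive semidefinite, entrywise nonnegative, have rows summing to 1 and have trace K. The sketch checks two facts on hypothetical toy data. First, the partition matrix B* of any clustering satisfies these constraints. Second, the K-means cost equals trace(XX^T) - <XX^T, B*>. Minimizing K-means is therefore maximizing <XX^T, B> over partition matrices, and the relaxation drops only the requirement that B be a partition matrix. Solving the SDP itself needs a solver and is not shown.

```python
import numpy as np


def partition_matrix(labels, k):
    """B*[a, b] = 1 / |G_j| if points a and b both lie in group G_j, else 0."""
    b = np.zeros((len(labels), len(labels)))
    for j in range(k):
        members = np.flatnonzero(labels == j)
        b[np.ix_(members, members)] = 1.0 / len(members)
    return b


def kmeans_cost(x, labels, k):
    """Sum of squared distances from each point to the mean of its group."""
    return sum(((x[labels == j] - x[labels == j].mean(axis=0)) ** 2).sum() for j in range(k))


rng = np.random.default_rng(1)
k = 3
labels = rng.permutation(np.repeat(np.arange(k), 20))   # 60 points, 3 groups of 20
x = rng.normal(size=(60, 2)) + 4.0 * labels[:, None]    # hypothetical toy mixture
gram = x @ x.T
b_star = partition_matrix(labels, k)

# Constraints kept by the relaxation: B PSD, B >= 0 entrywise, B 1 = 1, trace(B) = K.
feasible = (
    np.linalg.eigvalsh(b_star).min() > -1e-9
    and (b_star >= 0).all()
    and np.allclose(b_star.sum(axis=1), 1.0)
    and np.isclose(np.trace(b_star), k)
)
# K-means cost = trace(XX^T) - <XX^T, B*>, so minimizing the cost maximizes <XX^T, B>.
identity_holds = np.isclose(kmeans_cost(x, labels, k), np.trace(gram) - (gram * b_star).sum())
print(feasible, identity_holds)
```

In the SBM setting, the Gram matrix would be replaced by a matrix built from the adjacency. Nothing in the constraints requires within-group probabilities to dominate, which is consistent with the abstract's point about general connection probabilities.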
On the relationship between Gaussian stochastic blockmodels and label propagation algorithms
The problem of community detection has received great attention in recent years.
Many methods have been proposed to discover communities in networks. In this
paper, we propose a Gaussian stochastic blockmodel that uses Gaussian
distributions to fit the edge weights in networks for non-overlapping community
detection. The maximum likelihood estimation of this model has the same
objective function as general label propagation with node preference. The node
preference of a specific vertex turns out to be a value proportional to the
intra-community eigenvector centrality (the corresponding entry in the principal
eigenvector of the adjacency matrix of the subgraph inside that vertex's
community) under maximum likelihood estimation. Additionally, the maximum
likelihood estimation of a constrained version of our model is closely related
to another extension of the label propagation algorithm, namely the label
propagation algorithm under constraint. Experiments show that the proposed
Gaussian stochastic blockmodel performs well on various benchmark networks. Comment: 22 pages, 17 figures
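The intra-community eigenvector centrality mentioned above can be computed directly from its definition. For each community, take the adjacency matrix of the induced subgraph and read off each member's entry in the principal eigenvector. In the sketch below, each community's eigenvector is scaled to unit length. That scaling is an assumption, since the abstract only states that the node preference is proportional to this quantity. The toy graph and function name are hypothetical.

```python
import numpy as np


def intra_community_eigenvector_centrality(adj, labels):
    """Each node's entry in the principal eigenvector of its own community's subgraph."""
    centrality = np.zeros(adj.shape[0])
    for community in np.unique(labels):
        members = np.flatnonzero(labels == community)
        sub = adj[np.ix_(members, members)]            # keep edges inside the community only
        _, eigvecs = np.linalg.eigh(sub)               # eigenvalues in ascending order
        centrality[members] = np.abs(eigvecs[:, -1])   # Perron vector, sign made nonnegative
    return centrality


# Toy graph: a star (hub 0 with leaves 1, 2, 3) and a triangle (4, 5, 6), bridged by 3-4.
edges = [(0, 1), (0, 2), (0, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
adj = np.zeros((7, 7))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
labels = np.array([0, 0, 0, 0, 1, 1, 1])

centrality = intra_community_eigenvector_centrality(adj, labels)
print(np.round(centrality, 3))  # hub ≈ 0.707, leaves ≈ 0.408, triangle nodes ≈ 0.577
```

The bridge edge 3-4 has no effect on the result because only within-community edges enter each subgraph. For non-negative weights, the Perron-Frobenius theorem guarantees that a connected subgraph's principal eigenvector has a single sign, so taking absolute values loses nothing.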
Model selection and hypothesis testing for large-scale network models with overlapping groups
The effort to understand network systems in increasing detail has resulted in
a diversity of methods designed to extract their large-scale structure from
data. Unfortunately, many of these methods yield diverging descriptions of the
same network, making both the comparison and understanding of their results a
difficult challenge. A possible solution to this outstanding issue is to shift
the focus away from ad hoc methods and move towards more principled approaches
based on statistical inference of generative models. As a result, we face
instead the better-defined task of selecting between competing generative
processes, which can be done under a unified probabilistic framework. Here, we
consider the comparison between a variety of generative models including
features such as degree correction, where nodes with arbitrary degrees can
belong to the same group, and community overlap, where nodes are allowed to
belong to more than one group. Because such model variants possess an
increasing number of parameters, they become prone to overfitting. In this
work, we present a method of model selection based on the minimum description
length criterion and posterior odds ratios that is capable of fully accounting
for the increased degrees of freedom of the larger models, and selects the best
one according to the statistical evidence available in the data. In applying
this method to many empirical unweighted networks from different fields, we
observe that community overlap is very often not supported by statistical
evidence and is selected as a better model only for a minority of them. On the
other hand, we find that degree correction tends to be almost universally
favored by the available data, implying that intrinsic node properties (as
opposed to group properties) are often an essential ingredient of network
formation. Comment: 20 pages, 7 figures, 1 table
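The link between description length and posterior odds follows from Bayes' rule. A minimal derivation, assuming Σ is measured in nats and θ collects all parameters of a model H (including the partition):

```latex
\Sigma_i = -\ln P(A \mid \theta_i, \mathcal{H}_i) - \ln P(\theta_i \mid \mathcal{H}_i)

\Lambda = \frac{P(\theta_1, \mathcal{H}_1 \mid A)}{P(\theta_2, \mathcal{H}_2 \mid A)}
        = \frac{P(A \mid \theta_1, \mathcal{H}_1)\, P(\theta_1 \mid \mathcal{H}_1)\, P(\mathcal{H}_1)}
               {P(A \mid \theta_2, \mathcal{H}_2)\, P(\theta_2 \mid \mathcal{H}_2)\, P(\mathcal{H}_2)}
        = e^{-(\Sigma_1 - \Sigma_2)} \quad \text{if } P(\mathcal{H}_1) = P(\mathcal{H}_2)
```

The term -ln P(θ | H) is what charges a larger model for its extra degrees of freedom. For example, if model 1 compresses the network by 10 nats more than model 2 (Σ1 - Σ2 = -10), then Λ = e^10 ≈ 2.2 × 10^4 in its favor. The equal-prior assumption is stated here for illustration.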
Active Discovery of Network Roles for Predicting the Classes of Network Nodes
Nodes in real-world networks often have class labels, or underlying
attributes, that are related to the way in which they connect to other nodes.
Sometimes this relationship is simple; for instance, nodes of the same class
may be more likely to be connected. In other cases, however, this is not true,
and the way that nodes link in a network exhibits a different, more complex
relationship to their attributes. Here, we consider networks in which we know
how the nodes are connected, but we do not know the class labels of the nodes
or how class labels relate to the network links. We wish to identify the best
subset of nodes to label in order to learn this relationship between node
attributes and network links. We can then use this discovered relationship to
accurately predict the class labels of the rest of the network nodes.
We present a model that identifies groups of nodes with similar link
patterns, which we call network roles, using a generative blockmodel. The model
then predicts labels by learning the mapping from network roles to class labels
using a maximum margin classifier. We choose a subset of nodes to label
according to an iterative margin-based active learning strategy. By integrating
the discovery of network roles with the classifier optimisation, the active
learning process can adapt the network roles to better represent the network
for node classification. We demonstrate the model by exploring a selection of
real-world networks, including a marine food web and a network of English
words. We show that, in contrast to other network classifiers, this model
achieves good classification accuracy for a range of networks with different
relationships between class labels and network links.
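One common margin-based query rule picks the unlabeled node whose two highest class scores are closest, that is, the node the current classifier is least sure about. The sketch below implements only that selection step. It assumes the classifier's per-class decision values are already available. The function name and score matrix are hypothetical, and the paper's exact strategy, which also adapts the network roles, may differ.

```python
import numpy as np


def select_node_to_label(scores, labeled):
    """Return the unlabeled node with the smallest gap between its top two class scores.

    scores:  (n_nodes, n_classes) decision values from any multi-class classifier.
    labeled: boolean mask of nodes whose class labels are already known.
    """
    top_two = np.sort(scores, axis=1)[:, -2:]       # two largest scores per node, ascending
    margins = top_two[:, 1] - top_two[:, 0]
    margins = np.where(labeled, np.inf, margins)    # never query an already-labeled node
    return int(np.argmin(margins))


scores = np.array([
    [2.0, 0.1, -1.0],   # confident: margin 1.9
    [0.5, 0.4, 0.0],    # ambiguous: margin 0.1
    [1.0, -0.5, 0.9],   # also margin 0.1, but already labeled
    [0.0, 1.5, 0.3],    # margin 1.2
])
labeled = np.array([False, False, True, False])
print(select_node_to_label(scores, labeled))  # → 1
```

In an iterative loop, the selected node's label would be revealed, the classifier retrained, and the scores recomputed before the next query.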