Estimating the number of communities in weighted networks
Community detection in weighted networks has been a popular topic in recent
years. However, while there exist several flexible methods for estimating
communities in weighted networks, these methods usually assume that the number
of communities is known. It is usually unclear how to determine the exact
number of communities one should use. Here, to estimate the number of
communities for weighted networks generated from an arbitrary distribution under
the degree-corrected distribution-free model, we propose an approach that
combines weighted modularity with spectral clustering. This approach allows a
weighted network to have negative edge weights and it also works for signed
networks. We compare the proposed method to several existing methods and show
that our method is more accurate at estimating the number of communities, both
numerically and empirically.
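The idea of combining modularity with spectral clustering can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a symmetric weight matrix with nonnegative weights, uses Newman's weighted modularity as a stand-in for the paper's weighted modularity, and clusters with a plain k-means on row-normalized eigenvectors. The names `spectral_labels`, `weighted_modularity`, and `estimate_k` are hypothetical.

```python
import numpy as np

def spectral_labels(A, k, n_iter=100):
    """Cluster nodes from the k eigenvectors of A with largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)                    # A is assumed symmetric
    top = np.argsort(-np.abs(vals))[:k]
    X = vecs[:, top]
    # Row normalization, a common way to reduce the effect of degree heterogeneity.
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    # Deterministic farthest-point initialization for k-means.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):                           # Lloyd iterations
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def weighted_modularity(A, labels):
    """Newman's modularity with edge weights (assumes nonnegative weights)."""
    d = A.sum(axis=1)
    two_m = d.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(d, d) / two_m) * same).sum() / two_m)

def estimate_k(A, k_max):
    """Return the candidate K with the highest modularity, plus all scores."""
    scores = {k: weighted_modularity(A, spectral_labels(A, k))
              for k in range(1, k_max + 1)}
    return max(scores, key=scores.get), scores
```

Calling `estimate_k(A, k_max)` on a weight matrix `A` returns the candidate number of communities whose spectral partition scores the highest modularity. Handling negative edge weights, as the abstract describes, would require a modularity defined for signed networks, which this sketch does not attempt.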
Mixed Membership Distribution-Free Model
We consider the problem of detecting latent community information in mixed
membership weighted networks, in which nodes have mixed memberships and edge
weights can be any finite real numbers. We propose a
general mixed membership distribution-free model for this problem. The model
places no distributional constraints on the adjacency matrix's elements beyond
their expected values and can be viewed as a generalization of several previous
models, including the well-known mixed membership stochastic blockmodel. In particular,
signed networks in which nodes can belong to multiple communities can be
generated from our model. We use an efficient spectral algorithm to estimate
community memberships under the model. We derive the convergence rate of the
proposed algorithm under the model using spectral analysis. We demonstrate the
advantages of the mixed membership distribution-free model and the algorithm
on small-scale simulated networks in which the adjacency
matrix's elements follow different distributions. We have also applied the
algorithm to five real-world weighted network data sets with encouraging
results.
Comment: 23 pages, 14 figures, 3 tables, comments are welcome
Consistency of spectral clustering for directed network community detection
Directed networks appear in various areas, such as biology, sociology,
physiology and computer science. In this paper, we construct a spectral
clustering method based on the singular value decomposition of the adjacency matrix
to detect communities in the directed stochastic block model (DiSBM). By considering
a sparsity parameter, under mild conditions, we show the proposed approach can
consistently recover hidden row and column communities for different scaling of
degrees. By considering the degree heterogeneity of both row and column nodes,
we further modify the proposed method and establish a theoretical framework for
the directed degree-corrected stochastic block model (DiDCSBM), and also show the
consistency of the modified method for this case. Our theoretical results under
DiSBM and DiDCSBM provide new results for some special directed networks,
such as directed networks with balanced clusters, directed networks whose nodes
have similar degrees, and the directed Erdős-Rényi graph. Furthermore,
the theoretical results under DiDCSBM are consistent with those under DiSBM.
Comment: 20 pages, comments are welcome
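A minimal sketch of SVD-based spectral clustering for a directed network follows, under stated assumptions: the left singular vectors of the adjacency matrix are clustered to obtain row (sending) communities and the right singular vectors to obtain column (receiving) communities, with optional row normalization as a simple stand-in for degree correction. The function names and the plain k-means helper are hypothetical and are not taken from the paper.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def directed_spectral_clustering(A, k, degree_corrected=False):
    """Return (row_labels, column_labels) from the top-k singular vectors of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values sorted descending
    U_k, V_k = U[:, :k], Vt[:k, :].T
    if degree_corrected:
        # Assumed variant: normalize each row to unit length before clustering.
        U_k = U_k / np.maximum(np.linalg.norm(U_k, axis=1, keepdims=True), 1e-12)
        V_k = V_k / np.maximum(np.linalg.norm(V_k, axis=1, keepdims=True), 1e-12)
    return kmeans(U_k, k), kmeans(V_k, k)
```

Because the adjacency matrix of a directed network is generally asymmetric, the row and column labels returned here can describe two different partitions of the same node set.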
Latent class analysis by regularized spectral clustering
The latent class model is a powerful tool for identifying latent classes
within populations that share common characteristics for categorical data in
social, psychological, and behavioral sciences. In this article, we propose two
new algorithms to estimate a latent class model for categorical data. Our
algorithms are developed by using a newly defined regularized Laplacian matrix
calculated from the response matrix. We provide theoretical convergence rates
of our algorithms by considering a sparsity parameter and show that our
algorithms stably yield consistent latent class analysis under mild conditions.
Additionally, we propose a metric to capture the strength of latent class
analysis and several procedures designed based on this metric to infer how many
latent classes one should use for real-world categorical data. The efficiency
and accuracy of our algorithms are verified by extensive simulated experiments,
and we further apply our algorithms to real-world categorical data with
promising results.
Comment: 22 pages, 7 figures, 2 tables
Degree-corrected distribution-free model for community detection in weighted networks
A degree-corrected distribution-free model is proposed for weighted social
networks with latent structural information. The model extends the previous
distribution-free models by considering variation in node degree to fit
real-world weighted networks, and it also extends the classical
degree-corrected stochastic block model from unweighted networks to weighted
networks. We design an algorithm based on the idea of spectral clustering to fit
the model. A theoretical framework for consistent estimation by the algorithm is
developed under the model. Theoretical results when edge weights are generated
from different distributions are analyzed. We also propose a general modularity
as an extension of Newman's modularity from unweighted networks to weighted
networks. Using experiments with simulated and real-world networks, we show that
our method significantly outperforms the uncorrected one, and the general
modularity is effective.
Comment: 21 pages, 11 figures, 5 tables, comments are welcome
Recovery Guarantees for Graph Clustering Problems
Graph clustering is a widely studied unsupervised learning problem in which the task is to group similar entities together based on observed pairwise entity interactions. This problem has applications in diverse domains such as social network analysis and computational biology. There are multiple ways to formalize a graph clustering problem. In this thesis, using tools from convex optimization, we develop algorithms for two specific graph clustering formulations: Overlapping Community Detection and Correlation Clustering. We study these formulations using the provable recovery paradigm, which requires establishing theoretical guarantees for recovery of a certain ground truth clustering as posited by a chosen generative model.
In the Overlapping Community Detection problem, we expect clusters in the input graph to potentially overlap, i.e. share some common nodes. For this problem, a "pure nodes" assumption is often made in the literature, which requires each cluster to have a node that belongs exclusively to that cluster. This assumption, however, may not be satisfied in practice. We propose a linear-programming-based algorithm to provably recover overlapping communities in weighted graphs without explicitly making the pure nodes assumption. We demonstrate the success of our algorithm on synthetic and real-world datasets. In the Correlation Clustering problem, we wish to determine non-overlapping clusters in the input graph without any prior knowledge of the number of clusters. We introduce a new graph generative model based on generating feature vectors/embeddings for the nodes in the graph, which are interpreted as latent variables in the model, and propose a tuning-parameter-free semidefinite-programming-based algorithm to recover nodes with sufficiently strong cluster membership. We make progress towards showing that the proposed algorithm is provably robust