11 research outputs found

    Estimating the number of communities in weighted networks

    Community detection in weighted networks has been a popular topic in recent years. However, while several flexible methods exist for estimating communities in weighted networks, they usually assume that the number of communities is known, and it is often unclear how to determine the number of communities one should use. Here, to estimate the number of communities for weighted networks generated from an arbitrary distribution under the degree-corrected distribution-free model, we propose an approach that combines weighted modularity with spectral clustering. This approach allows a weighted network to have negative edge weights, and it also works for signed networks. We compare the proposed method to several existing methods and show that our method is more accurate for estimating the number of communities both numerically and empirically.
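The idea of combining modularity with spectral clustering can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a symmetric matrix with non-negative weights, uses Newman's weighted modularity rather than any modularity designed for negative weights, and the names `kmeans`, `spectral_labels`, `weighted_modularity` and `estimate_num_communities` are hypothetical. For each candidate number of communities it clusters the leading eigenvectors and keeps the candidate with the highest modularity.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_labels(A, k):
    """Cluster the row-normalised k leading eigenvectors (by |eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    X = vecs[:, np.argsort(-np.abs(vals))[:k]]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return kmeans(X / np.where(norms > 0, norms, 1.0), k)

def weighted_modularity(A, labels):
    """Newman's modularity for non-negative weights (assumption of this sketch)."""
    d = A.sum(axis=1)
    two_m = d.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(d, d) / two_m) * same).sum() / two_m)

def estimate_num_communities(A, k_max):
    """Return the candidate k with the highest modularity, plus all scores."""
    scores = {k: weighted_modularity(A, spectral_labels(A, k))
              for k in range(1, k_max + 1)}
    best_k = 1
    for k in range(2, k_max + 1):
        if scores[k] > scores[best_k] + 1e-12:  # prefer the smaller k on ties
            best_k = k
    return best_k, scores

# Noise-free toy network: 3 communities of 4 nodes, heavier weights inside.
Z = np.repeat(np.eye(3), 4, axis=0)
B = np.array([[5.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 5.0]])
A = Z @ B @ Z.T
k_hat, scores = estimate_num_communities(A, k_max=5)
```

On this toy matrix the planted three-community partition is the unique modularity maximiser, so the sketch selects three communities. Real networks are noisy, and signed or negative weights would need a modularity definition that this sketch does not cover.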

    Mixed Membership Distribution-Free Model

    We consider the problem of detecting latent community information in mixed membership weighted networks, in which nodes have mixed memberships and the edge weights connecting nodes can be finite real numbers. We propose a general mixed membership distribution-free model for this problem. The model places no distributional constraints on the elements of the adjacency matrix beyond their expected values, and it can be viewed as a generalization of some previous models, including the well-known mixed membership stochastic blockmodels. In particular, signed networks in which nodes can belong to multiple communities can be generated from our model. We use an efficient spectral algorithm to estimate community memberships under the model and derive its convergence rate using spectral analysis. We demonstrate the advantages of the mixed membership distribution-free model and the algorithm with applications to small-scale simulated networks whose adjacency matrix elements follow different distributions. We have also applied the algorithm to five real-world weighted network data sets with encouraging results. Comment: 23 pages, 14 figures, 3 tables, comments are welcome.

    Consistency of spectral clustering for directed network community detection

    Directed networks appear in various areas, such as biology, sociology, physiology and computer science. In this paper, we construct a spectral clustering method based on the singular value decomposition of the adjacency matrix to detect communities in the directed stochastic block model (DiSBM). By considering a sparsity parameter, we show that, under mild conditions, the proposed approach can consistently recover hidden row and column communities for different scalings of degrees. By considering the degree heterogeneity of both row and column nodes, we further modify the proposed method, establish a theoretical framework for the directed degree-corrected stochastic block model (DiDCSBM), and show the consistency of the modified method in this case. Our theoretical results under DiSBM and DiDCSBM provide some innovations for special directed networks, such as directed networks with balanced clusters, directed networks whose nodes have similar degrees, and the directed Erdős-Rényi graph. Furthermore, the theoretical results under DiDCSBM are consistent with those under DiSBM. Comment: 20 pages, comments are welcome.
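A minimal sketch of SVD-based clustering for a directed network follows. It is an illustration under assumptions, not the paper's method: it covers only the plain (not degree-corrected) case, and the helper names `kmeans` and `directed_spectral_clustering` are hypothetical. Row communities are read from the leading left singular vectors and column communities from the leading right singular vectors.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def directed_spectral_clustering(A, k):
    """Cluster rows and columns of a (possibly asymmetric) adjacency matrix."""
    # np.linalg.svd returns singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    row_labels = kmeans(U[:, :k], k)      # sending (row) communities
    col_labels = kmeans(Vt[:k].T, k)      # receiving (column) communities
    return row_labels, col_labels

# Noise-free toy example: rows and columns follow different partitions.
Zr = np.repeat(np.eye(2), 3, axis=0)      # row communities 0,0,0,1,1,1
Zc = np.tile(np.eye(2), (3, 1))           # column communities 0,1,0,1,0,1
P = np.array([[0.9, 0.1], [0.2, 0.8]])    # block connection probabilities
A = Zr @ P @ Zc.T                         # expected adjacency matrix
row_labels, col_labels = directed_spectral_clustering(A, k=2)
```

Because the toy matrix is the expected adjacency matrix with no sampling noise, rows in the same community have identical embeddings and the two partitions are recovered exactly. With a sampled 0/1 adjacency matrix, recovery is only approximate, and handling degree heterogeneity would need an extra normalisation step not shown here.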

    Latent class analysis by regularized spectral clustering

    The latent class model is a powerful tool for identifying latent classes within populations that share common characteristics for categorical data in social, psychological, and behavioral sciences. In this article, we propose two new algorithms to estimate a latent class model for categorical data. Our algorithms are developed by using a newly defined regularized Laplacian matrix calculated from the response matrix. We provide theoretical convergence rates of our algorithms by considering a sparsity parameter and show that our algorithms stably yield consistent latent class analysis under mild conditions. Additionally, we propose a metric to capture the strength of latent class analysis and several procedures designed based on this metric to infer how many latent classes one should use for real-world categorical data. The efficiency and accuracy of our algorithms are verified by extensive simulated experiments, and we further apply our algorithms to real-world categorical data with promising results. Comment: 22 pages, 7 figures, 2 tables.

    Degree-corrected distribution-free model for community detection in weighted networks

    A degree-corrected distribution-free model is proposed for weighted social networks with latent structural information. The model extends previous distribution-free models by considering variation in node degree to fit real-world weighted networks, and it also extends the classical degree-corrected stochastic block model from unweighted networks to weighted networks. We design an algorithm based on the idea of spectral clustering to fit the model. A theoretical framework for consistent estimation by the algorithm is developed under the model, and theoretical results are analyzed for edge weights generated from different distributions. We also propose a general modularity as an extension of Newman's modularity from unweighted networks to weighted networks. Using experiments with simulated and real-world networks, we show that our method significantly outperforms the uncorrected one and that the general modularity is effective. Comment: 21 pages, 11 figures, 5 tables, comments are welcome.
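As background for the modularity mentioned above, the following sketch computes Newman's modularity with edge weights in place of 0/1 edges. It assumes an undirected network with non-negative weights and does not reproduce the general modularity proposed in the paper; the function name `weighted_modularity` is hypothetical.

```python
def weighted_modularity(A, labels):
    """Newman's modularity for an undirected network with non-negative weights.

    A is a symmetric list-of-lists weight matrix; labels[i] is the community
    of node i. Computes Q = (1/2m) * sum_ij (A_ij - d_i*d_j/2m) * [c_i == c_j].
    """
    n = len(A)
    degree = [sum(row) for row in A]   # weighted degree of each node
    two_m = sum(degree)                # total weight, counted from both ends
    if two_m == 0:
        raise ValueError("network has no edge weight")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two disconnected pairs of nodes, each pair joined by an edge of weight 2.
A = [[0, 2, 0, 0],
     [2, 0, 0, 0],
     [0, 0, 0, 2],
     [0, 0, 2, 0]]
q_good = weighted_modularity(A, [0, 0, 1, 1])   # pairs kept together
q_bad = weighted_modularity(A, [0, 1, 0, 1])    # pairs split apart
q_one = weighted_modularity(A, [0, 0, 0, 0])    # everything in one community
```

A partition that keeps heavily connected nodes together scores higher than one that splits them, which is why modularity can be used to compare candidate partitions.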

    Recovery Guarantees for Graph Clustering Problems

    Graph clustering is a widely studied unsupervised learning problem in which the task is to group similar entities together based on observed pairwise entity interactions. This problem has applications in diverse domains such as social network analysis and computational biology. There are multiple ways to formalize a graph clustering problem. In this thesis, using tools from convex optimization, we develop algorithms for two specific graph clustering formulations: Overlapping Community Detection and Correlation Clustering. We study these formulations using the provable recovery paradigm, which requires establishing theoretical guarantees for recovery of a certain ground truth clustering as posited by a chosen generative model. In the Overlapping Community Detection problem, we expect clusters in the input graph to potentially overlap, i.e. share some common nodes. For this problem, a pure nodes assumption is often made in the literature, which requires each cluster to have a node that belongs exclusively to that cluster. This assumption, however, may not be satisfied in practice. We propose a linear-programming-based algorithm to provably recover overlapping communities in weighted graphs without explicitly making the pure nodes assumption. We demonstrate the success of our algorithm on synthetic and real-world datasets. In the Correlation Clustering problem, we wish to determine non-overlapping clusters in the input graph without any prior knowledge of the number of clusters. We introduce a new graph generative model based on generating feature vectors/embeddings for the nodes in the graph, which are interpreted as latent variables in the model, and propose a tuning-parameter-free semidefinite-programming-based algorithm to recover nodes with sufficiently strong cluster membership. We make progress towards showing that the proposed algorithm is provably robust.