    Spectral Thresholds in the Bipartite Stochastic Block Model

    We consider a bipartite stochastic block model on vertex sets $V_1$ and $V_2$, with planted partitions in each, and ask at what densities efficient algorithms can recover the partition of the smaller vertex set. When $|V_2| \gg |V_1|$, multiple thresholds emerge. We first locate a sharp threshold for detection of the partition, in the sense of the results of \cite{mossel2012stochastic,mossel2013proof} and \cite{massoulie2014community} for the stochastic block model. We then show that at a higher edge density, the singular vectors of the rectangular biadjacency matrix exhibit a localization/delocalization phase transition, so that recovery from the singular vectors succeeds above this threshold and fails below it. Nevertheless, we propose a simple spectral algorithm, Diagonal Deletion SVD, which recovers the partition at a nearly optimal edge density. The bipartite stochastic block model studied here was used by \cite{feldman2014algorithm} to give a unified algorithm for recovering planted partitions and assignments in random hypergraphs and random $k$-SAT formulae, respectively. Our results give the best known bounds for the clause density at which solutions can be found efficiently in these models, and they also show a barrier to further improvement via this reduction to the bipartite block model. Comment: updated version, will appear in COLT 201
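The abstract names Diagonal Deletion SVD without spelling out its steps. The following is a minimal sketch under an assumed reading of the name: form $BB^\top$ from the $|V_1| \times |V_2|$ biadjacency matrix $B$, zero its diagonal, and label $V_1$ by the sign of the eigenvector of the second-largest eigenvalue. The published algorithm may differ in its details. The sampler is also an assumption: two balanced blocks on each side, with edge probability $\delta p$ between matching blocks and $(2-\delta)p$ otherwise. The names `sample_bipartite_sbm`, `diagonal_deletion_svd`, and `overlap`, and the parameter values, are hypothetical.

```python
import numpy as np


def sample_bipartite_sbm(n1, n2, p, delta, rng):
    """Hypothetical sampler (n1, n2 even): balanced +/-1 labels on each side;
    edge probability delta*p when labels agree and (2 - delta)*p otherwise."""
    s1 = np.repeat([1, -1], n1 // 2)
    s2 = np.repeat([1, -1], n2 // 2)
    P = p * (1 + (delta - 1) * np.outer(s1, s2))
    B = (rng.random((n1, n2)) < P).astype(float)  # n1 x n2 biadjacency matrix
    return B, s1


def diagonal_deletion_svd(B):
    """Sketch: label V1 by the sign of the second eigenvector of B B^T
    after zeroing its diagonal (an assumed reading of the method's name)."""
    M = B @ B.T                     # M[i, j] = number of common V2-neighbours
    np.fill_diagonal(M, 0.0)        # delete the diagonal (the V1 degrees)
    _, vecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    return np.where(vecs[:, -2] >= 0, 1, -1)


def overlap(pred, truth):
    """Fraction of correctly labelled vertices, up to a global sign flip."""
    agree = np.mean(pred == truth)
    return max(agree, 1 - agree)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, s1 = sample_bipartite_sbm(200, 2000, p=0.05, delta=1.6, rng=rng)
    print(overlap(diagonal_deletion_svd(B), s1))
```

One way to read the name: the diagonal of $BB^\top$ holds the degrees of $V_1$. These degrees carry no partition information, and when $|V_2| \gg |V_1|$ they are of order $|V_2|p$, while the off-diagonal entries are only of order $|V_2|p^2$.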

    Subsampled Power Iteration: a Unified Algorithm for Block Models and Planted CSP's

    We present an algorithm for recovering planted solutions in two well-known models, the stochastic block model and planted constraint satisfaction problems, via a common generalization in terms of random bipartite graphs. Our algorithm matches, up to a constant factor, the best-known bounds for the number of edges (or constraints) needed for perfect recovery, and its running time is linear in the number of edges used. This time complexity is significantly better than that of both spectral and SDP-based approaches. The main contribution of the algorithm is in the case of unequal sizes in the bipartition (corresponding to odd uniformity in the CSP). Here our algorithm succeeds at a significantly lower density than the spectral approaches, surpassing a barrier based on the spectral norm of a random matrix. Other significant features of the algorithm and analysis include (i) the critical use of power iteration with subsampling, which might be of independent interest and whose analysis requires keeping track of multiple norms of an evolving solution; (ii) it can be implemented statistically, i.e., with very limited access to the input distribution; and (iii) the algorithm is extremely simple to implement and runs in linear time, and thus is practical even for very large instances.
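The abstract's key ingredient is power iteration with subsampling. The sketch below illustrates that idea generically. It is not the paper's algorithm, and it rests on an assumption: the observed edges are split at random into `n_batches` disjoint subsamples, and each multiplication by $BB^\top$ (diagonal deleted, all-ones direction projected out) uses a different subsample. The sketch reuses the same hypothetical balanced two-block sampler, with edge probabilities $\delta p$ and $(2-\delta)p$. All names and parameters are illustrative. Dense NumPy arrays are used for readability, so unlike the algorithm described in the abstract, this sketch does not run in time linear in the number of edges.

```python
import numpy as np


def sample_bipartite_sbm(n1, n2, p, delta, rng):
    """Hypothetical sampler (n1, n2 even): edge probability delta*p when the
    +/-1 labels agree and (2 - delta)*p otherwise."""
    s1 = np.repeat([1, -1], n1 // 2)
    s2 = np.repeat([1, -1], n2 // 2)
    P = p * (1 + (delta - 1) * np.outer(s1, s2))
    return (rng.random((n1, n2)) < P).astype(float), s1


def subsampled_power_iteration(B, n_batches, rng):
    """Generic sketch: one power-iteration step per disjoint edge subsample."""
    batch = rng.integers(0, n_batches, size=B.shape)  # random batch per entry
    x = rng.standard_normal(B.shape[0])               # random start on V1
    for t in range(n_batches):
        Bt = B * (batch == t)            # keep only the edges in subsample t
        deg = Bt.sum(axis=1)             # diagonal of Bt Bt^T for a 0/1 matrix
        x = Bt @ (Bt.T @ x) - deg * x    # (Bt Bt^T - diag) x, never forming Bt Bt^T
        x -= x.mean()                    # project out the all-ones direction
        x /= np.linalg.norm(x)
    return np.where(x >= 0, 1, -1)


def overlap(pred, truth):
    """Fraction of correctly labelled vertices, up to a global sign flip."""
    agree = np.mean(pred == truth)
    return max(agree, 1 - agree)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Each of the 8 subsamples has edge probabilities 0.08 and 0.02.
    B, s1 = sample_bipartite_sbm(200, 2000, p=0.4, delta=1.6, rng=rng)
    print(overlap(subsampled_power_iteration(B, n_batches=8, rng=rng), s1))
```

Projecting out the all-ones vector stands in for taking a second eigenvector. With balanced blocks, the leading eigenvector of the expected matrix is the all-ones direction.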

    Clustering Partially Observed Graphs via Convex Optimization

    This paper considers the problem of clustering a partially observed unweighted graph---i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"---i.e., the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors. Comment: This is the final version published in Journal of Machine Learning Research (JMLR). Partial results appeared in International Conference on Machine Learning (ICML) 201
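The reduction in this abstract rests on a simple fact. Let $K$ be the cluster matrix, with $K_{ij} = 1$ when nodes $i$ and $j$ share a cluster and $0$ otherwise. Then $K$ is block diagonal with all-ones blocks, so its rank equals the number of clusters. On the observed pairs, the adjacency matrix $A$ equals $K + S$, where $S = A - K$ and the nonzero entries of $S$ are exactly the disagreements. The sketch below evaluates the disagreement objective for a given clustering. It assumes hypothetical encodings: $A$ is a 0/1 NumPy array, `observed` is a boolean mask, and clusters are given as integer labels. It does not implement the paper's convex program.

```python
import numpy as np


def cluster_matrix(labels):
    """K[i, j] = 1 if nodes i and j share a cluster, else 0; rank(K) = #clusters."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def disagreements(A, observed, labels):
    """Observed missing edges within clusters plus observed edges across clusters.

    A: symmetric 0/1 adjacency matrix (entries outside `observed` are ignored).
    observed: symmetric boolean mask of the node pairs whose status is known.
    """
    S = np.where(observed, A - cluster_matrix(labels), 0.0)  # sparse part on observed pairs
    iu = np.triu_indices(len(labels), k=1)                   # count each pair i < j once
    return int(np.count_nonzero(S[iu]))


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    A = np.zeros((4, 4))
    for i, j in [(0, 1), (0, 2), (1, 3)]:
        A[i, j] = A[j, i] = 1.0
    observed = np.ones((4, 4), dtype=bool)
    observed[1, 3] = observed[3, 1] = False               # status of pair (1, 3) unknown
    print(disagreements(A, observed, labels))             # prints 2
    print(np.linalg.matrix_rank(cluster_matrix(labels)))  # prints 2
```

In the usage example, the pair (1, 3) is unobserved and is therefore ignored. The two counted disagreements are the observed edge (0, 2) across clusters and the observed missing edge (2, 3) within a cluster.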

    Spectral partitioning of time-varying networks with unobserved edges

    We discuss a variant of 'blind' community detection, in which we aim to partition an unobserved network from the observation of a (dynamical) graph signal defined on the network. We consider a scenario where our observed graph signals are obtained by filtering white noise input, and the underlying network is different for every observation. In this fashion, the filtered graph signals can be interpreted as defined on a time-varying network. We model each of the underlying network realizations as generated by an independent draw from a latent stochastic blockmodel (SBM). To infer the partition of the latent SBM, we propose a simple spectral algorithm for which we provide a theoretical analysis and establish consistency guarantees for the recovery. We illustrate our results using numerical experiments on synthetic and real data, highlighting the efficacy of our approach. Comment: 5 pages, 2 figures
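The abstract does not list the algorithm's steps. The following is one plausible reading, offered only as an assumption-laden sketch. Every observation passes white noise through a filter built on an independently drawn SBM graph. For the filter $H(A) = I + A/\bar d$, with $\bar d$ the expected mean degree, the signal covariance is $I + 2\,\mathbb{E}[A]/\bar d + \mathbb{E}[A^2]/\bar d^2$. Its leading eigenvectors therefore approximately follow those of $\mathbb{E}[A]$, and the sample covariance can reveal the blocks. The filter, the two balanced blocks, the sign-of-second-eigenvector rule, and the names `sample_sbm`, `observe_signals`, and `spectral_partition` are all hypothetical choices, not taken from the paper.

```python
import numpy as np


def sample_sbm(labels, p_in, p_out, rng):
    """Symmetric 0/1 adjacency matrix of one SBM draw, no self-loops."""
    same = labels[:, None] == labels[None, :]
    P = np.where(same, p_in, p_out)
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper | upper.T).astype(float)


def observe_signals(labels, p_in, p_out, m, rng):
    """m graph signals, each white noise filtered on a fresh SBM draw."""
    n = len(labels)
    d_bar = (p_in + p_out) / 2 * n               # approximate mean degree (balanced blocks)
    Y = np.empty((m, n))
    for t in range(m):
        A = sample_sbm(labels, p_in, p_out, rng)  # new network for every observation
        w = rng.standard_normal(n)                # white-noise input
        Y[t] = w + A @ w / d_bar                  # hypothetical filter H(A) = I + A / d_bar
    return Y


def spectral_partition(Y):
    """Sign of the second eigenvector of the sample covariance (two balanced blocks)."""
    C = Y.T @ Y / Y.shape[0]                      # signals have zero mean
    _, vecs = np.linalg.eigh(C)                   # eigenvalues in ascending order
    return np.where(vecs[:, -2] >= 0, 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([1, -1], 50)
    Y = observe_signals(labels, p_in=0.5, p_out=0.1, m=2000, rng=rng)
    agree = np.mean(spectral_partition(Y) == labels)
    print(max(agree, 1 - agree))
```

The top eigenvector of the covariance here tracks the all-ones direction, so the second one is used. With unbalanced or more than two blocks, several eigenvectors and a clustering step such as k-means would be needed instead.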