Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels
Variational methods for parameter estimation are an active research area,
potentially offering computationally tractable heuristics with theoretical
performance bounds. We build on recent work that applies such methods to
network data, and establish asymptotic normality rates for parameter estimates
of stochastic blockmodel data, by either maximum likelihood or variational
estimation. The result also applies to various sub-models of the stochastic
blockmodel found in the literature.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1124 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
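As a minimal illustration of the likelihood side of the abstract above (not the paper's variational estimator), the sketch below samples a two-block stochastic blockmodel and evaluates the Bernoulli log-likelihood of the adjacency matrix for a given label assignment. All function names, block probabilities, and sizes are illustrative choices, not taken from the paper.

```python
import math
import random

def sample_sbm(labels, P, rng):
    """Sample an undirected, loop-free SBM adjacency matrix.

    labels[i] is the block of node i; P[a][b] is the edge probability
    between blocks a and b. Illustrative helper, not the paper's code.
    """
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[labels[i]][labels[j]]:
                A[i][j] = A[j][i] = 1
    return A

def sbm_log_likelihood(A, labels, P):
    """Bernoulli log-likelihood of A under block edge probabilities P."""
    ll = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            p = P[labels[i]][labels[j]]
            ll += math.log(p) if A[i][j] else math.log(1.0 - p)
    return ll

rng = random.Random(0)
labels = [0] * 10 + [1] * 10          # two planted blocks
P = [[0.8, 0.1], [0.1, 0.8]]          # assortative edge probabilities
A = sample_sbm(labels, P, rng)
print(sbm_log_likelihood(A, labels, P))
```

Maximum likelihood estimation would maximize this quantity jointly over labels and P; the variational approach studied in the paper replaces that intractable maximization with a tractable surrogate while, per the abstract, retaining asymptotic normality of the parameter estimates.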
Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization
We study the problem of detecting a structured, low-rank signal matrix
corrupted with additive Gaussian noise. This includes clustering in a Gaussian
mixture model, sparse PCA, and submatrix localization. Each of these problems
is conjectured to exhibit a sharp information-theoretic threshold, below which
the signal is too weak for any algorithm to detect. We derive upper and lower
bounds on these thresholds by applying the first and second moment methods to
the likelihood ratio between these "planted models" and null models where the
signal matrix is zero. Our bounds differ by at most a factor of root two when
the rank is large (in the clustering and submatrix localization problems, when
the number of clusters or blocks is large) or the signal matrix is very sparse.
Moreover, our upper bounds show that for each of these problems there is a
significant regime where reliable detection is information-theoretically
possible but where known algorithms such as PCA fail completely, since the
spectrum of the observed matrix is uninformative. This regime is analogous to
the conjectured 'hard but detectable' regime for community detection in sparse
graphs.
Comment: For sparse PCA and submatrix localization, we determine the
information-theoretic threshold exactly in the limit where the number of
blocks is large or the signal matrix is very sparse, based on a conditional
second moment method, closing the factor-of-root-two gap in the first version.
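The first and second moment methods mentioned in the abstract above can be illustrated on a hypothetical one-dimensional stand-in for the planted model: observe y = λθ + z with θ uniform on {−1, +1} and z standard Gaussian. The likelihood ratio between planted and null is L(y) = E_θ exp(λθy − λ²/2); under the null, E[L] = 1 (first moment) and E[L²] = cosh(λ²) (second moment), and a bounded second moment implies the two models cannot be reliably distinguished. The Monte Carlo check below verifies these identities numerically; this toy is not the paper's matrix model.

```python
import math
import random

def likelihood_ratio(y, lam):
    """L(y) = E_theta exp(lam*theta*y - lam^2/2), theta uniform on {-1,+1}."""
    return 0.5 * (math.exp(lam * y - lam**2 / 2)
                  + math.exp(-lam * y - lam**2 / 2))

def moments_under_null(lam, n_samples=200_000, seed=0):
    """Monte Carlo estimates of E[L] and E[L^2] when y ~ N(0,1) (null model)."""
    rng = random.Random(seed)
    m1 = m2 = 0.0
    for _ in range(n_samples):
        y = rng.gauss(0.0, 1.0)
        L = likelihood_ratio(y, lam)
        m1 += L
        m2 += L * L
    return m1 / n_samples, m2 / n_samples

# E[L] = 1 always; E[L^2] = cosh(lam^2) stays near 1 for small lam,
# which is the second-moment obstruction to detection.
for lam in (0.3, 0.7):
    m1, m2 = moments_under_null(lam)
    print(lam, round(m1, 3), round(m2, 3), round(math.cosh(lam**2), 3))
```

In the paper, the same two moments are computed for the likelihood ratio of the full matrix models, which is what yields the upper and lower bounds on the detection thresholds.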
Optimization via Low-rank Approximation for Community Detection in Networks
Community detection is one of the fundamental problems of network analysis,
for which a number of methods have been proposed. Most model-based or
criteria-based methods have to solve an optimization problem over a discrete
set of labels to find communities, which is computationally infeasible. Some
fast spectral algorithms have been proposed for specific methods or models, but
only on a case-by-case basis. Here we propose a general approach for maximizing
a function of a network adjacency matrix over discrete labels by projecting the
set of labels onto a subspace approximating the leading eigenvectors of the
expected adjacency matrix. This projection onto a low-dimensional space makes
the feasible set of labels much smaller and the optimization problem much
easier. We prove a general result about this method and show how to apply it to
several previously proposed community detection criteria, establishing its
consistency for label estimation in each case and demonstrating the fundamental
connection between spectral properties of the network and various model-based
approaches to community detection. Simulations and applications to real-world
data are included to demonstrate our method performs well for multiple problems
over a wide range of parameters.Comment: 45 pages, 7 figures; added discussions about computational complexity
and extension to more than two communitie
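The low-rank projection idea in the abstract above can be caricatured in a few lines: with two communities, the sign pattern of the leading eigenvector of the mean-centered adjacency matrix already recovers the blocks in an easy regime. The sketch below uses plain power iteration and illustrative parameters (p_in = 0.9, p_out = 0.05); it is a toy stand-in, not the paper's general method.

```python
import random

def sample_two_block(n_per, p_in, p_out, seed=0):
    """Adjacency of a 2-block SBM; the first n_per nodes form block 0."""
    rng = random.Random(seed)
    n = 2 * n_per
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same = (i < n_per) == (j < n_per)
            if rng.random() < (p_in if same else p_out):
                A[i][j] = A[j][i] = 1.0
    return A

def spectral_labels(A, iters=200):
    """Sign of the top eigenvector of the mean-centered adjacency,
    computed by power iteration (illustrative, not the paper's algorithm)."""
    n = len(A)
    mean = sum(sum(row) for row in A) / (n * n)
    B = [[A[i][j] - mean for j in range(n)] for i in range(n)]
    v = [1.0 if i % 2 else -1.0 for i in range(n)]  # fixed start vector
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    return [1 if x > 0 else 0 for x in v]

labels = spectral_labels(sample_two_block(15, 0.9, 0.05))
print(labels)
```

The paper's contribution is the general version of this step: projecting the discrete label set onto a low-dimensional subspace spanned by (approximate) leading eigenvectors, then optimizing the chosen community detection criterion over that much smaller feasible set.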