2,506 research outputs found
A Tensor Approach to Learning Mixed Membership Community Models
Community detection is the task of detecting hidden communities from observed
interactions. Guaranteed community detection has so far been mostly limited to
models with non-overlapping communities such as the stochastic block model. In
this paper, we remove this restriction, and provide guaranteed community
detection for a family of probabilistic network models with overlapping
communities, termed as the mixed membership Dirichlet model, first introduced
by Airoldi et al. This model allows for nodes to have fractional memberships in
multiple communities and assumes that the community memberships are drawn from
a Dirichlet distribution. Moreover, it contains the stochastic block model as a
special case. We propose a unified approach to learning these models via a
tensor spectral decomposition method. Our estimator is based on low-order
moment tensor of the observed network, consisting of 3-star counts. Our
learning method is fast and is based on simple linear algebraic operations,
e.g. singular value decomposition and tensor power iterations. We provide
guaranteed recovery of community memberships and model parameters and present a
careful finite sample analysis of our learning method. As an important special
case, our results match the best known scaling requirements for the
(homogeneous) stochastic block model
Online Tensor Methods for Learning Latent Variable Models
We introduce an online tensor decomposition based approach for two latent
variable modeling problems namely, (1) community detection, in which we learn
the latent communities that the social actors in social networks belong to, and
(2) topic modeling, in which we infer hidden topics of text articles. We
consider decomposition of moment tensors using stochastic gradient descent. We
conduct optimization of multilinear operations in SGD and avoid directly
forming the tensors, to save computational and storage costs. We present
optimized algorithm in two platforms. Our GPU-based implementation exploits the
parallelism of SIMD architectures to allow for maximum speed-up by a careful
optimization of storage and data transfer, whereas our CPU-based implementation
uses efficient sparse matrix computations and is suitable for large sparse
datasets. For the community detection problem, we demonstrate accuracy and
computational efficiency on Facebook, Yelp and DBLP datasets, and for the topic
modeling problem, we also demonstrate good performance on the New York Times
dataset. We compare our results to the state-of-the-art algorithms such as the
variational method, and report a gain of accuracy and a gain of several orders
of magnitude in the execution time.Comment: JMLR 201
Consistent Estimation of Mixed Memberships with Successive Projections
This paper considers the parameter estimation problem in Mixed Membership
Stochastic Block Model (MMSB), which is a quite general instance of random
graph model allowing for overlapping community structure. We present the new
algorithm successive projection overlapping clustering (SPOC) which combines
the ideas of spectral clustering and geometric approach for separable
non-negative matrix factorization. The proposed algorithm is provably
consistent under MMSB with general conditions on the parameters of the model.
SPOC is also shown to perform well experimentally in comparison to other
algorithms
Consistency of adjacency spectral embedding for the mixed membership stochastic blockmodel
The mixed membership stochastic blockmodel is a statistical model for a
graph, which extends the stochastic blockmodel by allowing every node to
randomly choose a different community each time a decision of whether to form
an edge is made. Whereas spectral analysis for the stochastic blockmodel is
increasingly well established, theory for the mixed membership case is
considerably less developed. Here we show that adjacency spectral embedding
into , followed by fitting the minimum volume enclosing convex
-polytope to the principal components, leads to a consistent estimate
of a -community mixed membership stochastic blockmodel. The key is to
identify a direct correspondence between the mixed membership stochastic
blockmodel and the random dot product graph, which greatly facilitates
theoretical analysis. Specifically, a norm and central
limit theorem for the random dot product graph are exploited to respectively
show consistency and partially correct the bias of the procedure.Comment: 12 pages, 6 figure
A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks
This paper presents a novel spectral algorithm with additive clustering
designed to identify overlapping communities in networks. The algorithm is
based on geometric properties of the spectrum of the expected adjacency matrix
in a random graph model that we call stochastic blockmodel with overlap (SBMO).
An adaptive version of the algorithm, that does not require the knowledge of
the number of hidden communities, is proved to be consistent under the SBMO
when the degrees in the graph are (slightly more than) logarithmic. The
algorithm is shown to perform well on simulated data and on real-world graphs
with known overlapping communities.Comment: Journal of Theoretical Computer Science (TCS), Elsevier, A Para\^itr
Learning Mixed Membership Community Models in Social Tagging Networks through Tensor Methods
Community detection in graphs has been extensively studied both in theory and in applications. However, detecting communities in hypergraphs is more challenging. In this paper, we propose a tensor decomposition approach for guaranteed learning of communities in a special class of hypergraphs modeling social tagging systems or folksonomies. A folksonomy is a tripartite 3-uniform hypergraph consisting of (user, tag, resource) hyperedges. We posit a probabilistic mixed membership community model, and prove that the tensor method consistently learns the communities under efficient sample complexity and separation requirements
- …