Spectral clustering and the high-dimensional stochastic blockmodel
Networks or graphs can easily represent a diverse set of data sources that
are characterized by interacting units or actors. Social networks, representing
people who communicate with each other, are one example. Communities or
clusters of highly connected actors form an essential feature in the structure
of several empirical networks. Spectral clustering is a popular and
computationally feasible method to discover these communities. The stochastic
blockmodel [Social Networks 5 (1983) 109--137] is a social network model with
well-defined communities; each node is a member of one community. For a network
generated from the stochastic blockmodel, we bound the number of nodes
"misclustered" by spectral clustering. The asymptotic results in this paper are
the first clustering results that allow the number of clusters in the model to
grow with the number of nodes, hence the name high-dimensional. In order to
study spectral clustering under the stochastic blockmodel, we first show that
under the more general latent space model, the eigenvectors of the normalized
graph Laplacian asymptotically converge to the eigenvectors of a "population"
normalized graph Laplacian. Aside from the implication for spectral clustering,
this provides insight into a graph visualization technique. Our method of
studying the eigenvectors of random matrices is original.
Comment: Published at http://dx.doi.org/10.1214/11-AOS887 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
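As a rough illustration of the setting in the abstract above, the following sketch samples a two-block stochastic blockmodel and splits the nodes by the sign of the second eigenvector of D^{-1/2} A D^{-1/2}. It is a minimal sketch under stated assumptions, not the paper's procedure: the block sizes, the edge probabilities, the restriction to two blocks, the sign split, and the helper names (`sample_sbm`, `spectral_bipartition`, `misclustered`) are all illustrative choices. Power iteration is used only to keep the example free of external libraries.

```python
import math
import random


def sample_sbm(sizes, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic blockmodel."""
    labels = [b for b, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj, labels


def spectral_bipartition(adj, rng, iters=300):
    """Split nodes by the sign of the second eigenvector of M = D^{-1/2} A D^{-1/2}."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # M has eigenvalue 1 with eigenvector proportional to sqrt(degree).
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Deflate: remove the component along the known leading eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        # Apply the shifted matrix (M + I) / 2, whose eigenvalues lie in [0, 1].
        mv = [
            inv_sqrt[i] * sum(adj[i][j] * inv_sqrt[j] * v[j] for j in range(n))
            for i in range(n)
        ]
        v = [0.5 * (m + a) for m, a in zip(mv, v)]
        scale = math.sqrt(sum(a * a for a in v))
        v = [a / scale for a in v]
    return [1 if a > 0 else 0 for a in v]


def misclustered(pred, truth):
    """Count label disagreements, minimised over the two possible label swaps."""
    wrong = sum(p != t for p, t in zip(pred, truth))
    return min(wrong, len(truth) - wrong)


rng = random.Random(0)
adj, truth = sample_sbm([30, 30], p_in=0.6, p_out=0.05, rng=rng)
pred = spectral_bipartition(adj, rng)
errors = misclustered(pred, truth)
print("misclustered nodes:", errors)
```

With more than two blocks, a common approach is to run k-means on the rows of the leading k eigenvectors rather than a sign split.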
Covariate-assisted spectral clustering
Biological and social systems consist of myriad interacting units. The
interactions can be represented in the form of a graph or network. Measurements
of these graphs can reveal the underlying structure of these interactions,
which provides insight into the systems that generated the graphs. Moreover, in
applications such as connectomics, social networks, and genomics, graph data
are accompanied by contextualizing measures on each node. We utilize these node
covariates to help uncover latent communities in a graph, using a modification
of spectral clustering. Statistical guarantees are provided under a joint
mixture model that we call the node-contextualized stochastic blockmodel,
including a bound on the mis-clustering rate. The bound is used to derive
conditions for achieving perfect clustering. For most simulated cases,
covariate-assisted spectral clustering yields results superior to regularized
spectral clustering without node covariates and to an adaptation of canonical
correlation analysis. We apply our clustering method to large brain graphs
derived from diffusion MRI data, using the node locations or neurological
region membership as covariates. In both cases, covariate-assisted spectral
clustering yields clusters that are easier to interpret neurologically.
Comment: 28 pages, 4 figures, includes substantial changes to theoretical
result
Hearing the clusters in a graph: A distributed algorithm
We propose a novel distributed algorithm to cluster graphs. The algorithm
recovers the solution obtained from spectral clustering without the need for
expensive eigenvalue/vector computations. We prove that, by propagating waves
through the graph, a local fast Fourier transform yields the local component of
every eigenvector of the Laplacian matrix, thus providing clustering
information. For large graphs, the proposed algorithm is orders of magnitude
faster than random-walk-based approaches. We prove the equivalence of the
proposed algorithm to spectral clustering and derive convergence rates. We
demonstrate the benefit of using this decentralized clustering algorithm for
community detection in social graphs, accelerating distributed estimation in
sensor networks and efficient computation of distributed multi-agent search
strategies
Robustness for Spectral Clustering of General Graphs under Local Differential Privacy
Spectral clustering is a widely used algorithm to find clusters in networks.
Several researchers have studied the stability of spectral clustering under
local differential privacy with the additional assumption that the underlying
networks are generated from the stochastic block model (SBM). However, we argue
that this assumption is too restrictive since social networks do not originate
from the SBM. Thus, we delve into an analysis for general graphs in this work. Our
primary focus is the edge flipping method -- a common technique for protecting
local differential privacy. On the positive side, our findings suggest that even
when the edges of an n-vertex graph satisfying some reasonable
well-clustering assumptions are flipped with a probability of ,
the clustering outcomes are largely consistent. Empirical tests further
corroborate these theoretical findings. Conversely, although clustering
outcomes have been stable for dense and well-clustered graphs produced from the
SBM, we show that in general, spectral clustering may yield highly erratic
results on certain dense and well-clustered graphs when the flipping
probability is . This indicates that the best privacy budget
obtainable for general graphs is
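The edge flipping method mentioned in this abstract can be sketched as follows. This is a minimal sketch under assumptions: each unordered node pair is flipped independently with probability p, and the mapping p = 1 / (1 + e^epsilon) is the standard randomized-response relation, which is assumed here rather than taken from the abstract. The function names are hypothetical. In a truly local protocol each user would perturb their own adjacency list, so the symmetric flip below is a simplification.

```python
import math
import random


def flip_probability(epsilon):
    """Randomized-response flip probability for privacy budget epsilon (assumed relation)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def flip_edges(adj, p, rng):
    """Return a copy of a symmetric 0/1 adjacency matrix with each pair flipped w.p. p."""
    n = len(adj)
    noisy = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                # Flip presence/absence of the edge {i, j}, keeping symmetry.
                noisy[i][j] = noisy[j][i] = 1 - adj[i][j]
    return noisy


rng = random.Random(1)
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
noisy = flip_edges(triangle, flip_probability(1.0), rng)
print(noisy)
```

A larger epsilon gives a smaller flip probability, so the perturbed graph stays closer to the original at the cost of weaker privacy.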
Common adversaries form alliances: modelling complex networks via anti-transitivity
Anti-transitivity captures the notion that enemies of enemies are friends,
and arises naturally in the study of adversaries in social networks and in the
study of conflicting nation states or organizations. We present a simplified,
evolutionary model for anti-transitivity influencing link formation in complex
networks, and analyze the model's network dynamics. The Iterated Local
Anti-Transitivity (or ILAT) model creates anti-clone nodes in each time-step,
and joins anti-clones to the parent node's non-neighbor set. The graphs
generated by ILAT exhibit familiar properties of complex networks such as
densification, short distances (bounded by absolute constants), and bad
spectral expansion. We determine the cop and domination number for graphs
generated by ILAT, and finish with an analysis of their clustering
coefficients. We interpret these results within the context of real-world
complex networks and present open problems
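One time-step of the ILAT process described above might look like the sketch below. The abstract only says that each node gets an anti-clone joined to the parent's non-neighbor set, so several details here are assumptions: the parent itself is excluded from its own non-neighbor set, anti-clones are not joined to one another, nodes are labelled 0..n-1, and the anti-clone of x is labelled n + x. The function name `ilat_step` is hypothetical.

```python
def ilat_step(adj):
    """One ILAT time-step on a graph stored as {node: set_of_neighbors}.

    Assumes nodes are labelled 0..n-1; the anti-clone of x gets label n + x.
    """
    n = len(adj)
    new_adj = {x: set(nbrs) for x, nbrs in adj.items()}
    for x in range(n):
        clone = n + x
        # Non-neighbors of x among the existing nodes (x itself excluded: an assumption).
        non_nbrs = {y for y in range(n) if y != x and y not in adj[x]}
        new_adj[clone] = set(non_nbrs)
        for y in non_nbrs:
            new_adj[y].add(clone)
    return new_adj


# Path 0 - 1 - 2: the end nodes are non-adjacent, the middle node has no non-neighbors.
path = {0: {1}, 1: {0, 2}, 2: {1}}
g1 = ilat_step(path)
print(sorted((x, sorted(nbrs)) for x, nbrs in g1.items()))
```

Under these assumptions the node count doubles at every step, and each anti-clone's degree equals the number of non-neighbors its parent had before the step.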
Defining and Evaluating Network Communities based on Ground-truth
Nodes in real-world networks organize into densely linked communities where
edges appear with high concentration among the members of the community.
Identifying such communities of nodes has proven to be a challenging task
mainly due to a plethora of definitions of a community, intractability of
algorithms, issues with evaluation and the lack of a reliable gold-standard
ground-truth.
In this paper we study a set of 230 large real-world social, collaboration
and information networks where nodes explicitly state their group memberships.
For example, in social networks nodes explicitly join various interest-based
social groups. We use such groups to define a reliable and robust notion of
ground-truth communities. We then propose a methodology which allows us to
compare and quantitatively evaluate how different structural definitions of
network communities correspond to ground-truth communities. We choose 13
commonly used structural definitions of network communities and examine their
sensitivity, robustness and performance in identifying the ground-truth. We
show that the 13 structural definitions are heavily correlated and naturally
group into four classes. We find that two of these definitions, Conductance and
Triad-participation-ratio, consistently give the best performance in
identifying ground-truth communities. We also investigate a task of detecting
communities given a single seed node. We extend the local spectral clustering
algorithm into a heuristic parameter-free community detection method that
easily scales to networks with more than a hundred million nodes. The proposed
method achieves 30% relative improvement over current local clustering methods.
Comment: Proceedings of 2012 IEEE International Conference on Data Mining
(ICDM), 201
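Conductance, one of the two structural definitions the abstract singles out, can be computed as in the sketch below. The abstract does not give a formula, so this uses a common textbook form as an assumption: the number of edges leaving the set divided by the smaller of the two volumes (sums of degrees) on either side of the cut. Papers differ on the exact normalisation, and the function name is illustrative.

```python
def conductance(edges, community):
    """Conductance of a node set in an undirected graph given as a list of (u, v) edges.

    Uses cut(S) / min(vol(S), vol(V \\ S)); other normalisations exist.
    """
    s = set(community)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in s, v in s
        if u_in != v_in:
            cut += 1  # edge crosses the community boundary
        # Each edge contributes one unit of degree to each of its endpoints.
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi = conductance(edges, {0, 1, 2})
print(phi)
```

Lower values indicate a set that is densely connected internally relative to its links to the rest of the graph.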