Capacity Releasing Diffusion for Speed and Locality
Diffusions and related random walk procedures are of central importance in
many areas of machine learning, data analysis, and applied mathematics. Because
they spread mass agnostically at each step in an iterative manner, they can
sometimes spread mass "too aggressively," thereby failing to find the "right"
clusters. We introduce a novel Capacity Releasing Diffusion (CRD) Process,
which is both faster and stays more local than the classical spectral diffusion
process. As an application, we use our CRD Process to develop an improved local
algorithm for graph clustering. Our local graph clustering method can find
local clusters in a model of clustering where one begins the CRD Process in a
cluster whose vertices are connected better internally than externally by an
$O(\log^2 n)$ factor, where $n$ is the number of nodes in the cluster. Thus,
our CRD Process is the first local graph clustering algorithm that is not
subject to the well-known quadratic Cheeger barrier. Our result requires a
certain smoothness condition, which we expect to be an artifact of our
analysis. Our empirical evaluation demonstrates improved results, in particular
for realistic social graphs where there are moderately good (but not very
good) clusters. Comment: Appeared in ICML 2017. Current version added reference and discussion
of work on generalized Cheeger's inequalities.
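For contrast, here is a minimal sketch of the classical lazy random-walk diffusion that the abstract compares against. It is not the CRD Process itself, whose capacity-releasing mechanism is not reproduced here. The graph, the seed, the step count, and the helper `lazy_walk` are hypothetical; the point is that mass leaks across the single bridge edge into the second triangle even though the seed's cluster is well separated.

```python
def lazy_walk(adj, seed, steps):
    """Lazy random-walk diffusion: at each step a node keeps half of its mass
    and splits the other half evenly among its neighbours.
    Assumes every vertex has at least one neighbour."""
    mass = {v: 0.0 for v in adj}
    mass[seed] = 1.0
    for _ in range(steps):
        new = {v: 0.0 for v in adj}
        for v, m in mass.items():
            if m == 0.0:
                continue
            new[v] += m / 2                      # lazy part stays put
            share = m / (2 * len(adj[v]))        # the rest is spread agnostically
            for u in adj[v]:
                new[u] += share
        mass = new
    return mass

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
m = lazy_walk(adj, seed=0, steps=10)
leaked = sum(m[v] for v in (3, 4, 5))   # mass that escaped the seed's triangle
```

Total mass is conserved, but nothing in the update rule stops it from flowing out of the seed's cluster; limiting that spread is the behavior CRD is designed to change.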
Higher-order Spectral Clustering for Heterogeneous Graphs
Higher-order connectivity patterns such as small induced subgraphs called
graphlets (network motifs) are vital for understanding the important components
(modules/functional units) governing the configuration and behavior of complex
networks. Existing work in higher-order clustering has focused on simple
homogeneous graphs with a single node/edge type. However, heterogeneous graphs
consisting of nodes and edges of different types are seemingly ubiquitous in
the real world. In this work, we introduce the notion of a typed-graphlet that
explicitly captures the rich (typed) connectivity patterns in heterogeneous
networks. Using typed-graphlets as a basis, we develop a general principled
framework for higher-order clustering in heterogeneous networks. The framework
provides mathematical guarantees on the optimality of the higher-order
clustering obtained. The experiments demonstrate the effectiveness of the
framework quantitatively for three important applications: (i)
clustering, (ii) link prediction, and (iii) graph compression. In particular,
the approach achieves a mean improvement of 43x over all methods and graphs for
clustering, while achieving an 18.7% and a 20.8% improvement for link prediction
and graph compression, respectively.
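The typed-graphlet idea can be illustrated, under simplifying assumptions, by weighting each edge with the number of typed triangles it participates in; the resulting weighted graph could then be handed to an ordinary spectral clustering routine. This is a hypothetical sketch restricted to 3-node graphlets (triangles), not the paper's general framework; `typed_triangle_weights`, the node types, and the target pattern are all illustrative.

```python
from collections import defaultdict
from itertools import combinations

def typed_triangle_weights(edges, node_type, pattern):
    """Weight each edge by the number of triangles through it whose multiset of
    node types equals `pattern` (an illustrative typed 3-node graphlet)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    target = tuple(sorted(pattern))
    weights = defaultdict(int)
    seen = set()
    for u, v in edges:
        for w in nbrs[u] & nbrs[v]:          # common neighbours close a triangle
            tri = frozenset((u, v, w))
            if tri in seen:                  # count each triangle once
                continue
            seen.add(tri)
            if tuple(sorted(node_type[x] for x in tri)) != target:
                continue                     # wrong type signature
            for a, b in combinations(sorted(tri), 2):
                weights[(a, b)] += 1
    return dict(weights)

# Hypothetical heterogeneous graph: users 0, 1 and items 2, 3.
types = {0: "user", 1: "user", 2: "item", 3: "item"}
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
W = typed_triangle_weights(edges, types, ("user", "user", "item"))
```

Only the triangle {0, 1, 2} has the user-user-item signature, so edges (1, 3) and (2, 3) receive no weight under this pattern.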
Heat kernel coupling for multiple graph analysis
In this paper, we introduce heat kernel coupling (HKC) as a method of
constructing multimodal spectral geometry on weighted graphs of different sizes
without vertex-wise bijective correspondence. We show that Laplacian averaging
can be derived as a limit case of HKC, and demonstrate its applications on
several problems from the manifold learning and pattern recognition domains.
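The basic ingredient that HKC couples across graphs is the heat kernel of a single weighted graph. A minimal sketch, assuming the combinatorial Laplacian $L = D - W$ and a symmetric weight matrix, computes $H_t = e^{-tL}$ from the eigendecomposition of $L$. The coupling step itself is not shown, and `heat_kernel` is a hypothetical helper.

```python
import numpy as np

def heat_kernel(W, t):
    """Heat kernel H_t = exp(-t L) of the combinatorial Laplacian L = D - W,
    computed from the eigendecomposition of the symmetric matrix L."""
    L = np.diag(W.sum(axis=1)) - W
    evals, evecs = np.linalg.eigh(L)
    return evecs @ np.diag(np.exp(-t * evals)) @ evecs.T

# Hypothetical example: an unweighted path on three vertices.
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = heat_kernel(W, t=0.5)
```

Because $L\mathbf{1} = 0$, every row of $H_t$ sums to one, and $H_0$ is the identity; both are quick sanity checks on the construction.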
Graph reduction with spectral and cut guarantees
Can one reduce the size of a graph without significantly altering its basic
properties? The graph reduction problem is hereby approached from the
perspective of restricted spectral approximation, a modification of the
spectral similarity measure used for graph sparsification. This choice is
motivated by the observation that restricted approximation carries strong
spectral and cut guarantees, and that it implies approximation results for
unsupervised learning problems relying on spectral embeddings.
The paper then focuses on coarsening, the most common type of graph
reduction. Sufficient conditions are derived for a small graph to approximate a
larger one in the sense of restricted similarity. These findings give rise to
nearly linear-time algorithms that, compared to both standard and advanced graph
reduction methods, find coarse graphs of improved quality, often by a large
margin, without sacrificing speed. Comment: 41 pages
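Coarsening can be expressed with a 0/1 assignment matrix $P \in \{0,1\}^{n \times n_c}$ that maps each vertex to a supernode; the contracted graph then has Laplacian $L_c = P^\top L P$, whose off-diagonal entries sum the edge weights between supernodes. The sketch below shows only this plain contraction, under that assumption; it does not implement the paper's choice of which vertices to merge or its restricted-spectral-approximation criterion, and `coarsen` is a hypothetical helper.

```python
import numpy as np

def laplacian(W):
    """Combinatorial Laplacian D - W of a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W

def coarsen(W, assignment, n_coarse):
    """Contract vertices according to `assignment` (vertex -> supernode index)
    and return the coarse Laplacian P^T L P."""
    n = W.shape[0]
    P = np.zeros((n, n_coarse))
    P[np.arange(n), assignment] = 1.0
    return P.T @ laplacian(W) @ P

# Hypothetical example: the 4-cycle 0-1-2-3-0, merging {0, 1} and {2, 3}.
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    W[i, j] = W[j, i] = 1.0
Lc = coarsen(W, assignment=np.array([0, 0, 1, 1]), n_coarse=2)
```

Two original edges, (1, 2) and (3, 0), cross between the supernodes, so the coarse graph is a single edge of weight 2.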
Multiclass Diffuse Interface Models for Semi-Supervised Learning on Graphs
We present a graph-based variational algorithm for multiclass classification
of high-dimensional data, motivated by total variation techniques. The energy
functional is based on a diffuse interface model with a periodic potential. We
augment the model by introducing an alternative measure of smoothness that
preserves symmetry among the class labels. Through this modification of the
standard Laplacian, we construct an efficient multiclass method that allows for
sharp transitions between classes. The experimental results demonstrate that
our approach is competitive with the state of the art among graph-based
algorithms. Comment: 9 pages, to appear in Proceedings of the 2nd International Conference
on Pattern Recognition Applications and Methods (ICPRAM 2013)
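For orientation, here is how the standard two-class graph Ginzburg-Landau energy that diffuse interface models build on can be evaluated. This is an assumption-laden sketch: it uses the classical double-well potential $(u_i^2 - 1)^2$ and the plain combinatorial Laplacian, not the periodic potential or the modified smoothness measure of the multiclass model described above. `gl_energy`, `eps`, `fidelity`, and `labels` are hypothetical names.

```python
import numpy as np

def gl_energy(W, u, eps=1.0, fidelity=0.0, labels=None):
    """Two-class graph Ginzburg-Landau energy:
    eps/2 * u^T L u + 1/(4 eps) * sum_i (u_i^2 - 1)^2
    + fidelity/2 * sum over labeled i of (u_i - f_i)^2."""
    L = np.diag(W.sum(axis=1)) - W
    smoothness = 0.5 * eps * (u @ L @ u)          # penalizes disagreement across edges
    double_well = np.sum((u ** 2 - 1) ** 2) / (4 * eps)  # pushes u_i toward +1 or -1
    energy = smoothness + double_well
    for i, f in (labels or {}).items():           # semi-supervised fit to known labels
        energy += 0.5 * fidelity * (u[i] - f) ** 2
    return energy

# Hypothetical two-vertex graph joined by one unit-weight edge.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
```

A small `eps` makes the double-well term dominate, which favors nearly binary labelings with sharp transitions between classes.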
Scalable Constrained Clustering: A Generalized Spectral Method
We present a simple spectral approach to the well-studied constrained
clustering problem. It captures constrained clustering as a generalized
eigenvalue problem with graph Laplacians. The algorithm works in nearly-linear
time and provides concrete guarantees for the quality of the clusters, at least
for the case of 2-way partitioning. In practice this translates to a very fast
implementation that consistently outperforms existing spectral approaches both
in speed and quality. Comment: Accepted to appear in AISTATS 2016. arXiv admin note: text overlap
with arXiv:1504.0065
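A generalized eigenvalue formulation of 2-way constrained clustering can be sketched as follows. As an illustration, we assume that `W_G` is the data graph (must-link constraints could be added as extra edges) and `W_H` is a connected graph of cannot-link pairs. We then minimize $x^\top L_G x / x^\top L_H x$ over vectors orthogonal to $\mathbf{1}$ and split by sign. The problem is reduced to a symmetric standard eigenproblem through a Cholesky factorization. This is a dense, small-scale sketch, not the paper's nearly linear-time solver, and `constrained_split` is hypothetical.

```python
import numpy as np

def laplacian(W):
    """Combinatorial Laplacian D - W of a symmetric weight matrix."""
    return np.diag(W.sum(axis=1)) - W

def constrained_split(W_G, W_H):
    """Sign split of the smallest generalized eigenvector of L_G x = lam L_H x,
    restricted to vectors orthogonal to the all-ones vector.
    Assumes H is connected, so L_H is positive definite on that subspace."""
    n = W_G.shape[0]
    L_G, L_H = laplacian(W_G), laplacian(W_H)
    # Orthonormal basis Q (n x (n-1)) of the complement of the all-ones vector.
    basis = np.hstack([np.ones((n, 1)), np.eye(n)[:, : n - 1]])
    Q = np.linalg.qr(basis)[0][:, 1:]
    A = Q.T @ L_G @ Q
    B = Q.T @ L_H @ Q
    C = np.linalg.cholesky(B)          # B = C C^T, C lower triangular
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T            # A z = lam B z  <=>  M w = lam w, z = C^-T w
    evals, evecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    x = Q @ (C_inv.T @ evecs[:, 0])
    return x > 0

# Hypothetical data graph: pairs {0,1} and {2,3} with weak cross edges.
W_G = np.zeros((4, 4))
for i, j, w in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 0.1), (0, 3, 0.1)]:
    W_G[i, j] = W_G[j, i] = w
# Hypothetical cannot-link graph: every pair across {0,1} and {2,3}.
W_H = np.zeros((4, 4))
for i, j in [(0, 2), (0, 3), (1, 2), (1, 3)]:
    W_H[i, j] = W_H[j, i] = 1.0
labels = constrained_split(W_G, W_H)
```

Here the generalized eigenvalues on the complement of $\mathbf{1}$ are 0.05, 1.0, and 1.1, and the smallest one belongs to $x \propto (1, 1, -1, -1)$, which separates {0, 1} from {2, 3}.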
Co-clustering for directed graphs: the Stochastic co-Blockmodel and spectral algorithm Di-Sim
Directed graphs have asymmetric connections, yet the current graph clustering
methodologies cannot identify the potentially global structure of these
asymmetries. We give a spectral algorithm called di-sim that builds on a dual
measure of similarity that corresponds to how a node (i) sends and (ii) receives
edges. Using di-sim, we analyze the global asymmetries in the networks of Enron
emails, political blogs, and the C. elegans neural connectome. In each example,
a small subset of nodes has persistent asymmetries; these nodes send edges
to one cluster but receive edges from another cluster. Previous approaches
would have assigned these asymmetric nodes to only one cluster, failing to
identify their sending/receiving asymmetries. Regularization and "projection"
are two steps of di-sim that are essential for spectral clustering algorithms
to work in practice. The theoretical results show that these steps make the
algorithm weakly consistent under the degree corrected Stochastic
co-Blockmodel, a model that generalizes the Stochastic Blockmodel to allow for
both (i) degree heterogeneity and (ii) the global asymmetries that we intend to
detect. The theoretical results make no assumptions on the smallest-degree
nodes. Instead, the theorem requires that the average degree grow sufficiently
fast, and weak consistency applies only to the subset of nodes with
sufficiently large leverage scores. The results also apply to bipartite
graphs.
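The two steps singled out above, regularization and "projection", can be sketched for a directed adjacency matrix as follows. This minimal version assumes that the matrix is regularized as $O_\tau^{-1/2} A P_\tau^{-1/2}$, with out-degrees $O$, in-degrees $P$, and $\tau$ defaulting to the average degree. It also assumes that "projection" means scaling each embedding row to unit length. The final k-means on the sending rows and the receiving rows is left out, and `di_sim_embed` is a hypothetical name.

```python
import numpy as np

def di_sim_embed(A, k, tau=None):
    """Sending/receiving embeddings for a directed graph (A[i, j] = 1 for i -> j)."""
    out_deg = A.sum(axis=1)
    in_deg = A.sum(axis=0)
    if tau is None:
        tau = out_deg.mean()  # assumption: regularize with the average degree
    # Regularized matrix O_tau^{-1/2} A P_tau^{-1/2}.
    L = A / np.sqrt(out_deg + tau)[:, None] / np.sqrt(in_deg + tau)[None, :]
    U, s, Vt = np.linalg.svd(L)
    X, Y = U[:, :k], Vt[:k, :].T   # left: how nodes send; right: how they receive

    def project(M):
        # Scale each row to unit length; all-zero rows are left unchanged.
        norms = np.linalg.norm(M, axis=1, keepdims=True)
        return M / np.where(norms > 0, norms, 1.0)

    return project(X), project(Y)

# Hypothetical 4-node directed graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X, Y = di_sim_embed(A, k=2)
```

Clustering the rows of `X` and the rows of `Y` separately is what lets a node land in one cluster as a sender and in another as a receiver.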
Sampling and multilevel coarsening algorithms for fast matrix approximations
This paper addresses matrix approximation problems for matrices that are
large, sparse and/or that are representations of large graphs. To tackle these
problems, we consider algorithms that are based primarily on coarsening
techniques, possibly combined with random sampling. A multilevel coarsening
technique is proposed which utilizes a hypergraph associated with the data
matrix and a graph coarsening strategy based on column matching. Theoretical
results are established that characterize the quality of the dimension
reduction achieved by a coarsening step, when a proper column matching strategy
is employed. We consider a number of standard applications of this technique as
well as a few new ones. Among the standard applications we first consider the
problem of computing the partial SVD for which a combination of sampling and
coarsening yields significantly improved SVD results relative to sampling
alone. We also consider the column subset selection problem, a popular low-rank
approximation method used in data-related applications, and show how multilevel
coarsening can be adapted for this problem. Similarly, we consider the problem
of graph sparsification and show how coarsening techniques can be employed to
solve it. Numerical experiments illustrate the performance of the methods in
various applications.
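One level of column-matching coarsening can be sketched as greedy matching on column similarity. The assumptions here are loud ones. Similarity is the absolute cosine between columns rather than anything derived from the hypergraph the paper uses. Matching is greedy in decreasing similarity. Each matched pair is replaced by its plain sum, which is an illustrative merge rule only. `coarsen_columns` is hypothetical.

```python
import numpy as np

def coarsen_columns(A):
    """One coarsening level: greedily match columns with the largest absolute
    cosine similarity and replace each matched pair by its sum (illustrative
    merge rule). Unmatched columns are kept as-is."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0                 # avoid dividing zero columns by zero
    An = A / norms
    S = np.abs(An.T @ An)                   # pairwise |cosine| similarities
    n = A.shape[1]
    pairs = sorted(((S[i, j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    matched, cols = set(), []
    for _, i, j in pairs:
        if i in matched or j in matched:
            continue
        matched.update((i, j))
        cols.append(A[:, i] + A[:, j])
    cols.extend(A[:, i] for i in range(n) if i not in matched)
    return np.column_stack(cols)

# Hypothetical 3 x 4 matrix: columns 0 and 1 are parallel, 2 and 3 overlap.
A = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
C = coarsen_columns(A)
```

Each level roughly halves the number of columns, so repeating it yields the multilevel hierarchy; a partial SVD of the coarse matrix is then much cheaper to compute.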
A Local Approach for Identifying Clusters in Networks
Graph clustering is a fundamental problem that has been extensively studied
both in theory and practice. The problem has been defined in several ways in
the literature, and most of these formulations have been proven NP-hard. Because of
the problem's practical relevance, several heuristics for graph clustering have been
introduced; they constitute a central tool for coping with NP-completeness and
are used in applications of clustering ranging from computer vision to data
analysis to learning. Many methodologies exist for this problem; however,
most of them are global in nature and are unlikely to scale well for very large
networks. In this paper, we propose two scalable local approaches for
identifying the clusters in any network. We further extend one of these
approaches for discovering the overlapping clusters in these networks. Some
experimental results obtained for the proposed approaches are also
presented.
Overlapping Community Detection via Local Spectral Clustering
Large graphs arise in a number of contexts and understanding their structure
and extracting information from them is an important research area. Early
algorithms for mining communities focused on the global structure and
often run in time that grows with the size of the entire graph. Nowadays, as we
often explore networks with billions of vertices and look for communities of only
hundreds of vertices, it is crucial to shift our attention from macroscopic structure to
microscopic structure in large networks. A growing body of work has been
adopting local expansion methods in order to identify the community members
from a few exemplary seed members.
In this paper, we propose a novel approach for finding overlapping
communities called LEMON (Local Expansion via Minimum One Norm). The algorithm
finds the community by seeking a sparse vector in the span of the local spectra
such that the seeds are in its support. We show that LEMON can achieve the
highest detection accuracy among state-of-the-art proposals. The running time
depends on the size of the community rather than that of the entire graph. The
algorithm is easy to implement, and is highly parallelizable. We further
provide theoretical analysis on the local spectral properties, bounding the
measure of tightness of extracted community in terms of the eigenvalues of
graph Laplacian.
Moreover, given that networks are not all similar in nature, a comprehensive
analysis of how well the local expansion approach is suited to uncovering
communities in different networks is still lacking. We thoroughly evaluate our
approach using both synthetic and real-world datasets across different domains,
and analyze the empirical variations when applying our method to inherently
different networks in practice. In addition, we provide heuristics on how seed set
quality and quantity affect performance. Comment: Extended version of the conference paper in WWW'1
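The "span of the local spectra" that LEMON searches can be approximated, under a simplifying assumption, by a Krylov-style subspace. Start from the uniform distribution on the seeds, take a few lazy random-walk steps, and orthonormalize the resulting vectors. This sketch covers only that subspace construction. The sparse-vector search itself (a minimum one-norm problem, per the method's name) is not shown. `local_spectral_basis` and its `steps` parameter are hypothetical.

```python
import numpy as np

def local_spectral_basis(W, seeds, steps=3):
    """Orthonormal basis of span{p0, P p0, ..., P^steps p0}, where p0 is uniform
    on the seeds and P is the lazy random-walk operator (W symmetric, no
    isolated vertices)."""
    n = W.shape[0]
    deg = W.sum(axis=1)
    # Column-stochastic lazy walk: P = (I + W D^{-1}) / 2.
    P = 0.5 * (np.eye(n) + W / deg[:, None]).T
    p = np.zeros(n)
    p[list(seeds)] = 1.0 / len(seeds)
    vecs = [p]
    for _ in range(steps):
        p = P @ p                   # diffuse the distribution one more step
        vecs.append(p)
    Q, _ = np.linalg.qr(np.column_stack(vecs))
    return Q

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
Q = local_spectral_basis(W, seeds={0}, steps=3)
```

Keeping `steps` small keeps the subspace concentrated near the seeds, which is what allows the running time to depend on the community rather than on the whole graph.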