Incremental eigenpair computation for graph Laplacian matrices: theory and applications
The smallest eigenvalues and the associated eigenvectors (i.e., eigenpairs) of a graph Laplacian matrix have been widely used for spectral clustering and community detection. However, in real-life applications, the number of clusters or communities (say, K) is generally unknown a priori. Consequently, the majority of the existing methods either choose K heuristically or repeat the clustering method with different choices of K and accept the best clustering result. The first option often yields a suboptimal result, while the second option is computationally expensive. In this work, we propose an incremental method for constructing the eigenspectrum of the graph Laplacian matrix. This method leverages the eigenstructure of the graph Laplacian matrix to obtain the Kth smallest eigenpair of the Laplacian matrix given a collection of all previously compute
Hearing the clusters in a graph: A distributed algorithm
We propose a novel distributed algorithm to cluster graphs. The algorithm
recovers the solution obtained from spectral clustering without the need for
expensive eigenvalue/vector computations. We prove that, by propagating waves
through the graph, a local fast Fourier transform yields the local component of
every eigenvector of the Laplacian matrix, thus providing clustering
information. For large graphs, the proposed algorithm is orders of magnitude
faster than random walk based approaches. We prove the equivalence of the
proposed algorithm to spectral clustering and derive convergence rates. We
demonstrate the benefit of using this decentralized clustering algorithm for
community detection in social graphs, accelerating distributed estimation in
sensor networks and efficient computation of distributed multi-agent search
strategies.
Preconditioned Spectral Clustering for Stochastic Block Partition Streaming Graph Challenge
Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is
demonstrated to efficiently solve eigenvalue problems for graph Laplacians that
appear in spectral clustering. For static graph partitioning, 10-20 iterations
of LOBPCG without preconditioning result in ~10x error reduction, enough to
achieve 100% correctness for all Challenge datasets with known truth
partitions, e.g., for graphs with 5K/.1M (50K/1M) Vertices/Edges in 2 (7)
seconds, compared to over 5,000 (30,000) seconds needed by the baseline Python
code. Our Python code 100% correctly determines 98 (160) clusters from the
Challenge static graphs with 0.5M (2M) vertices in 270 (1,700) seconds using
10GB (50GB) of memory. Our single-precision MATLAB code calculates the same
clusters in half the time and memory. For streaming graph partitioning, LOBPCG is
initialized with approximate eigenvectors of the graph Laplacian already computed
for the previous graph, in many cases reducing the number of required LOBPCG
iterations by a factor of 2-3 compared to the static case. Our spectral clustering is
generic, i.e. assuming nothing specific of the block model or streaming, used
to generate the graphs for the Challenge, in contrast to the base code.
Nevertheless, in 10-stage streaming comparison with the base code for the 5K
graph, the quality of our clusters is similar or better starting at stage 4 (7)
for emerging edging (snowballing) streaming, while the computations are over
100-1000 times faster.
Comment: 6 pages. To appear in Proceedings of the 2017 IEEE High Performance
Extreme Computing Conference. Student Innovation Award Streaming Graph
Challenge: Stochastic Block Partition, see
http://graphchallenge.mit.edu/champion
Attributed Network Embedding for Learning in a Dynamic Environment
Network embedding leverages the node proximity manifested to learn a
low-dimensional node vector representation for each node in the network. The
learned embeddings could advance various learning tasks such as node
classification, network clustering, and link prediction. Most, if not all,
existing works are performed in the context of plain and
static networks. Nonetheless, in reality, network structure often evolves over
time with addition/deletion of links and nodes. Also, a vast majority of
real-world networks are associated with a rich set of node attributes, and
their attribute values also change naturally, with the emergence of new
content patterns and the fading of old content patterns. These changing
characteristics motivate us to seek an effective embedding representation to
capture network and attribute evolving patterns, which is of fundamental
importance for learning in a dynamic environment. To the best of our knowledge, we are
the first to tackle this problem, which poses the following two challenges: (1) the
inherently correlated network and node attributes could be noisy and
incomplete, which necessitates a robust consensus representation to capture their
individual properties and correlations; (2) the embedding learning needs to be
performed in an online fashion to adapt to the changes accordingly. In this
paper, we tackle this problem by proposing a novel dynamic attributed network
embedding framework - DANE. In particular, DANE first provides an offline
method for a consensus embedding and then leverages matrix perturbation theory
to maintain the freshness of the end embedding results in an online manner. We
perform extensive experiments on both synthetic and real attributed networks to
corroborate the effectiveness and efficiency of the proposed framework.
Comment: 10 page
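The matrix perturbation idea mentioned in this abstract can be illustrated with the standard first-order eigenvalue update for a symmetric matrix: if (lambda_i, u_i) is an eigenpair of L with unit-norm u_i and L changes by a small symmetric Delta L, then lambda_i changes by approximately u_i^T (Delta L) u_i. The sketch below is a simplified illustration, not the DANE algorithm itself (the abstract does not spell out its update rules). The weighted path graph and the helper `path_laplacian` are assumptions made for the example.

```python
import numpy as np


def path_laplacian(weights):
    """Laplacian of a weighted path graph; weights[i] is the weight of edge (i, i+1)."""
    n = len(weights) + 1
    laplacian = np.zeros((n, n))
    for i, w in enumerate(weights):
        laplacian[i, i] += w
        laplacian[i + 1, i + 1] += w
        laplacian[i, i + 1] -= w
        laplacian[i + 1, i] -= w
    return laplacian


# Old graph: unweighted path on 6 nodes (all eigenvalues are distinct).
old_weights = np.ones(5)
old_laplacian = path_laplacian(old_weights)

# New graph: the weight of edge (0, 1) grows slightly.
new_weights = old_weights.copy()
new_weights[0] += 0.01
new_laplacian = path_laplacian(new_weights)
delta = new_laplacian - old_laplacian

# Eigenpairs of the old graph, computed once (the "offline" step).
old_values, old_vectors = np.linalg.eigh(old_laplacian)

# First-order update: lambda_i + u_i^T (delta L) u_i for every eigenpair.
first_order = old_values + np.sum(old_vectors * (delta @ old_vectors), axis=0)

# Reference values from a full recomputation, for comparison only.
exact = np.linalg.eigvalsh(new_laplacian)
max_error = np.max(np.abs(first_order - exact))
stale_error = np.max(np.abs(old_values - exact))
```

The update costs one matrix-vector product per eigenpair instead of a full eigendecomposition. Its error is second order in the size of the change, so it degrades when changes are large or eigenvalues are nearly repeated.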
NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks
The graph Laplacian is a standard tool in data science, machine learning, and
image processing. The corresponding matrix inherits the complex structure of
the underlying network and is in certain applications densely populated. This
makes computations, in particular matrix-vector products, with the graph
Laplacian a hard task. A typical application is the computation of a number of
its eigenvalues and eigenvectors. Standard methods become infeasible when the
number of nodes in the graph is too large. We propose the use of the fast
summation based on the nonequispaced fast Fourier transform (NFFT) to perform
the dense matrix-vector product with the graph Laplacian fast without ever
forming the whole matrix. The enormous flexibility of the NFFT algorithm allows
us to embed the accelerated multiplication into Lanczos-based eigenvalue
routines or iterative linear system solvers and even to consider kernels other than the
standard Gaussian. We illustrate the feasibility of our approach on a
number of test problems from image segmentation to semi-supervised learning
based on graph-based PDEs. In particular, we compare our approach with the
Nyström method. Moreover, we present and test an enhanced, hybrid version of
the Nyström method, which internally uses the NFFT.
Comment: 28 pages, 9 figure
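The matrix-free structure described in this abstract can be sketched by wrapping a matrix-vector product in a linear operator and passing it to a Lanczos-type eigensolver. The sketch below does not implement the NFFT-based fast summation. As a stand-in it evaluates the Gaussian kernel sums directly, one row block at a time, which still costs O(n^2) per product but never stores the full matrix. The helper names `gaussian_matvec` and `normalized_matvec`, the random test points, and the choice of the normalized adjacency D^{-1/2} W D^{-1/2} are assumptions made for the example. `LinearOperator` and `eigsh` are existing SciPy routines.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh


def gaussian_matvec(points, x, sigma, chunk=128):
    """Compute W @ x for W_ij = exp(-||p_i - p_j||^2 / sigma^2), W_ii = 0.

    Works one row block at a time so the n-by-n matrix is never stored.
    A fast summation scheme would replace this routine.
    """
    n = points.shape[0]
    out = np.empty(n)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        diff = points[start:stop, None, :] - points[None, :, :]
        block = np.exp(-np.sum(diff ** 2, axis=2) / sigma ** 2)
        # Zero the diagonal entries that fall inside this row block.
        block[np.arange(stop - start), np.arange(start, stop)] = 0.0
        out[start:stop] = block @ x
    return out


rng = np.random.default_rng(0)
points = rng.standard_normal((300, 2))
sigma = 1.0
n = points.shape[0]

# Node degrees d = W @ 1, obtained with the same matrix-free product.
degrees = gaussian_matvec(points, np.ones(n), sigma)
inv_sqrt_deg = 1.0 / np.sqrt(degrees)


def normalized_matvec(x):
    """Apply D^{-1/2} W D^{-1/2} to x without forming W."""
    x = np.ravel(x)
    return inv_sqrt_deg * gaussian_matvec(points, inv_sqrt_deg * x, sigma)


operator = LinearOperator((n, n), matvec=normalized_matvec, dtype=np.float64)

# Largest eigenvalues of the normalized adjacency correspond to the smallest
# eigenvalues of the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}.
values, vectors = eigsh(operator, k=3, which="LA")
```

Because the eigensolver only ever calls `normalized_matvec`, swapping in a faster summation routine would not require changes to the rest of the code.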