NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks
The graph Laplacian is a standard tool in data science, machine learning, and
image processing. The corresponding matrix inherits the complex structure of
the underlying network and is, in certain applications, densely populated. This
makes computations with the graph Laplacian, in particular matrix-vector
products, a hard task. A typical application is the computation of a number of
its eigenvalues and eigenvectors. Standard methods become infeasible when the
number of nodes in the graph is too large. We propose the use of the fast
summation based on the nonequispaced fast Fourier transform (NFFT) to compute
dense matrix-vector products with the graph Laplacian quickly, without ever
forming the whole matrix. The enormous flexibility of the NFFT algorithm allows
us to embed the accelerated multiplication into Lanczos-based eigenvalue
routines or iterative linear system solvers, and even to consider kernels other
than the standard Gaussian kernel. We illustrate the feasibility of our approach on a
number of test problems, from image segmentation to semi-supervised learning
based on graph-based PDEs. In particular, we compare our approach with the
Nyström method. Moreover, we present and test an enhanced, hybrid version of
the Nyström method, which internally uses the NFFT.
Comment: 28 pages, 9 figures
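The abstract's key point is that a Krylov method such as Lanczos only needs the product L v, never L itself. The Python sketch below illustrates this under stated assumptions: it uses the unnormalized Laplacian L = D - W with Gaussian weights W_ij = exp(-||x_i - x_j||^2 / sigma^2) and zero diagonal, and it evaluates the kernel sums directly in O(n^2) time as a stand-in for the NFFT-based fast summation, which it does not implement. All function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """K(x, y) = exp(-||x - y||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)


def kernel_sum(points, coeffs, sigma):
    """f_i = sum_{j != i} coeffs[j] * K(x_i, x_j), evaluated directly in O(n^2).

    This is the quantity an NFFT-based fast summation would approximate;
    here it is computed exactly as a stand-in.
    """
    n = len(points)
    return [
        sum(coeffs[j] * gaussian_kernel(points[i], points[j], sigma)
            for j in range(n) if j != i)
        for i in range(n)
    ]


def laplacian_matvec(points, v, sigma):
    """Return L v for L = D - W without ever storing the n x n matrix W."""
    degrees = kernel_sum(points, [1.0] * len(points), sigma)  # D = diag(W 1)
    wv = kernel_sum(points, v, sigma)                          # W v
    return [d * vi - w for d, vi, w in zip(degrees, v, wv)]
```

Because the Krylov solver sees only this function, swapping the direct sums for a fast summation changes the cost per product but not the solver. With SciPy, for example, such a function could be wrapped (after converting to and from NumPy arrays) in `scipy.sparse.linalg.LinearOperator` and passed to `scipy.sparse.linalg.eigsh`, which runs an implicitly restarted Lanczos method.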
Spectral Clustering: An Empirical Study of Approximation Algorithms and its Application to the Attrition Problem
Clustering is the problem of separating a set of objects into groups (called clusters) so that objects within the same cluster are more similar to each other than to those in different clusters. Spectral clustering is now a well-known clustering method that uses the spectrum of the data similarity matrix to perform this separation. Since the method relies on solving an eigenvector problem, it is computationally expensive for large datasets. To overcome this limitation, approximation methods have been developed that aim to reduce running time while maintaining accurate classification. In this article, we summarize and experimentally evaluate several approximation methods for spectral clustering. From an applications standpoint, we employ spectral clustering to solve the so-called attrition problem, in which one aims to distinguish, within a set of employees, those who are likely to leave the company voluntarily from those who are not. Our study sheds light on the empirical performance of existing approximate spectral clustering methods and shows their applicability to an important business-optimization problem.
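As a reference point for the cost that the approximation methods try to reduce, the hypothetical Python sketch below performs exact two-way spectral clustering: it builds a dense Gaussian similarity matrix, approximates the Fiedler vector (the eigenvector of the second-smallest eigenvalue) of the unnormalized Laplacian L = D - W by power iteration, and splits the points by sign. It is not one of the approximation methods evaluated in the article; the Gaussian kernel, the bandwidth `sigma`, and all function names are assumptions for illustration.

```python
import math


def similarity_matrix(points, sigma):
    """Dense Gaussian similarity matrix W with zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / sigma ** 2)
    return W


def fiedler_vector(W, iters=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L = D - W.

    Power iteration on M = c*I - L with c = 2 * max degree (an upper bound on
    lambda_max(L)), keeping the iterate orthogonal to the constant vector,
    which spans the eigenvalue-0 eigenspace of a connected graph's L.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)
    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant vector
        Lx = [deg[i] * x[i] - sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        y = [c * xi - lxi for xi, lxi in zip(x, Lx)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


def spectral_bipartition(points, sigma, iters=500):
    """Two-way spectral clustering: split points by the sign of the Fiedler vector."""
    f = fiedler_vector(similarity_matrix(points, sigma), iters)
    return [0 if fi < 0 else 1 for fi in f]
```

Every power-iteration step touches the full n x n matrix, so both memory and time grow quadratically with the number of points; that is the bottleneck approximation schemes target.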
Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data
Several important applications, such as streaming PCA and semidefinite
programming, involve a large-scale positive-semidefinite (psd) matrix that is
presented as a sequence of linear updates. Because of storage limitations, it
may only be possible to retain a sketch of the psd matrix. This paper develops
a new algorithm for fixed-rank psd approximation from a sketch. The approach
combines the Nyström approximation with a novel mechanism for rank truncation.
Theoretical analysis establishes that the proposed method can achieve any
prescribed relative error in the Schatten 1-norm and that it exploits the
spectral decay of the input matrix. Computer experiments show that the proposed
method dominates alternative techniques for fixed-rank psd matrix approximation
across a wide range of examples.
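The streaming setting works because the sketch is linear: if A is updated to A + H, the sketch Y = A Omega becomes Y + H Omega, so A itself never has to be stored. The hypothetical Python sketch below shows only this sketch maintenance and the plain Nyström reconstruction A_hat = Y (Omega^T Y)^{-1} Y^T, assuming Omega^T Y is invertible (a pseudo-inverse would be needed otherwise). It omits the rank-truncation mechanism and the numerical safeguards the paper contributes; the class and helper names are assumptions.

```python
import random


def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def solve(C, R):
    """Solve C X = R by Gauss-Jordan elimination with partial pivoting."""
    k = len(C)
    M = [list(C[i]) + list(R[i]) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[k:] for row in M]


class NystromSketch:
    """Maintain Y = A @ Omega under linear updates A <- A + H (A is never stored)."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        self.omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
        self.Y = [[0.0] * k for _ in range(n)]

    def update(self, H):
        # Linearity: (A + H) Omega = A Omega + H Omega.
        HO = matmul(H, self.omega)
        self.Y = [[y + h for y, h in zip(ry, rh)] for ry, rh in zip(self.Y, HO)]

    def reconstruct(self):
        # Plain Nystrom approximation A_hat = Y (Omega^T Y)^{-1} Y^T.
        C = matmul(transpose(self.omega), self.Y)
        X = solve(C, transpose(self.Y))
        return matmul(self.Y, X)
```

For a psd matrix of rank at most k, this reconstruction is exact in exact arithmetic whenever Omega^T A Omega is invertible, which makes it a convenient sanity check; for matrices of higher rank it is only an approximation.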
Towards Scalable Spectral Clustering via Spectrum-Preserving Sparsification
Eigenvalue decomposition of Laplacian matrices for large nearest-neighbor (NN) graphs is the major computational bottleneck in spectral clustering (SC). To fundamentally address this computational challenge in SC, we propose a scalable spectral sparsification framework that enables the construction of nearly-linear-sized, ultra-sparse NN graphs with guaranteed preservation of key eigenvalues and eigenvectors of the original Laplacian. The proposed method is based on the latest theoretical results in spectral graph theory and can therefore robustly handle general undirected graphs. By leveraging a nearly-linear-time spectral graph topology sparsification phase and a subgraph scaling phase via stochastic gradient descent (SGD) iterations, our approach allows us to compute tree-like NN graphs that can serve as high-quality proxies of the original NN graphs, leading to highly scalable and accurate SC of large data sets. Our extensive experimental results on a variety of public-domain data sets show dramatically improved performance compared with state-of-the-art SC methods.
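To make the idea of a tree-like proxy graph concrete, the hypothetical Python sketch below builds an undirected k-nearest-neighbor graph with Gaussian weights, extracts a maximum-weight spanning tree with Kruskal's algorithm, and compares Laplacian quadratic forms x^T L x = sum over edges of w_ij (x_i - x_j)^2. This is a deliberately simplified stand-in, not the paper's method: neither its spectral sparsification phase nor its SGD-based subgraph scaling phase is reproduced, and the kernel, `k`, `sigma`, and function names are assumptions.

```python
import math


def knn_edges(points, k, sigma):
    """Undirected k-nearest-neighbor graph as (i, j, weight) triples with i < j."""
    n = len(points)

    def d2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    edges = {}
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: d2(points[i], points[j]))[:k]
        for j in nbrs:
            key = (min(i, j), max(i, j))  # merge duplicate (i, j) / (j, i) edges
            edges[key] = math.exp(-d2(points[i], points[j]) / sigma ** 2)
    return [(i, j, w) for (i, j), w in edges.items()]


def max_spanning_tree(n, edges):
    """Kruskal's algorithm with union-find, taking the heaviest edges first."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for i, j, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


def laplacian_quadratic_form(edges, x):
    """x^T L x = sum over edges (i, j, w) of w * (x_i - x_j)^2."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)
```

Because every term of the quadratic form is nonnegative, dropping edges can only lower x^T L x, so a bare spanning tree underestimates the original graph's quadratic form; this suggests why some reweighting step, such as the scaling phase described in the abstract, is useful for a spectrally faithful proxy.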
- …