
    CUR Decompositions, Similarity Matrices, and Subspace Clustering

    A general framework for solving the subspace clustering problem using the CUR decomposition is presented. The CUR decomposition provides a natural way to construct similarity matrices for data that come from a union of unknown subspaces $\mathscr{U} = \bigcup_{i=1}^{M} S_i$. The similarity matrices thus constructed give the exact clustering in the noise-free case. Additionally, this decomposition gives rise to many distinct similarity matrices from a given set of data, which allow enough flexibility to perform accurate clustering of noisy data. We also show that two known methods for subspace clustering can be derived from the CUR decomposition. An algorithm based on the theoretical construction of similarity matrices is presented and tested in experiments on synthetic and real data. Additionally, an adaptation of our CUR-based similarity matrices yields a heuristic algorithm for subspace clustering; this algorithm gives the best overall performance to date for clustering the Hopkins155 motion segmentation dataset. Comment: Approximately 30 pages. The current version contains an improved algorithm and numerical experiments relative to the previous version.
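
    A small sketch may make the construction concrete. The following is a minimal Python sketch of one CUR-style similarity-matrix construction in the spirit of the abstract, assuming synthetic noise-free data, uniform column sampling, and scikit-learn's spectral clustering; lifting the minimum-norm coefficients to a self-expression matrix Q is a plausible reading, not necessarily the authors' exact algorithm.

        import numpy as np
        from sklearn.cluster import SpectralClustering
        from sklearn.metrics import adjusted_rand_score

        rng = np.random.default_rng(0)

        def union_of_subspaces(ambient=20, dim=3, n_sub=4, per_sub=50):
            # Noise-free data drawn from a union of M = n_sub random subspaces
            # (all sizes here are illustrative, not from the paper).
            blocks, labels = [], []
            for i in range(n_sub):
                basis = np.linalg.qr(rng.standard_normal((ambient, dim)))[0]
                blocks.append(basis @ rng.standard_normal((dim, per_sub)))
                labels += [i] * per_sub
            return np.hstack(blocks), np.array(labels)

        W, truth = union_of_subspaces()
        n = W.shape[1]

        # Sample columns of W: the "C" factor of a CUR decomposition W = C U^+ R.
        J = rng.choice(n, size=30, replace=False)
        C = W[:, J]

        # Minimum-norm coefficients Z with W ~= C Z, lifted to a self-expression
        # matrix Q supported on the sampled columns, so that W ~= W Q.
        Z = np.linalg.lstsq(C, W, rcond=None)[0]
        Q = np.zeros((n, n))
        Q[J, :] = Z

        # Symmetrize into a similarity matrix and cluster its graph.
        S = np.abs(Q) + np.abs(Q).T
        pred = SpectralClustering(n_clusters=4, affinity="precomputed",
                                  random_state=0).fit_predict(S)
        print("ARI vs. ground truth:", adjusted_rand_score(truth, pred))

    For independent subspaces, the minimum-norm coefficients of each data point are supported only on sampled columns from the point's own subspace, so the symmetrized similarity matrix is block diagonal and spectral clustering recovers the subspaces exactly in the noise-free case, consistent with the claim above.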

    Far-Field Compression for Fast Kernel Summation Methods in High Dimensions

    We consider fast kernel summations in high dimensions: given a large set of points in $d$ dimensions (with $d \gg 3$) and a pair-potential function (the kernel function), we compute a weighted sum of all pairwise kernel interactions for each point in the set. Direct summation is equivalent to a (dense) matrix-vector multiplication and scales quadratically with the number of points. Fast kernel summation algorithms reduce this cost to log-linear or linear complexity. Treecodes and Fast Multipole Methods (FMMs) deliver tremendous speedups by constructing approximate representations of interactions of points that are far from each other. In algebraic terms, these representations correspond to low-rank approximations of blocks of the overall interaction matrix. Existing approaches require an excessive number of kernel evaluations as $d$ and the number of points in the dataset increase. To address this issue, we use a randomized algebraic approach in which we first sample the rows of a block and then construct its approximate, low-rank interpolative decomposition. We examine the feasibility of this approach theoretically and experimentally. We provide a new theoretical result showing a tighter bound on the reconstruction error from uniformly sampling rows than the existing state of the art. We demonstrate that our sampling approach is competitive with existing (but prohibitively expensive) methods from the literature. We also construct kernel matrices for the Laplacian, Gaussian, and polynomial kernels -- all commonly used in physics and data analysis. We explore the numerical properties of blocks of these matrices and show that they are amenable to our approach. Depending on the data set, our randomized algorithm can successfully compute low-rank approximations in high dimensions. We report results for data sets with ambient dimensions from four to 1,000. Comment: 43 pages, 21 figures.
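
    The sample-rows-then-ID pipeline described above can be sketched with SciPy's interpolative-decomposition routines. The following is a minimal Python sketch, assuming a Gaussian kernel, two synthetic well-separated point clouds, uniform row sampling, and a fixed target rank; the sizes, bandwidth, and oversampling factor are illustrative choices, not values from the paper.

        import numpy as np
        import scipy.linalg.interpolative as sli

        rng = np.random.default_rng(0)

        def gaussian_kernel(X, Y, h):
            # Dense kernel block K[i, j] = exp(-||x_i - y_j||^2 / (2 h^2)),
            # computed via the expansion ||x||^2 + ||y||^2 - 2 x.y to avoid
            # a huge broadcast intermediate.
            d2 = ((X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :]
                  - 2.0 * X @ Y.T)
            return np.exp(-np.maximum(d2, 0.0) / (2.0 * h**2))

        # Targets and sources in d = 50 dimensions, separated so the block is
        # a far-field interaction and therefore numerically low rank.
        d, n_trg, n_src, k, h = 50, 1500, 1500, 30, 10.0
        X = rng.standard_normal((n_trg, d))
        Y = rng.standard_normal((n_src, d))
        Y[:, 0] += 5.0

        # 1. Uniformly sample a few rows of the block: only 4k * n_src kernel
        #    evaluations instead of n_trg * n_src.
        rows = rng.choice(n_trg, size=4 * k, replace=False)
        K_rows = gaussian_kernel(X[rows], Y, h)

        # 2. Rank-k interpolative decomposition of the sampled rows:
        #    K_rows ~= K_rows[:, idx[:k]] @ P.
        idx, proj = sli.interp_decomp(K_rows, k)
        P = sli.reconstruct_interp_matrix(idx, proj)

        # 3. Reuse the column skeleton for the *full* block:
        #    K ~= K[:, idx[:k]] @ P, so a summation needs only k kernel columns.
        w = rng.standard_normal(n_src)
        u_fast = gaussian_kernel(X, Y[idx[:k]], h) @ (P @ w)

        u_exact = gaussian_kernel(X, Y, h) @ w   # dense summation, for reference
        print("relative error:",
              np.linalg.norm(u_fast - u_exact) / np.linalg.norm(u_exact))

    The key step is reusing the column skeleton found on the sampled rows for the full block, so the approximate summation touches only $k$ kernel columns rather than the dense $n_{\mathrm{trg}} \times n_{\mathrm{src}}$ block.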