543 research outputs found

    CUR Decompositions, Similarity Matrices, and Subspace Clustering

    Get PDF
    A general framework for solving the subspace clustering problem using the CUR decomposition is presented. The CUR decomposition provides a natural way to construct similarity matrices for data that come from a union of unknown subspaces U=⋃Mi=1Si\mathscr{U}=\underset{i=1}{\overset{M}\bigcup}S_i. The similarity matrices thus constructed give the exact clustering in the noise-free case. Additionally, this decomposition gives rise to many distinct similarity matrices from a given set of data, which allow enough flexibility to perform accurate clustering of noisy data. We also show that two known methods for subspace clustering can be derived from the CUR decomposition. An algorithm based on the theoretical construction of similarity matrices is presented, and experiments on synthetic and real data are presented to test the method. Additionally, an adaptation of our CUR based similarity matrices is utilized to provide a heuristic algorithm for subspace clustering; this algorithm yields the best overall performance to date for clustering the Hopkins155 motion segmentation dataset.Comment: Approximately 30 pages. Current version contains improved algorithm and numerical experiments from the previous versio

    Generalized Separable Nonnegative Matrix Factorization

    Full text link
    Nonnegative matrix factorization (NMF) is a linear dimensionality technique for nonnegative data with applications such as image analysis, text mining, audio source separation and hyperspectral unmixing. Given a data matrix MM and a factorization rank rr, NMF looks for a nonnegative matrix WW with rr columns and a nonnegative matrix HH with rr rows such that M≈WHM \approx WH. NMF is NP-hard to solve in general. However, it can be computed efficiently under the separability assumption which requires that the basis vectors appear as data points, that is, that there exists an index set K\mathcal{K} such that W=M(:,K)W = M(:,\mathcal{K}). In this paper, we generalize the separability assumption: We only require that for each rank-one factor W(:,k)H(k,:)W(:,k)H(k,:) for k=1,2,…,rk=1,2,\dots,r, either W(:,k)=M(:,j)W(:,k) = M(:,j) for some jj or H(k,:)=M(i,:)H(k,:) = M(i,:) for some ii. We refer to the corresponding problem as generalized separable NMF (GS-NMF). We discuss some properties of GS-NMF and propose a convex optimization model which we solve using a fast gradient method. We also propose a heuristic algorithm inspired by the successive projection algorithm. To verify the effectiveness of our methods, we compare them with several state-of-the-art separable NMF algorithms on synthetic, document and image data sets.Comment: 31 pages, 12 figures, 4 tables. We have added discussions about the identifiability of the model, we have modified the first synthetic experiment, we have clarified some aspects of the contributio

    A Distance-preserving Matrix Sketch

    Full text link
    Visualizing very large matrices involves many formidable problems. Various popular solutions to these problems involve sampling, clustering, projection, or feature selection to reduce the size and complexity of the original task. An important aspect of these methods is how to preserve relative distances between points in the higher-dimensional space after reducing rows and columns to fit in a lower dimensional space. This aspect is important because conclusions based on faulty visual reasoning can be harmful. Judging dissimilar points as similar or similar points as dissimilar on the basis of a visualization can lead to false conclusions. To ameliorate this bias and to make visualizations of very large datasets feasible, we introduce two new algorithms that respectively select a subset of rows and columns of a rectangular matrix. This selection is designed to preserve relative distances as closely as possible. We compare our matrix sketch to more traditional alternatives on a variety of artificial and real datasets.Comment: 38 pages, 13 figure
    • …
    corecore