543 research outputs found

CUR Decompositions, Similarity Matrices, and Subspace Clustering

Authors: Akram Aldroubi, Keaton Hamm, Ahmet Bugra Koku, Ali Sekmen
Publication date: 11/12/2018

A general framework for solving the subspace clustering problem using the CUR decomposition is presented. The CUR decomposition provides a natural way to construct similarity matrices for data that come from a union of unknown subspaces $\mathscr{U}=\bigcup_{i=1}^{M} S_i$. The similarity matrices thus constructed give the exact clustering in the noise-free case. Additionally, this decomposition gives rise to many distinct similarity matrices from a given set of data, which allow enough flexibility to perform accurate clustering of noisy data. We also show that two known methods for subspace clustering can be derived from the CUR decomposition. An algorithm based on the theoretical construction of similarity matrices is presented, and experiments on synthetic and real data are presented to test the method. Additionally, an adaptation of our CUR-based similarity matrices is utilized to provide a heuristic algorithm for subspace clustering; this algorithm yields the best overall performance to date for clustering the Hopkins155 motion segmentation dataset.

Comment: Approximately 30 pages. Current version contains improved algorithm and numerical experiments from the previous version
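To make the CUR decomposition concrete, here is a minimal sketch in plain Python. It is not the authors' clustering algorithm; it only illustrates the underlying identity A = C U^{-1} R, where C holds selected columns of A, R holds selected rows, and U is their intersection. The sketch assumes the simplest case, in which U is square and invertible with the same rank as A (the general statement uses the pseudoinverse of U). The helper names `matmul`, `inverse`, and `cur_reconstruct` and the example matrix are illustrative assumptions.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(U):
    """Invert a small square matrix by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(U)
    # Augment U with the identity matrix.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(U)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("U is singular; choose different rows/columns")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cur_reconstruct(A, rows, cols):
    """Rebuild A from selected columns C, selected rows R, and their intersection U."""
    C = [[A[i][j] for j in cols] for i in range(len(A))]
    R = [[A[i][j] for j in range(len(A[0]))] for i in rows]
    U = [[A[i][j] for j in cols] for i in rows]
    return matmul(matmul(C, inverse(U)), R)


# Hypothetical rank-2 example: A = X Y with X of size 4x2 and Y of size 2x4.
X = [[Fraction(v) for v in row] for row in [[1, 0], [0, 1], [1, 1], [2, -1]]]
Y = [[Fraction(v) for v in row] for row in [[1, 2, 0, 3], [0, 1, 1, -1]]]
A = matmul(X, Y)

A_hat = cur_reconstruct(A, rows=[0, 1], cols=[0, 1])
print(A_hat == A)
```

Exact rational arithmetic is used so that the reconstruction can be compared with `==`; with floating-point data one would compare up to a tolerance instead.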

arXiv.org e-Print Archive

Directory of Open Access Journals

Digital Scholarship @ Tennessee State University

OpenMETU (Middle East Technical University)

Generalized Separable Nonnegative Matrix Factorization

Authors: Nicolas Gillis, Junjun Pan
Publication date: 01/01/2019

Nonnegative matrix factorization (NMF) is a linear dimensionality reduction technique for nonnegative data with applications such as image analysis, text mining, audio source separation and hyperspectral unmixing. Given a data matrix $M$ and a factorization rank $r$, NMF looks for a nonnegative matrix $W$ with $r$ columns and a nonnegative matrix $H$ with $r$ rows such that $M \approx WH$. NMF is NP-hard to solve in general. However, it can be computed efficiently under the separability assumption, which requires that the basis vectors appear as data points, that is, that there exists an index set $\mathcal{K}$ such that $W = M(:,\mathcal{K})$. In this paper, we generalize the separability assumption: we only require that for each rank-one factor $W(:,k)H(k,:)$ for $k=1,2,\dots,r$, either $W(:,k) = M(:,j)$ for some $j$ or $H(k,:) = M(i,:)$ for some $i$. We refer to the corresponding problem as generalized separable NMF (GS-NMF). We discuss some properties of GS-NMF and propose a convex optimization model which we solve using a fast gradient method. We also propose a heuristic algorithm inspired by the successive projection algorithm. To verify the effectiveness of our methods, we compare them with several state-of-the-art separable NMF algorithms on synthetic, document and image data sets.

Comment: 31 pages, 12 figures, 4 tables. We have added discussions about the identifiability of the model, we have modified the first synthetic experiment, we have clarified some aspects of the contribution
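As an illustration of the standard (column) separability assumption that GS-NMF generalizes, the following is a minimal sketch of the classical successive projection algorithm (SPA) in plain Python. It is not the GS-NMF heuristic proposed in the paper. It assumes that the columns of W appear among the columns of M and that every other column is a convex combination of them; the function name `spa` and the example matrix are illustrative assumptions.

```python
def spa(M, r):
    """Successive projection algorithm: return r column indices of M.

    M is a list of rows. At each step the residual column with the largest
    Euclidean norm is selected, then all residual columns are projected onto
    the orthogonal complement of the selected one.
    """
    m, n = len(M), len(M[0])
    # Work on a copy stored column by column, so M itself is not modified.
    residual = [[float(M[i][j]) for i in range(m)] for j in range(n)]
    selected = []
    for _ in range(r):
        norms = [sum(x * x for x in col) for col in residual]
        k = max(range(n), key=norms.__getitem__)
        if norms[k] == 0.0:
            break  # nothing left to extract
        selected.append(k)
        u, uu = residual[k], norms[k]
        # Remove from every column its component along u.
        residual = [
            [x - (sum(a * b for a, b in zip(u, col)) / uu) * y
             for x, y in zip(col, u)]
            for col in residual
        ]
    return selected


# Hypothetical separable example: columns 1, 3 and 4 are the basis vectors
# (the unit vectors of R^3); the other columns are convex combinations of them.
M = [
    [0.5, 1.0, 0.2, 0.0, 0.0, 0.3],
    [0.5, 0.0, 0.3, 1.0, 0.0, 0.3],
    [0.0, 0.0, 0.5, 0.0, 1.0, 0.4],
]

K = sorted(spa(M, 3))
print(K)  # indices of the columns chosen as W = M(:, K)
```

Once the index set K is known, H can be obtained by solving a nonnegative least-squares problem for each column of M, which this sketch leaves out.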

arXiv.org e-Print Archive

Crossref

A Distance-preserving Matrix Sketch

Authors: Hengrui Luo, Leland Wilkinson
Publication date: 19/11/2021

Visualizing very large matrices involves many formidable problems. Various popular solutions to these problems involve sampling, clustering, projection, or feature selection to reduce the size and complexity of the original task. An important aspect of these methods is how to preserve relative distances between points in the higher-dimensional space after reducing rows and columns to fit in a lower dimensional space. This aspect is important because conclusions based on faulty visual reasoning can be harmful. Judging dissimilar points as similar or similar points as dissimilar on the basis of a visualization can lead to false conclusions. To ameliorate this bias and to make visualizations of very large datasets feasible, we introduce two new algorithms that respectively select a subset of rows and columns of a rectangular matrix. This selection is designed to preserve relative distances as closely as possible. We compare our matrix sketch to more traditional alternatives on a variety of artificial and real datasets.

Comment: 38 pages, 13 figures
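The abstract does not describe the two selection algorithms, so the sketch below does not attempt to reproduce them. It only illustrates, under stated assumptions, one simple way to quantify how well a column subset preserves relative distances between rows: the Pearson correlation between pairwise Euclidean distances computed on all columns and on the selected columns. The function names and the example data are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(X):
    """Euclidean distances between all pairs of rows of X."""
    return [math.dist(a, b) for a, b in combinations(X, 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def distance_preservation(X, cols):
    """Correlation between row distances on all columns and on the columns in `cols`."""
    sketch = [[row[j] for j in cols] for row in X]
    return pearson(pairwise_distances(X), pairwise_distances(sketch))


# Hypothetical data: columns 0 and 1 carry all the variation, columns 2 and 3 are constant.
X = [
    [0.0, 0.0, 1.0, 1.0],
    [3.0, 0.0, 1.0, 1.0],
    [0.0, 4.0, 1.0, 1.0],
    [3.0, 4.0, 1.0, 1.0],
]

good = distance_preservation(X, [0, 1])  # keeps both informative columns
poor = distance_preservation(X, [0])     # drops one informative column
print(good, poor)
```

A score near 1 means the reduced matrix ranks point pairs by distance much as the full matrix does; a lower score signals that a visualization built on the subset may show dissimilar points as similar, or the reverse.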

arXiv.org e-Print Archive