17,296 research outputs found

    Oracle Based Active Set Algorithm for Scalable Elastic Net Subspace Clustering

    Full text link
    State-of-the-art subspace clustering methods are based on expressing each data point as a linear combination of other data points while regularizing the matrix of coefficients with $\ell_1$, $\ell_2$ or nuclear norms. $\ell_1$ regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad theoretical conditions, but the clusters may not be connected. $\ell_2$ and nuclear norm regularization often improve connectivity, but give a subspace-preserving affinity only for independent subspaces. Mixed $\ell_1$, $\ell_2$ and nuclear norm regularizations offer a balance between the subspace-preserving and connectedness properties, but this comes at the cost of increased computational complexity. This paper studies the geometry of the elastic net regularizer (a mixture of the $\ell_1$ and $\ell_2$ norms) and uses it to derive a provably correct and scalable active set method for finding the optimal coefficients. Our geometric analysis also provides a theoretical justification and a geometric interpretation for the balance between the connectedness (due to $\ell_2$ regularization) and subspace-preserving (due to $\ell_1$ regularization) properties for elastic net subspace clustering. Our experiments show that the proposed active set method not only achieves state-of-the-art clustering performance, but also efficiently handles large-scale datasets.
    Comment: 15 pages, 6 figures, accepted to CVPR 2016 for oral presentation
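    For intuition, the self-expressive model in the abstract can be imitated with a naive baseline: solve one elastic net regression per data point against all other points, symmetrize the coefficient magnitudes into an affinity, and apply spectral clustering. The sketch below uses scikit-learn's generic solver and is not the paper's oracle-based active set method; the `alpha` and `l1_ratio` values are arbitrary assumptions.

```python
# Minimal sketch: self-expressive elastic net affinity + spectral clustering.
# Naive per-point solver, NOT the oracle-based active set method from the paper.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.cluster import SpectralClustering

def elastic_net_subspace_clustering(X, n_clusters, alpha=1e-2, l1_ratio=0.9):
    """X: (n_samples, n_features). Returns cluster labels."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i                  # exclude the point itself
        # Express x_i as an elastic-net-regularized combination of the other points.
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                           fit_intercept=False, max_iter=5000)
        model.fit(X[idx].T, X[i])
        C[i, idx] = model.coef_
    A = np.abs(C) + np.abs(C).T                  # symmetrize into an affinity matrix
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(A)
```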

    Functional Factorial K-means Analysis

    Get PDF
    A new procedure is presented for simultaneously finding the optimal cluster structure of multivariate functional objects and the subspace that represents this cluster structure. The method is based on the $k$-means criterion applied to functional objects projected onto a subspace in which a cluster structure exists. An efficient alternating least-squares algorithm is described, and the proposed method is extended with a regularization term that enforces smoothness of the weight functions. To deal with the negative effect of correlation in the coefficient matrix of the basis function expansion within the proposed algorithm, a two-step variant of the method is also described. Analyses of artificial and real data demonstrate that the proposed method gives correct and interpretable results compared with existing methods, namely the functional principal component $k$-means (FPCK) method and the tandem clustering approach. It is also shown that the proposed method can be considered complementary to FPCK.
    Comment: 39 pages, 17 figures
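    For intuition, the underlying projected-$k$-means alternation (cluster the projected coefficients, then update the projection to minimize within-cluster scatter) can be sketched as below. This omits the paper's smoothness regularization and its two-step variant; the function name and defaults are assumptions.

```python
# A minimal sketch of a factorial k-means style alternation applied to basis-expansion
# coefficients. It alternates between k-means on the projected data and an eigenvector
# update of the projection; no smoothness penalty is included.
import numpy as np
from sklearn.cluster import KMeans

def factorial_kmeans(X, n_clusters, n_dims, n_iter=50, seed=0):
    """X: (n, p) coefficient matrix. Returns labels and the (p, n_dims) loading matrix W."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W, _ = np.linalg.qr(rng.standard_normal((p, n_dims)))   # random orthonormal start
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Step 1: cluster the projected objects.
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X @ W)
        # Step 2: update W to minimize within-cluster scatter of the projected data.
        U = np.zeros((n, n_clusters))
        U[np.arange(n), labels] = 1.0
        P = U @ np.linalg.pinv(U)                # projector onto the cluster-indicator space
        M = X.T @ (np.eye(n) - P) @ X
        eigvals, eigvecs = np.linalg.eigh(M)
        W = eigvecs[:, :n_dims]                  # eigenvectors of the smallest eigenvalues
    return labels, W
```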

    On Generalization Bounds for Projective Clustering

    Full text link
    Given a set of points, clustering consists of finding a partition of the point set into $k$ clusters such that each point is as close as possible to the center it is assigned to. Most commonly, centers are points themselves, which leads to the famous $k$-median and $k$-means objectives. One may also choose centers to be $j$-dimensional subspaces, which gives rise to subspace clustering. In this paper, we consider learning bounds for these problems. That is, given a set of $n$ samples $P$ drawn independently from some unknown but fixed distribution $\mathcal{D}$, how quickly does a solution computed on $P$ converge to the optimal clustering of $\mathcal{D}$? We give several near-optimal results. In particular: for center-based objectives, we show a convergence rate of $\tilde{O}(\sqrt{k/n})$. This matches the known optimal bounds of [Fefferman, Mitter, and Narayanan, Journal of the American Mathematical Society 2016] and [Bartlett, Linder, and Lugosi, IEEE Trans. Inf. Theory 1998] for $k$-means and extends them to other important objectives such as $k$-median. For subspace clustering with $j$-dimensional subspaces, we show a convergence rate of $\tilde{O}(\sqrt{kj^2/n})$. These are the first provable bounds for most of these problems. For the specific case of projective clustering, which generalizes $k$-means, we show that a convergence rate of $\Omega(\sqrt{kj/n})$ is necessary, thereby proving that the bounds from [Fefferman, Mitter, and Narayanan, Journal of the American Mathematical Society 2016] are essentially optimal.
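    As an informal illustration of the convergence question only (not an experiment from the paper), one can fit $k$-means on samples of growing size $n$ and watch the excess cost on a large held-out "population" shrink, roughly in line with the $\tilde{O}(\sqrt{k/n})$ rate. The data model and sizes below are arbitrary assumptions.

```python
# Illustrative-only sketch: empirical k-means excess risk as the sample size n grows.
import numpy as np
from sklearn.cluster import KMeans

def mean_cost(X, centers):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).mean()

rng = np.random.default_rng(0)
k, dim = 5, 10
# Synthetic "population": standard Gaussian noise around k shifted means.
population = rng.standard_normal((200_000, dim)) + rng.integers(0, k, 200_000)[:, None] * 3.0
# Approximate reference optimum fitted on a large subsample (so small negative gaps can occur).
opt_centers = KMeans(n_clusters=k, n_init=10).fit(population[:20_000]).cluster_centers_
opt_cost = mean_cost(population, opt_centers)

for n in [100, 400, 1600, 6400]:
    sample = population[rng.choice(len(population), n, replace=False)]
    centers = KMeans(n_clusters=k, n_init=10).fit(sample).cluster_centers_
    excess = mean_cost(population, centers) - opt_cost
    print(f"n={n:5d}  excess risk ~ {excess:.4f}")
```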

    Innovation Pursuit: A New Approach to Subspace Clustering

    Full text link
    In subspace clustering, data points drawn from a union of subspaces are assigned membership to their respective subspaces. This paper presents a new approach, dubbed Innovation Pursuit (iPursuit), to the problem of subspace clustering, based on a new geometric idea whereby subspaces are identified through their relative novelties. We present two frameworks in which the idea of innovation pursuit is used to distinguish the subspaces. Underlying the first framework is an iterative method that finds the subspaces consecutively by solving a series of simple linear optimization problems, each searching for a direction of innovation in the span of the data that is potentially orthogonal to all subspaces except the one identified in that step of the algorithm. A detailed mathematical analysis is provided, establishing sufficient conditions for iPursuit to correctly cluster the data. The proposed approach can provably yield exact clustering even when the subspaces have significant intersections. It is shown that the complexity of the iterative approach scales only linearly in the number of data points and subspaces, and quadratically in the dimension of the subspaces. The second framework integrates iPursuit with spectral clustering to yield a new variant of spectral-clustering-based algorithms. Numerical simulations with both real and synthetic data demonstrate that iPursuit can often outperform the state-of-the-art subspace clustering algorithms, especially for subspaces with significant intersections, and that it significantly improves the state-of-the-art result for subspace-segmentation-based face clustering.
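    A hedged sketch of the kind of linear optimization the first framework solves: find a direction with unit inner product against a reference point while suppressing, in the $\ell_1$ sense, its inner products with the rest of the data. This is a simplified reading of the abstract, not the paper's full iterative algorithm; the use of cvxpy and the function name are assumptions.

```python
# One simplified "innovation direction" step: minimize the l1 norm of the data's inner
# products with c, subject to a unit inner product with a chosen reference point q.
import numpy as np
import cvxpy as cp

def innovation_direction(D, q):
    """D: (dim, n_points) data matrix, q: (dim,) reference point. Returns a direction c."""
    dim = D.shape[0]
    c = cp.Variable(dim)
    problem = cp.Problem(cp.Minimize(cp.norm1(D.T @ c)), [q @ c == 1])
    problem.solve()
    return c.value
```

    In this reading, points whose inner products with the recovered direction are near zero would be peeled off as belonging to the other subspaces, one subspace per step, as the abstract describes for the consecutive framework.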

    A Dimension Reduction Scheme for the Computation of Optimal Unions of Subspaces

    Get PDF
    Given a set of points $\mathcal{F}$ in a high-dimensional space, the complexity of finding a union of subspaces $\bigcup_i V_i \subset \mathbb{R}^N$ that best explains the data $\mathcal{F}$ increases dramatically with the dimension of $\mathbb{R}^N$. In this article, we study a class of transformations that map the problem into another one in lower dimension. We use the best model in the low-dimensional space to approximate the best solution in the original high-dimensional space. We then estimate the error between this solution and the optimal solution in the high-dimensional space.
    Comment: 15 pages. Some corrections were added; in particular, the title was changed. It will appear in "Sampling Theory in Signal and Image Processing".
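    The general recipe described above (solve the model-selection problem in a lower-dimensional image of the data) can be sketched with a plain random projection followed by a standard k-subspaces alternation. The paper's specific class of transformations and its error estimate are not reproduced; all names and defaults below are assumptions.

```python
# Sketch: random projection to low_dim, then fit a union of subspaces with k-subspaces.
import numpy as np

def k_subspaces(X, n_subspaces, dim, n_iter=30, seed=0):
    """X: (n_points, ambient_dim). Returns labels and a list of orthonormal bases."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random orthonormal initial bases, one per subspace.
    bases = [np.linalg.qr(rng.standard_normal((d, dim)))[0] for _ in range(n_subspaces)]
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assign each point to the subspace with the smallest projection residual.
        residuals = np.stack([np.linalg.norm(X - X @ B @ B.T, axis=1) for B in bases], axis=1)
        labels = residuals.argmin(axis=1)
        # Refit each subspace from its assigned points via a truncated SVD.
        for s in range(n_subspaces):
            pts = X[labels == s]
            if len(pts) >= dim:
                _, _, vt = np.linalg.svd(pts, full_matrices=False)
                bases[s] = vt[:dim].T
    return labels, bases

def reduce_then_cluster(X, n_subspaces, dim, low_dim, seed=0):
    """Project to low_dim with a random map, then fit the union of subspaces there."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((X.shape[1], low_dim)) / np.sqrt(low_dim)   # random projection
    return k_subspaces(X @ P, n_subspaces, dim, seed=seed)
```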

    Sparse Subspace Clustering: Algorithm, Theory, and Applications

    Full text link
    In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to the several classes or categories to which the data belong. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among the infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of the data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of subspaces and the distribution of data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm can be solved efficiently and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
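    A minimal sketch of the pipeline this abstract describes, analogous to the elastic net block earlier but with a pure $\ell_1$ (Lasso) self-expressive step per point; the noise and outlier terms of the actual program are omitted, and the penalty value is an assumption.

```python
# Minimal SSC-style sketch: l1 self-expressive coefficients + spectral clustering.
# Omits the abstract's explicit noise / sparse-outlier / missing-entry terms.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_subspace_clustering(X, n_clusters, alpha=1e-3):
    """X: (n_samples, n_features). Returns cluster labels."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[idx].T, X[i])                # x_i ~ sum_j c_j x_j, j != i
        C[i, idx] = lasso.coef_
    A = np.abs(C) + np.abs(C).T                  # symmetrized coefficient magnitudes
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(A)
```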

    CUR Decompositions, Similarity Matrices, and Subspace Clustering

    Get PDF
    A general framework for solving the subspace clustering problem using the CUR decomposition is presented. The CUR decomposition provides a natural way to construct similarity matrices for data that come from a union of unknown subspaces $\mathscr{U}=\bigcup_{i=1}^{M} S_i$. The similarity matrices thus constructed give the exact clustering in the noise-free case. Additionally, this decomposition gives rise to many distinct similarity matrices from a given set of data, which allow enough flexibility to perform accurate clustering of noisy data. We also show that two known methods for subspace clustering can be derived from the CUR decomposition. An algorithm based on the theoretical construction of the similarity matrices is presented, and experiments on synthetic and real data are used to test the method. Additionally, an adaptation of our CUR-based similarity matrices provides a heuristic algorithm for subspace clustering; this algorithm yields the best overall performance to date for clustering the Hopkins155 motion segmentation dataset.
    Comment: Approximately 30 pages. The current version contains an improved algorithm and numerical experiments compared with the previous version.
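    For notation only, a basic CUR decomposition (uniform column/row sampling with a pseudoinverse middle factor) looks as follows; the paper's particular recipe for turning such decompositions into subspace-clustering similarity matrices is not reproduced here, and the sampling scheme below is an assumption.

```python
# Basic CUR sketch: A ~ C @ U @ R, where C and R are actual columns and rows of A.
import numpy as np

def cur_decomposition(A, n_cols, n_rows, seed=0):
    rng = np.random.default_rng(seed)
    col_idx = rng.choice(A.shape[1], n_cols, replace=False)
    row_idx = rng.choice(A.shape[0], n_rows, replace=False)
    C, R = A[:, col_idx], A[row_idx, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # middle factor so that C @ U @ R ~ A
    return C, U, R
```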

    Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

    Full text link
    We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e., principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.
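    A hedged sketch of the final result as stated: Johnson-Lindenstrauss-project the points to on the order of $\log k/\epsilon^2$ dimensions and run any $k$-means solver in the reduced space. The $(9+\epsilon)$ guarantee is the paper's; the constant in the target dimension and the helper name below are assumptions.

```python
# Sketch: JL projection to ~ log(k)/eps^2 dimensions, then ordinary k-means.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import KMeans

def jl_then_kmeans(A, k, eps=0.5, seed=0):
    """A: (n_points, n_features). Returns labels computed in the reduced space."""
    target_dim = max(1, int(np.ceil(np.log(k) / eps**2)))
    A_small = GaussianRandomProjection(n_components=target_dim,
                                       random_state=seed).fit_transform(A)
    return KMeans(n_clusters=k, n_init=10).fit_predict(A_small)
```

    Because the reduction is agnostic to the downstream solver, any exact, approximate, or heuristic $k$-means algorithm can be run on `A_small`, which is the sense in which the abstract says the methods "generically accelerate" these problems.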