Subspace clustering of dimensionality-reduced data
Subspace clustering refers to the problem of clustering unlabeled
high-dimensional data points into a union of low-dimensional linear subspaces,
assumed unknown. In practice one may have access to dimensionality-reduced
observations of the data only, resulting, e.g., from "undersampling" due to
complexity and speed constraints on the acquisition device. More pertinently,
even if one has access to the high-dimensional data set it is often desirable
to first project the data points into a lower-dimensional space and to perform
the clustering task there; this reduces storage requirements and computational
cost. The purpose of this paper is to quantify the impact of
dimensionality reduction through random projection on the performance of the
sparse subspace clustering (SSC) and the thresholding-based subspace clustering
(TSC) algorithms. We find that for both algorithms dimensionality reduction
down to the order of the subspace dimensions is possible without incurring
significant performance degradation. The mathematical engine behind our
theorems is a result quantifying how the affinities between subspaces change
under random dimensionality-reducing projections. Comment: ISIT 201
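As a rough illustration of the idea in this abstract, the sketch below draws points from two random low-dimensional subspaces, applies a Gaussian random projection, and checks how often each point's nearest neighbors (by absolute inner product, in the spirit of the thresholding step in TSC) come from the same subspace. It is not the authors' experiment: the sizes m, d, n, p, q, the helper names, and the "purity" measure are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_subspace_points(basis, n, rng):
    """Draw n unit-norm points as random Gaussian combinations of the basis vectors."""
    m = len(basis[0])
    points = []
    for _ in range(n):
        coeffs = [rng.gauss(0, 1) for _ in basis]
        points.append(unit([sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(m)]))
    return points


def random_project(points, p, rng):
    """Apply one shared p x m Gaussian random matrix to every point, then renormalize."""
    m = len(points[0])
    proj = [[rng.gauss(0, 1) / math.sqrt(p) for _ in range(m)] for _ in range(p)]
    return [unit([sum(row[i] * x[i] for i in range(m)) for row in proj]) for x in points]


def neighbor_purity(points, labels, q):
    """Fraction of each point's q nearest neighbors (largest |<x, y>|) sharing its label."""
    hits = 0
    for i, x in enumerate(points):
        scores = [(abs(sum(a * b for a, b in zip(x, y))), j)
                  for j, y in enumerate(points) if j != i]
        scores.sort(reverse=True)
        hits += sum(labels[j] == labels[i] for _, j in scores[:q])
    return hits / (q * len(points))


# Hypothetical sizes: ambient dimension m, subspace dimension d, n points per
# subspace, projected dimension p, and q neighbors per point.
rng = random.Random(0)
m, d, n, p, q = 200, 3, 20, 20, 3

points, labels = [], []
for k in range(2):
    basis = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(d)]
    points += sample_subspace_points(basis, n, rng)
    labels += [k] * n

purity_full = neighbor_purity(points, labels, q)
purity_reduced = neighbor_purity(random_project(points, p, rng), labels, q)
print("neighbor purity in full space:   ", purity_full)
print("neighbor purity after projection:", purity_reduced)
```

A full TSC or SSC pipeline would go on to build an adjacency matrix from these neighbor sets and run spectral clustering on it; the sketch stops at the neighbor-selection stage because that is where the change in affinities under projection shows up most directly.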
A Convex Formulation for Spectral Shrunk Clustering
Spectral clustering is a fundamental technique in the field of data mining
and information processing. Most existing spectral clustering algorithms
integrate dimensionality reduction into the clustering process assisted by
manifold learning in the original space. However, the manifold in the
reduced-dimensional subspace is likely to exhibit altered properties in
contrast with the original space. Thus, applying manifold information obtained
from the original space to the clustering process in a low-dimensional subspace
is prone to inferior performance. Aiming to address this issue, we propose a
novel convex algorithm that mines the manifold structure in the low-dimensional
subspace. In addition, our unified learning process makes the manifold learning
particularly tailored for the clustering. Compared with other related methods,
the proposed algorithm yields a more structured clustering result. To
validate the efficacy of the proposed algorithm, we perform extensive
experiments on several benchmark datasets in comparison with some
state-of-the-art clustering approaches. The experimental results demonstrate
that the proposed algorithm has quite promising clustering performance. Comment: AAAI201
EDSC: Efficient document subspace clustering technique for high-dimensional data
With advances in pervasive technology, the volume of data is rising rapidly. Such data are generated by a wide range of sources, from individuals to organizations. Because the data are unstructured or semi-structured, existing data analytics approaches are not directly applicable, which leads to the curse of dimensionality problem. Hence, this paper presents an Efficient Document Subspace Clustering (EDSC) technique for high-dimensional data that improves on the existing system's identification by eliminating redundant data. Discrete segmentation of the data points is used to explicitly expose the dimensionality of the hidden subspaces in the clusters. The outcome of the proposed system was compared with that of the existing system to identify an effective document clustering process for high-dimensional data. The processing time of EDSC for subspace clustering is reduced by 50% compared with the existing system.
A convex formulation for spectral shrunk clustering
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Spectral clustering is a fundamental technique in the field of data mining and information processing. Most existing spectral clustering algorithms integrate dimensionality reduction into the clustering process assisted by manifold learning in the original space. However, the manifold in the reduced-dimensional subspace is likely to exhibit altered properties in contrast with the original space. Thus, applying manifold information obtained from the original space to the clustering process in a low-dimensional subspace is prone to inferior performance. Aiming to address this issue, we propose a novel convex algorithm that mines the manifold structure in the low-dimensional subspace. In addition, our unified learning process makes the manifold learning particularly tailored for the clustering. Compared with other related methods, the proposed algorithm yields a more structured clustering result. To validate the efficacy of the proposed algorithm, we perform extensive experiments on several benchmark datasets in comparison with some state-of-the-art clustering approaches. The experimental results demonstrate that the proposed algorithm has quite promising clustering performance.
Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization
In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF), a dimensionality reduction technique, to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.
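To make the NMF step in this abstract concrete, here is a minimal sketch that factors a small non-negative matrix V (rows standing in for odors, columns for descriptors) into W and H using the standard Lee-Seung multiplicative updates for the squared Frobenius error. The toy matrix, the rank of 2, and the "dominant dimension" readout are invented for illustration and are not the data or the exact procedure of the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Approximate V by W @ H with non-negative factors (multiplicative updates)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start keeps the multiplicative updates well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frob_error(v, w, h)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
        errors.append(frob_error(v, w, h))
    return w, h, errors


# Hypothetical toy profiles: 6 "odors" rated on 4 "descriptors".
v = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [5, 5, 1, 0],
    [0, 0, 4, 5],
    [1, 0, 5, 4],
    [0, 1, 5, 5],
]

w, h, errors = nmf(v, rank=2)
# Index of the largest weight in each row of W: a crude "category" per odor.
dominant = [max(range(len(row)), key=lambda k: row[k]) for row in w]
print("reconstruction error: first", errors[0], "last", errors[-1])
print("dominant dimension per row:", dominant)
```

Reading off the largest weight per row is one simple way to ask whether profiles fall into discrete groups; for real data a library implementation of NMF and a principled choice of rank would be the more usual route.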