3,817 research outputs found
Optimal approximate matrix product in terms of stable rank
We prove, using the subspace embedding guarantee in a black box way, that one
can achieve the spectral norm guarantee for approximate matrix multiplication
with a dimensionality-reducing map having
rows. Here is the maximum stable rank, i.e. squared ratio of
Frobenius and operator norms, of the two matrices being multiplied. This is a
quantitative improvement over previous work of [MZ11, KVZ14], and is also
optimal for any oblivious dimensionality-reducing map. Furthermore, due to the
black box reliance on the subspace embedding property in our proofs, our
theorem can be applied to a much more general class of sketching matrices than
what was known before, in addition to achieving better bounds. For example, one
can apply our theorem to efficient subspace embeddings such as the Subsampled
Randomized Hadamard Transform or sparse subspace embeddings, or even with
subspace embedding constructions that may be developed in the future.
Our main theorem, via connections with spectral error matrix multiplication
shown in prior work, implies quantitative improvements for approximate least
squares regression and low rank approximation. Our main result has also already
been applied to improve dimensionality reduction guarantees for k-means
clustering [CEMMP14], and implies new results for nonparametric regression
[YPW15].
We also separately point out that the proof of the "BSS" deterministic
row-sampling result of [BSS12] can be modified to show that for any matrices
of stable rank at most , one can achieve the spectral norm
guarantee for approximate matrix multiplication of by deterministically
sampling rows that can be found in polynomial
time. The original result of [BSS12] was for rank instead of stable rank. Our
observation leads to a stronger version of a main theorem of [KMST10].
Comment: v3: minor edits; v2: fixed one step in proof of Theorem 9 which was
wrong by a constant factor (see the new Lemma 5 and its use; final theorem
unaffected)
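The abstract above defines stable rank as the squared ratio of the Frobenius and operator norms. As a minimal sketch of that definition only (not the paper's sketching construction), the following computes it for a small dense matrix. The function names, the list-of-lists matrix representation, and the use of power iteration to estimate the operator norm are all assumptions made for illustration.

```python
import math
import random


def frobenius_norm_sq(A):
    """Squared Frobenius norm: the sum of squared entries."""
    return sum(x * x for row in A for x in row)


def operator_norm_sq(A, iters=200, seed=0):
    """Squared operator (spectral) norm, estimated by power iteration on A^T A."""
    rng = random.Random(seed)
    n_cols = len(A[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        # Av = A v, then w = A^T (A v)
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # v is now a unit vector, so ||A v||^2 approximates the top eigenvalue of A^T A
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return sum(x * x for x in Av)


def stable_rank(A):
    """Stable rank = ||A||_F^2 / ||A||_op^2 (hypothetical helper name)."""
    op_sq = operator_norm_sq(A)
    if op_sq == 0.0:
        raise ValueError("stable rank is undefined for the zero matrix")
    return frobenius_norm_sq(A) / op_sq
```

The stable rank never exceeds the rank: the 3 x 3 identity has stable rank 3, while any rank-one matrix has stable rank 1. A matrix with a few dominant singular values can have a stable rank far below its rank, which is why a bound in terms of stable rank is the stronger statement.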
Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction
It is difficult to find the optimal sparse solution of a manifold learning
based dimensionality reduction algorithm. The lasso or the elastic net
penalized manifold learning based dimensionality reduction is not directly a
lasso penalized least squares problem and thus the least angle regression (LARS)
(Efron et al.), one of the most popular algorithms in sparse
learning, cannot be applied. Therefore, most current approaches take indirect
ways or have strict settings, which can be inconvenient for applications. In
this paper, we propose the manifold elastic net, or MEN for short. MEN
incorporates the merits of both the manifold learning based dimensionality
reduction and the sparse learning based dimensionality reduction. By using a
series of equivalent transformations, we show MEN is equivalent to the lasso
penalized least squares problem and thus LARS is adopted to obtain the optimal
sparse solution of MEN. In particular, MEN has the following advantages for
subsequent classification: 1) the local geometry of samples is well preserved
for low dimensional data representation, 2) both the margin maximization and
the classification error minimization are considered for sparse projection
calculation, 3) the projection matrix of MEN improves the parsimony in
computation, 4) the elastic net penalty reduces the over-fitting problem, and
5) the projection matrix of MEN can be interpreted psychologically and
physiologically. Experimental evidence on face recognition over various popular
datasets suggests that MEN is superior to top level dimensionality reduction
algorithms.
Comment: 33 pages, 12 figures
Categorical Dimensions of Human Odor Descriptor Space Revealed by Non-Negative Matrix Factorization
In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF) – a dimensionality reduction technique – to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.
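For readers unfamiliar with NMF, here is a minimal sketch of one common way to compute it: the multiplicative update rules for the squared Frobenius objective. The abstract does not say which NMF algorithm the authors used, so the choice of update rule, the function names, the initialization, and the `eps` safeguard are all assumptions made for illustration.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius_error(V, W, H):
    """Squared Frobenius reconstruction error ||V - W H||^2."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))


def nmf(V, rank, iters=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive initialization keeps every later iterate non-negative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H
```

In the setting of the abstract, the rows of `V` would be odors and the columns descriptors, so each row of `H` is a non-negative combination of descriptors and each row of `W` gives an odor's loadings on those combinations. Because nothing can cancel, the factors tend to be parts-based, which is one reason NMF suits this kind of data.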
Dimensionality Reduction for k-Means Clustering and Low Rank Approximation
We show how to approximate a data matrix with a much smaller
sketch that can be used to solve a general class of
constrained k-rank approximation problems to within error.
Importantly, this class of problems includes k-means clustering and
unconstrained low rank approximation (i.e. principal component analysis). By
reducing data points to just dimensions, our methods generically
accelerate any exact, approximate, or heuristic algorithm for these ubiquitous
problems.
For k-means dimensionality reduction, we provide relative
error results for many common sketching techniques, including random row
projection, column selection, and approximate SVD. For approximate principal
component analysis, we give a simple alternative to known algorithms that has
applications in the streaming setting. Additionally, we extend recent work on
column-based matrix reconstruction, giving column subsets that not only 'cover'
a good subspace for A, but can be used directly to compute this
subspace.
Finally, for k-means clustering, we show how to achieve a
approximation by Johnson-Lindenstrauss projecting data points to just dimensions. This gives the first result that leverages the
specific structure of k-means to achieve dimension independent of input size
and sublinear in
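As a rough illustration of the Johnson-Lindenstrauss projection step mentioned above, the sketch below maps points to a lower dimension with a dense Gaussian random matrix. The target dimension is left as a caller-supplied parameter because the abstract's bound did not survive extraction. The function names and the Gaussian construction are assumptions, and the k-means step itself is not shown.

```python
import math
import random


def jl_project(points, target_dim, seed=0):
    """Project each point to target_dim coordinates with a scaled Gaussian matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    # Scaling by 1/sqrt(target_dim) preserves squared norms in expectation.
    scale = 1.0 / math.sqrt(target_dim)
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(target_dim)]
    return [[sum(r * x for r, x in zip(row, p)) for row in R] for p in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))
```

Any k-means solver can then be run on the projected points in place of the originals. Since the k-means cost depends only on squared distances, approximately preserving them approximately preserves the cost of every clustering.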
Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images
In hyperspectral remote sensing data mining, it is important to take into
account both spectral and spatial information, such as the spectral
signature, texture features, and morphological properties, to improve
performance, e.g., image classification accuracy. From a feature
representation point of view, a natural approach is to
concatenate the spectral and spatial features into a single, high-dimensional
vector and then apply a dimension reduction technique
directly to that concatenated vector before feeding it into the subsequent
classifier. However, features from different domains have
different physical meanings and statistical properties, so such
concatenation does not efficiently exploit the complementary properties among
different features, which should help boost feature
discriminability. Furthermore, it is difficult to interpret the
transformed results of the concatenated vector. Consequently, finding a
physically meaningful consensus low-dimensional feature representation of
the original multiple features is still a challenging task. To address
these issues, we propose a novel feature learning framework, i.e., the
simultaneous spectral-spatial feature selection and extraction algorithm, for
spectral-spatial feature representation and classification of hyperspectral
images. Specifically, the proposed method learns a latent low-dimensional
subspace by projecting the spectral-spatial features into a common
feature space, where the complementary information is effectively
exploited and, simultaneously, only the most significant original features
are transformed. Encouraging experimental results on three publicly available
hyperspectral remote sensing datasets confirm that our proposed method is
effective and efficient.