A variant of the Johnson-Lindenstrauss lemma for circulant matrices
We continue our study of the Johnson-Lindenstrauss lemma and its connection
to circulant matrices started in \cite{HV}. We reduce the bound on $k$ from
$k = \Omega(\varepsilon^{-2}\log^3 n)$ proven there to
$k = \Omega(\varepsilon^{-2}\log^2 n)$. Our technique differs essentially
from the one used in \cite{HV}. We employ the discrete Fourier transform and
singular value decomposition to deal with the dependency caused by the
circulant structure.
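A minimal sketch of the kind of map this abstract studies, assuming the usual partial-circulant setup $y = P C_a D x$ (a random sign diagonal $D$, a circulant matrix $C_a$ with Gaussian first row $a$, and restriction $P$ to the first $k$ coordinates). The DFT diagonalizes every circulant matrix, which both enables the fast FFT application below and is the source of the row dependencies the proof must control. The function name and the Gaussian choice for $a$ are illustrative, not taken from the paper.

```python
import numpy as np

def circulant_jl(x, k, rng=np.random.default_rng(0)):
    """Partial circulant JL map y = P C_a D x, applied in O(n log n)
    time via the FFT. A sketch of the general construction, not the
    paper's exact statement."""
    n = x.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)   # D: Rademacher sign diagonal
    a = rng.standard_normal(n)                # first row of the circulant C_a
    # Circulant multiplication is a circular convolution, i.e. a
    # pointwise product in the Fourier domain.
    y = np.fft.ifft(np.fft.fft(a) * np.fft.fft(signs * x)).real
    return y[:k] / np.sqrt(k)                 # keep k coordinates, rescale
```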
Johnson-Lindenstrauss projection of high dimensional data
Johnson and Lindenstrauss (1984) proved that any finite set of data in a high dimensional space can be projected into a low dimensional space with the Euclidean metric information of the set being preserved within any desired accuracy. Such dimension reduction plays a critical role in many applications with massive data. There has been extensive effort in the literature on how to find explicit constructions of Johnson-Lindenstrauss projections. In this poster, we show how algebraic codes over finite fields can be used for fast Johnson-Lindenstrauss projections of data in high dimensional Euclidean spaces. This is joint work with Shuhong Gao and Yue Mao.
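As a concrete illustration of the guarantee (a plain dense Gaussian projection, not the code-based construction the poster describes, which is not reproduced here), the pairwise-distance ratios below concentrate near 1 once k is on the order of eps**-2 * log(n); all names and sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 10_000, 500                     # points, ambient dim, target dim
X = rng.standard_normal((n, d))

R = rng.standard_normal((d, k)) / np.sqrt(k)   # dense Gaussian JL matrix
Y = X @ R                                      # projected data

# Distortion of distances from point 0 to all other points: the
# min/max ratios stay close to 1, as the JL lemma predicts.
num = np.linalg.norm(Y[1:] - Y[0], axis=1)
den = np.linalg.norm(X[1:] - X[0], axis=1)
print((num / den).min(), (num / den).max())
```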
Tighter Bounds on Johnson Lindenstrauss Transforms
Improved analysis of the subsampled randomized Hadamard transform
This paper presents an improved analysis of a structured dimension-reduction
map called the subsampled randomized Hadamard transform. This argument
demonstrates that the map preserves the Euclidean geometry of an entire
subspace of vectors. The new proof is much simpler than previous approaches,
and it offers---for the first time---optimal constants in the estimate on the
number of dimensions required for the embedding.
Lazy stochastic principal component analysis
Stochastic principal component analysis (SPCA) has become a popular
dimensionality reduction strategy for large, high-dimensional datasets. We
derive a simplified algorithm, called Lazy SPCA, which has reduced
computational complexity and is better suited for large-scale distributed
computation. We prove that SPCA and Lazy SPCA find the same approximations to
the principal subspace, and that the pairwise distances between samples in the
lower-dimensional space are invariant to whether SPCA is executed lazily or not.
Empirical studies find downstream predictive performance to be identical for
both methods, and superior to random projections, across a range of predictive
models (linear regression, logistic lasso, and random forests). In our largest
experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of
computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix
multiplications, besides an operation on a small square matrix whose size
depends only on the target dimensionality.
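The closing claim (only matrix multiplications, plus an operation on a small square matrix) is the hallmark of sketch-then-solve randomized PCA. The sketch below follows that generic pattern, not the paper's exact Lazy SPCA algorithm; note that the final rotation is orthogonal and so cannot change pairwise distances between samples, which is the invariance the abstract emphasizes.

```python
import numpy as np

def lazy_pca_like(X, k, rng=np.random.default_rng(0)):
    """Sketch-then-solve randomized PCA in the spirit of the abstract
    (a generic pattern, not the paper's exact Lazy SPCA algorithm)."""
    n, d = X.shape
    omega = rng.standard_normal((d, k))
    Y = X @ omega                  # n x k sketch: one big multiplication
    G = Y.T @ Y                    # small k x k matrix; size depends only on k
    _, V = np.linalg.eigh(G)       # the only non-multiplication step
    # V is orthogonal, so rotating the sketch leaves pairwise distances
    # between samples unchanged.
    return Y @ V
```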
Acceleration of Randomized Kaczmarz Method via the Johnson-Lindenstrauss Lemma
The Kaczmarz method is an algorithm for finding the solution to an
overdetermined consistent system of linear equations Ax=b by iteratively
projecting onto the solution spaces. The randomized version put forth by
Strohmer and Vershynin yields provably exponential convergence in expectation,
which for highly overdetermined systems even outperforms the conjugate gradient
method. In this article we present a modified version of the randomized
Kaczmarz method that, at each iteration, selects the optimal projection from a
randomly chosen set of rows, which in most cases significantly improves the convergence
rate. We utilize a Johnson-Lindenstrauss dimension reduction technique to keep
the runtime on the same order as the original randomized version, adding only
extra preprocessing time. We present a series of empirical studies which
demonstrate the remarkable acceleration in convergence to the solution using
this modified approach.
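A hedged sketch of the selection rule described above: sample a small set of rows with the Strohmer-Vershynin weights, then project onto the row whose hyperplane is currently farthest away. The paper keeps this selection cheap by estimating residuals through a Johnson-Lindenstrauss sketch; for clarity, the residuals of the sampled rows are computed exactly here, and the function name and parameters are illustrative.

```python
import numpy as np

def kaczmarz_best_of_set(A, b, iters=2000, s=10, rng=np.random.default_rng(0)):
    """Randomized Kaczmarz that samples s rows per iteration and projects
    onto the hyperplane of the row with the largest scaled residual."""
    m, n = A.shape
    row_norms2 = np.einsum('ij,ij->i', A, A)
    p = row_norms2 / row_norms2.sum()            # Strohmer-Vershynin weights
    x = np.zeros(n)
    for _ in range(iters):
        rows = rng.choice(m, size=s, replace=False, p=p)
        res = A[rows] @ x - b[rows]
        best = rows[np.argmax(res**2 / row_norms2[rows])]
        # Project x onto the solution hyperplane {z : a_best . z = b_best}.
        x -= (A[best] @ x - b[best]) / row_norms2[best] * A[best]
    return x

# Consistent overdetermined system: the iterates converge to x_true.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50))
x_true = rng.standard_normal(50)
print(np.linalg.norm(kaczmarz_best_of_set(A, A @ x_true) - x_true))
```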