100 research outputs found

    A variant of the Johnson-Lindenstrauss lemma for circulant matrices

    We continue our study of the Johnson-Lindenstrauss lemma and its connection to circulant matrices started in \cite{HV}. We reduce the bound on $k$ from the $k=O(\epsilon^{-2}\log^3 n)$ proven there to $k=O(\epsilon^{-2}\log^2 n)$. Our technique differs essentially from the one used in \cite{HV}: we employ the discrete Fourier transform and the singular value decomposition to deal with the dependency caused by the circulant structure.
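
    As a rough illustration of the ingredients above, the following sketch (the function name and structure are illustrative, not the authors' code) applies a random circulant matrix via the FFT, with a random sign flip first to break the dependencies introduced by the circulant structure:

```python
# Hypothetical sketch of a circulant Johnson-Lindenstrauss projection.
import numpy as np

def circulant_jl(x, k, rng):
    """Project x from R^n to R^k with a random circulant matrix.

    Multiplying by an n x n circulant matrix is a circular convolution,
    so it costs O(n log n) via the FFT; the first k coordinates of the
    result give the embedding.
    """
    n = x.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)   # Rademacher diagonal
    c = rng.standard_normal(n)                # first row of the circulant matrix
    # Circular convolution of c with (signs * x) via the FFT.
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(signs * x)).real
    return y[:k] / np.sqrt(k)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
print(np.linalg.norm(circulant_jl(x, 64, rng)), np.linalg.norm(x))
```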

    Johnson-Lindenstrauss projection of high dimensional data

    Johnson and Lindenstrauss (1984) proved that any finite set of data in a high dimensional space can be projected into a low dimensional space with the Euclidean metric information of the set being preserved within any desired accuracy. Such dimension reduction plays a critical role in many applications with massive data. There has been extensive effort in the literature on how to find explicit constructions of Johnson-Lindenstrauss projections. In this poster, we show how algebraic codes over finite fields can be used for fast Johnson-Lindenstrauss projections of data in high dimensional Euclidean spaces. This is joint work with Shuhong Gao and Yue Mao.
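
    The code-based construction itself is not reproduced in the abstract, so as a stand-in the following sketch demonstrates the Johnson-Lindenstrauss guarantee with a plain dense Gaussian projection (dimensions and names are illustrative):

```python
# Minimal sketch of a Johnson-Lindenstrauss projection with a dense
# Gaussian matrix; this only illustrates the distance-preservation
# guarantee, not the fast code-based construction from the poster.
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 50, 10_000, 400            # points, ambient dim, target dim
X = rng.standard_normal((n, d))

P = rng.standard_normal((k, d)) / np.sqrt(k)   # JL projection matrix
Y = X @ P.T

# Compare a few pairwise distances before and after projection.
for i, j in [(0, 1), (2, 3), (4, 5)]:
    before = np.linalg.norm(X[i] - X[j])
    after = np.linalg.norm(Y[i] - Y[j])
    print(f"pair ({i},{j}): ratio {after / before:.3f}")
```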

    Tighter Bounds on Johnson Lindenstrauss Transforms


    Improved analysis of the subsampled randomized Hadamard transform

    This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. The argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers, for the first time, optimal constants in the estimate on the number of dimensions required for the embedding. (Comment: 8 pages. To appear in Advances in Adaptive Data Analysis, special issue "Sparse Representation of Data and Images"; v2-v4 include minor corrections.)
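
    For concreteness, here is a minimal sketch of the subsampled randomized Hadamard transform, y = sqrt(n/k) R H D x, with D a random sign diagonal, H the orthonormal Hadamard matrix, and R a uniform subsample of k coordinates (a dense Hadamard matrix is used for clarity; practical implementations use an O(n log n) fast Walsh-Hadamard transform):

```python
# Sketch of the subsampled randomized Hadamard transform (SRHT).
import numpy as np
from scipy.linalg import hadamard

def srht(x, k, rng):
    n = x.shape[0]                       # n must be a power of two
    D = rng.choice([-1.0, 1.0], size=n)  # random sign diagonal
    H = hadamard(n) / np.sqrt(n)         # orthonormal Hadamard matrix
    rows = rng.choice(n, size=k, replace=False)  # uniform subsample
    return np.sqrt(n / k) * (H @ (D * x))[rows]

rng = np.random.default_rng(2)
x = rng.standard_normal(256)
print(np.linalg.norm(srht(x, 64, rng)), np.linalg.norm(x))
```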

    Lazy stochastic principal component analysis

    Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, besides an operation on a small square matrix whose size depends only on the target dimensionality. (Comment: To be published in 2017 IEEE International Conference on Data Mining Workshops (ICDMW).)
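
    The abstract does not spell out the algorithm, but the "matrix multiplications plus a small square matrix" structure is characteristic of randomized PCA; the following is a minimal sketch in that spirit, not the authors' exact Lazy SPCA:

```python
# Illustrative randomized-PCA sketch: all heavy work is matrix
# multiplication; the only other operation is an eigendecomposition
# of a small k x k matrix.
import numpy as np

def lazy_pca_sketch(A, k, rng):
    """Approximate an orthonormal basis spanning the top-k subspace of A."""
    n, d = A.shape
    Omega = rng.standard_normal((d, k))   # random test matrix
    Y = A @ Omega                         # tall sketch: one big multiply
    G = Y.T @ Y                           # small k x k Gram matrix
    _, V = np.linalg.eigh(G)              # the only non-multiply step
    Z = Y @ V                             # columns are orthogonal
    return Z / np.linalg.norm(Z, axis=0)  # orthonormal basis for range(Y)

rng = np.random.default_rng(3)
A = rng.standard_normal((1000, 200))
print(lazy_pca_sketch(A, 10, rng).shape)
```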

    Acceleration of Randomized Kaczmarz Method via the Johnson-Lindenstrauss Lemma

    The Kaczmarz method is an algorithm for finding the solution to an overdetermined consistent system of linear equations $Ax=b$ by iteratively projecting onto the solution spaces. The randomized version put forth by Strohmer and Vershynin yields provably exponential convergence in expectation, which for highly overdetermined systems even outperforms the conjugate gradient method. In this article we present a modified version of the randomized Kaczmarz method which at each iteration selects the optimal projection from a randomly chosen set; in most cases this significantly improves the convergence rate. We utilize a Johnson-Lindenstrauss dimension reduction technique to keep the runtime on the same order as the original randomized version, adding only extra preprocessing time. We present a series of empirical studies which demonstrate the remarkable acceleration in convergence to the solution using this modified approach.
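
    A minimal sketch of the modified iteration, selecting the optimal projection from a randomly chosen set of rows; the Johnson-Lindenstrauss acceleration of the selection step is omitted for clarity, and all names are illustrative:

```python
# Randomized Kaczmarz with greedy selection from a random row set.
import numpy as np

def kaczmarz_best_of_set(A, b, iters, set_size, rng):
    m, n = A.shape
    row_norms = np.linalg.norm(A, axis=1) ** 2
    x = np.zeros(n)
    for _ in range(iters):
        rows = rng.choice(m, size=set_size, replace=False)
        # Pick the row whose projection makes the most progress.
        gains = (A[rows] @ x - b[rows]) ** 2 / row_norms[rows]
        i = rows[np.argmax(gains)]
        # Project x onto the solution space of the selected equation.
        x -= (A[i] @ x - b[i]) / row_norms[i] * A[i]
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((500, 50))
x_true = rng.standard_normal(50)
b = A @ x_true                       # consistent overdetermined system
x = kaczmarz_best_of_set(A, b, 2000, 10, rng)
print(np.linalg.norm(x - x_true))    # should be near zero
```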