
    Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform

    The problems of random projections and sparse reconstruction have much in common and have individually received much attention. Surprisingly, until now they progressed in parallel and remained mostly separate. Here, we employ new tools from probability in Banach spaces that were successfully used in the context of sparse reconstruction to advance on an open problem in random projection. In particular, we generalize and use an intricate result by Rudelson and Vershynin for sparse reconstruction which uses Dudley's theorem for bounding Gaussian processes. Our main result states that any set of N = \exp(\tilde{O}(n)) real vectors in n-dimensional space can be linearly mapped to a space of dimension k = O(\log N \polylog(n)), while (1) preserving the pairwise distances among the vectors to within any constant distortion and (2) being able to apply the transformation in time O(n\log n) on each vector. This improves on the best known N = \exp(\tilde{O}(n^{1/2})) achieved by Ailon and Liberty and N = \exp(\tilde{O}(n^{1/3})) by Ailon and Chazelle. The dependence on the distortion constant, however, is believed to be suboptimal and subject to further investigation. For constant distortion, this settles the open question posed by these authors up to a \polylog(n) factor while considerably simplifying their constructions.
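
    To make the O(n log n) mapping concrete, here is a minimal sketch of a fast Johnson-Lindenstrauss transform in the Ailon-Chazelle spirit (random signs, Hadamard mixing, then a sparse Gaussian projection). It is not the construction analyzed in this paper; the sizes n, k and the sparsity q are illustrative assumptions, and a real implementation would store the sparse projection in a sparse format.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(n log n); len(x) must be a power of 2."""
    x = x.copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)               # normalized: the transform is orthogonal

def fjlt(x, k, q, rng):
    """P H D x: random sign flips, Hadamard mixing (spreads mass so that no
    coordinate is too large), then a sparse Gaussian projection down to R^k."""
    n = len(x)
    z = fwht(rng.choice([-1.0, 1.0], size=n) * x)                    # H D x
    P = rng.standard_normal((k, n)) * (rng.random((k, n)) < q) / np.sqrt(q)
    return P @ z / np.sqrt(k)           # scaled so squared norms are preserved in expectation

rng = np.random.default_rng(0)
n, k, q = 1 << 10, 128, 0.05            # illustrative sizes (n a power of 2)
x, y = rng.standard_normal(n), rng.standard_normal(n)
print(np.linalg.norm(x - y), np.linalg.norm(fjlt(x - y, k, q, rng)))
```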

    Acceleration of Randomized Kaczmarz Method via the Johnson-Lindenstrauss Lemma

    The Kaczmarz method is an algorithm for finding the solution to an overdetermined consistent system of linear equations Ax=b by iteratively projecting onto the solution spaces. The randomized version put forth by Strohmer and Vershynin yields provably exponential convergence in expectation, which for highly overdetermined systems even outperforms the conjugate gradient method. In this article we present a modified version of the randomized Kaczmarz method which at each iteration selects the optimal projection from a randomly chosen set, a choice that in most cases significantly improves the convergence rate. We utilize a Johnson-Lindenstrauss dimension reduction technique to keep the runtime on the same order as the original randomized version, adding only extra preprocessing time. We present a series of empirical studies which demonstrate the remarkable acceleration in convergence to the solution using this modified approach.
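
    For reference, here is a minimal sketch of the baseline randomized Kaczmarz iteration (Strohmer-Vershynin row sampling proportional to squared row norms), which the paper accelerates. The accelerated variant is only indicated in a comment, not reproduced; the problem sizes and iteration count are illustrative assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Baseline randomized Kaczmarz: at each step, project the current iterate
    onto the solution hyperplane of one row, chosen with probability
    proportional to its squared norm (Strohmer-Vershynin)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = np.einsum('ij,ij->i', A, A)       # squared row norms
    probs = row_norms / row_norms.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Orthogonal projection of x onto {z : <A_i, z> = b_i}.
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
        # The accelerated variant would instead draw a small batch of candidate
        # rows, estimate each candidate's residual cheaply via a JL sketch, and
        # project onto the row with the largest estimated residual.
    return x

# Overdetermined consistent system for illustration.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50))
x_true = rng.standard_normal(50)
b = A @ x_true
print(np.linalg.norm(randomized_kaczmarz(A, b) - x_true))
```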

    Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

    We present a technical survey on the state-of-the-art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower bounding techniques.
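
    As a toy illustration of the random-sampling style of data reduction surveyed here, the sketch below draws a uniform sample and reweights it so that its weighted 1-means cost is an unbiased estimate of the full cost for any fixed center. This is only the simplest baseline; real coreset constructions use importance (sensitivity) sampling, and the data and sample size below are illustrative assumptions.

```python
import numpy as np

def uniform_sample_coreset(points, size, seed=0):
    """Uniformly sample `size` points and weight each by n/size, so the
    weighted cost is an unbiased estimator of the full cost for a fixed center."""
    rng = np.random.default_rng(seed)
    n = len(points)
    idx = rng.choice(n, size=size, replace=True)
    weights = np.full(size, n / size)
    return points[idx], weights

def kmeans_cost(points, center, weights=None):
    """Sum of (weighted) squared distances to a single center."""
    sq = np.sum((points - center) ** 2, axis=1)
    return sq.sum() if weights is None else (weights * sq).sum()

rng = np.random.default_rng(2)
data = rng.standard_normal((100_000, 5))
core, w = uniform_sample_coreset(data, size=2_000)
c = rng.standard_normal(5)                      # an arbitrary fixed center
print(kmeans_cost(data, c), kmeans_cost(core, c, w))   # the two costs should be close
```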

    Sparser Johnson-Lindenstrauss Transforms

    We give two different and simple constructions for dimensionality reduction in \ell_2 via linear mappings that are sparse: only an O(\varepsilon)-fraction of entries in each column of our embedding matrices are non-zero to achieve distortion 1+\varepsilon with high probability, while still achieving the asymptotically optimal number of rows. These are the first constructions to provide subconstant sparsity for all values of parameters, improving upon previous works of Achlioptas (JCSS 2003) and Dasgupta, Kumar, and Sarl\'{o}s (STOC 2010). Such distributions can be used to speed up applications where \ell_2 dimensionality reduction is used. Comment: v6: journal version, minor changes, added Remark 23; v5: modified abstract, fixed typos, added open problem section; v4: simplified section 4 by giving one analysis that covers both constructions; v3: proof of Theorem 25 in v2 was written incorrectly, now fixed; v2: added another construction achieving the same upper bound, and added proof of near-tight lower bound for the DKS scheme.
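
    Below is a minimal sketch of a sparse embedding matrix in the block (CountSketch-like) layout associated with this line of work: the rows are split into s blocks and each column receives exactly one entry of value +-1/sqrt(s) per block, i.e. s nonzeros per column. The parameters m, s and the test vector are illustrative assumptions, not the paper's recommended settings.

```python
import numpy as np

def sparse_jl_matrix(m, n, s, seed=0):
    """Sparse JL matrix: m rows split into s blocks of m/s rows; each column gets
    exactly one +-1/sqrt(s) entry in a random row of each block (s per column)."""
    assert m % s == 0
    rng = np.random.default_rng(seed)
    block = m // s
    S = np.zeros((m, n))
    for j in range(n):
        rows = rng.integers(0, block, size=s) + block * np.arange(s)
        S[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return S

rng = np.random.default_rng(3)
n, m, s = 10_000, 400, 8               # only s/m = 2% of entries per column are nonzero
S = sparse_jl_matrix(m, n, s)
x = rng.standard_normal(n)
print(np.linalg.norm(x), np.linalg.norm(S @ x))   # norms should be close
```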

    On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation

    We study classic streaming and sparse recovery problems using deterministic linear sketches, including l1/l1 and linf/l1 sparse recovery problems (the latter also being known as l1-heavy hitters), norm estimation, and approximate inner product. We focus on devising a fixed matrix A in R^{m x n} and a deterministic recovery/estimation procedure which work for all possible input vectors simultaneously. Our results improve upon existing work, the following being our main contributions:
    * A proof that linf/l1 sparse recovery and inner product estimation are equivalent, and that incoherent matrices can be used to solve both problems. Our upper bound for the number of measurements is m = O(eps^{-2} * min{log n, (log n / log(1/eps))^2}). We can also obtain fast sketching and recovery algorithms by making use of the Fast Johnson-Lindenstrauss transform. Both our running times and number of measurements improve upon previous work. We can also obtain better error guarantees than previous work in terms of a smaller tail of the input vector.
    * A new lower bound for the number of linear measurements required to solve l1/l1 sparse recovery. We show Omega(k/eps^2 + k log(n/k)/eps) measurements are required to recover an x' with |x - x'|_1 <= (1+eps)|x_{tail(k)}|_1, where x_{tail(k)} is x projected onto all but its largest k coordinates in magnitude.
    * A tight bound of m = Theta(eps^{-2} log(eps^2 n)) on the number of measurements required to solve deterministic norm estimation, i.e., to recover |x|_2 +/- eps|x|_1.
    For all the problems we study, tight bounds are already known for the randomized complexity from previous work, except in the case of l1/l1 sparse recovery, where a nearly tight bound is known. Our work thus aims to study the deterministic complexities of these problems.
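
    To illustrate the incoherent-matrix idea behind the linf/l1 point-query guarantee: if A has unit-norm columns and coherence mu, then estimating x_i as <A_i, Ax> is off by at most mu * |x|_1. The sketch below uses a random sign matrix as a stand-in for a deterministic incoherent construction (which the paper requires); the sizes, sparsity, and seed are illustrative assumptions.

```python
import numpy as np

def point_query_estimates(A, y):
    """l_inf/l_1 point queries from a sketch y = A x, for A with unit-norm columns:
    the estimate x'_i = <A_i, y> satisfies |x'_i - x_i| <= mu * ||x||_1,
    where mu is the coherence (max inner product between distinct columns)."""
    return A.T @ y

rng = np.random.default_rng(4)
n, m, k = 5_000, 800, 10
# Random-sign matrix with unit-norm columns as a stand-in for a deterministic
# incoherent matrix; its coherence is roughly sqrt(log(n)/m) with high probability.
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = 10 * rng.standard_normal(k)
estimates = point_query_estimates(A, A @ x)
# Maximum estimation error, which should be small relative to ||x||_1.
print(np.max(np.abs(estimates - x)), np.sum(np.abs(x)))
```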

    Improved analysis of the subsampled randomized Hadamard transform

    This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers---for the first time---optimal constants in the estimate on the number of dimensions required for the embedding. Comment: 8 pages. To appear, Advances in Adaptive Data Analysis, special issue "Sparse Representation of Data and Images." v2--v4 include minor corrections.
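
    A minimal sketch of the subsampled randomized Hadamard transform applied to a tall matrix, checking that the geometry of the column space (here, the leading singular values) is roughly preserved, in the spirit of the subspace result described above. For clarity it uses an explicit Hadamard matrix from scipy rather than an O(n log n) fast transform; the sizes n, d, k are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import hadamard

def srht(X, k, seed=0):
    """Subsampled randomized Hadamard transform of the rows of X in R^{n x d}:
    random sign flips, orthonormal Hadamard mixing, then uniform row subsampling."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]                          # n must be a power of 2 here
    D = rng.choice([-1.0, 1.0], size=n)     # random diagonal of signs
    H = hadamard(n) / np.sqrt(n)            # orthonormal Walsh-Hadamard matrix
    rows = rng.choice(n, size=k, replace=False)
    return np.sqrt(n / k) * (H @ (D[:, None] * X))[rows]

rng = np.random.default_rng(5)
n, d, k = 1 << 11, 20, 256
X = rng.standard_normal((n, d)) @ np.diag(np.linspace(1, 10, d))
SX = srht(X, k)
# Leading singular values before and after sketching should be comparable.
print(np.linalg.svd(X, compute_uv=False)[:3])
print(np.linalg.svd(SX, compute_uv=False)[:3])
```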