

    On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation

    We study classic streaming and sparse recovery problems using deterministic linear sketches, including $\ell_1/\ell_1$ and $\ell_\infty/\ell_1$ sparse recovery problems (the latter also being known as $\ell_1$-heavy hitters), norm estimation, and approximate inner product. We focus on devising a fixed matrix $A \in \mathbb{R}^{m \times n}$ and a deterministic recovery/estimation procedure which work for all possible input vectors simultaneously. Our results improve upon existing work, the following being our main contributions:
    • A proof that $\ell_\infty/\ell_1$ sparse recovery and inner product estimation are equivalent, and that incoherent matrices can be used to solve both problems. Our upper bound for the number of measurements is $m = O(\varepsilon^{-2}\min\{\log n, (\log n/\log(1/\varepsilon))^2\})$. We can also obtain fast sketching and recovery algorithms by making use of the Fast Johnson–Lindenstrauss transform. Both our running times and number of measurements improve upon previous work. We can also obtain better error guarantees than previous work in terms of a smaller tail of the input vector.
    • A new lower bound for the number of linear measurements required to solve $\ell_1/\ell_1$ sparse recovery. We show $\Omega(k/\varepsilon^2 + k\log(n/k)/\varepsilon)$ measurements are required to recover an $x'$ with $\|x - x'\|_1 \leq (1+\varepsilon)\|x_{\mathrm{tail}(k)}\|_1$, where $x_{\mathrm{tail}(k)}$ is $x$ projected onto all but its largest $k$ coordinates in magnitude.
    • A tight bound of $m = \Theta(\varepsilon^{-2}\log(\varepsilon^2 n))$ on the number of measurements required to solve deterministic norm estimation, i.e., to recover $\|x\|_2 \pm \varepsilon\|x\|_1$.
    For all the problems we study, tight bounds are already known for the randomized complexity from previous work, except in the case of $\ell_1/\ell_1$ sparse recovery, where a nearly tight bound is known. Our work thus aims to study the deterministic complexities of these problems. We remark that some of the matrices used in our algorithms, although known to exist, currently are not yet explicit in the sense that deterministic polynomial time constructions are not yet known, although in all cases polynomial time Monte Carlo algorithms are known.
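To make the sketch-and-estimate workflow concrete, the toy below estimates an inner product $\langle x, w \rangle$ from linear sketches $Ax$ and $Aw$ alone. This is an illustration only: the paper uses deterministic incoherent matrices, whereas this sketch uses a random sign matrix, and all dimensions and names below are hypothetical choices.

```python
import random

random.seed(0)

n, m = 30, 4000  # ambient dimension, number of measurements (illustrative)

# Random +/-1 sketching matrix A (m x n); each row is one linear measurement.
A = [[random.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]

def sketch(v):
    """Apply the linear sketch: return A @ v as a length-m list."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

x = [1.0 if i % 3 == 0 else 0.2 for i in range(n)]
w = [0.5 for _ in range(n)]

true_ip = sum(xi * wi for xi, wi in zip(x, w))
# Estimate <x, w> from the sketches alone: <Ax, Aw>/m is unbiased for
# independent sign entries, with error shrinking as m grows.
est_ip = sum(sx * sw for sx, sw in zip(sketch(x), sketch(w))) / m

print(true_ip, est_ip)
```

The key property being illustrated is that the estimate is computed from the $m$-dimensional sketches only, never from $x$ or $w$ directly.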

    Optimality of the Johnson-Lindenstrauss Lemma

    For any integers $d, n \geq 2$ and $1/(\min\{n,d\})^{0.4999} < \varepsilon < 1$, we show the existence of a set of $n$ vectors $X \subset \mathbb{R}^d$ such that any embedding $f: X \rightarrow \mathbb{R}^m$ satisfying $\forall x, y \in X,\ (1-\varepsilon)\|x-y\|_2^2 \le \|f(x)-f(y)\|_2^2 \le (1+\varepsilon)\|x-y\|_2^2$ must have $m = \Omega(\varepsilon^{-2}\lg n)$. This lower bound matches the upper bound given by the Johnson–Lindenstrauss lemma [JL84]. Furthermore, our lower bound holds for nearly the full range of $\varepsilon$ of interest, since there is always an isometric embedding into dimension $\min\{d, n\}$ (either the identity map, or projection onto $\mathrm{span}(X)$). Previously such a lower bound was only known to hold against linear maps $f$, and not for such a wide range of parameters $\varepsilon, n, d$ [LN16]. The best previously known lower bound for general $f$ was $m = \Omega(\varepsilon^{-2}\lg n/\lg(1/\varepsilon))$ [Wel74, Lev83, Alo03], which is suboptimal for any $\varepsilon = o(1)$.
    Comment: v2: simplified proof, also added reference to Lev83
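The matching upper bound can be seen numerically: projecting $n$ points through a random Gaussian map scaled by $1/\sqrt{m}$ keeps all pairwise squared distances within a small multiplicative factor. The parameters below are illustrative assumptions, not the paper's constructions.

```python
import math
import random

random.seed(1)

d, n, m = 40, 15, 700  # original dimension, number of points, target dimension

points = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
# Projection f(x) = Gx / sqrt(m) with i.i.d. standard Gaussian entries.
G = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

def project(x):
    return [sum(g * xi for g, xi in zip(row, x)) / math.sqrt(m) for row in G]

projected = [project(p) for p in points]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Worst multiplicative distortion over all pairs of points.
worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        ratio = sq_dist(projected[i], projected[j]) / sq_dist(points[i], points[j])
        worst = max(worst, abs(ratio - 1.0))

print("worst squared-distance distortion:", worst)
```

With $m$ on the order of $\varepsilon^{-2} \lg n$, the observed `worst` stays around $\varepsilon$; the theorem above says no embedding, linear or not, can do asymptotically better.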

    Time lower bounds for nonadaptive turnstile streaming algorithms

    We say a turnstile streaming algorithm is "non-adaptive" if, during updates, the memory cells written and read depend only on the index being updated and random coins tossed at the beginning of the stream (and not on the memory contents of the algorithm). Memory cells read during queries may be decided upon adaptively. All known turnstile streaming algorithms in the literature are non-adaptive. We prove the first non-trivial update time lower bounds for both randomized and deterministic turnstile streaming algorithms, which hold when the algorithms are non-adaptive. While there has been abundant success in proving space lower bounds, there have been no non-trivial update time lower bounds in the turnstile model. Our lower bounds hold against classically studied problems such as heavy hitters, point query, entropy estimation, and moment estimation. In some cases of deterministic algorithms, our lower bounds nearly match known upper bounds.
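The non-adaptivity notion is easy to see in a standard sketch. Below is a toy Count-Sketch-style point-query structure for turnstile streams (updates may be negative): the cells touched by an update on index $i$ depend only on $i$ and hash parameters fixed before the stream starts, never on the table's current contents. All constants here are illustrative, and this is a generic textbook sketch, not a construction from the paper.

```python
import random
import statistics

random.seed(2)

ROWS, WIDTH = 7, 256
PRIME = 2_147_483_647  # Mersenne prime for simple 2-wise hashing
# Per-row hash/sign parameters drawn up front: this is the non-adaptivity.
hparams = [(random.randrange(1, PRIME), random.randrange(PRIME)) for _ in range(ROWS)]
sparams = [(random.randrange(1, PRIME), random.randrange(PRIME)) for _ in range(ROWS)]
table = [[0] * WIDTH for _ in range(ROWS)]

def bucket(r, i):
    a, b = hparams[r]
    return ((a * i + b) % PRIME) % WIDTH

def sign(r, i):
    a, b = sparams[r]
    return 1 if ((a * i + b) % PRIME) % 2 == 0 else -1

def update(i, delta):  # turnstile update: delta may be negative
    for r in range(ROWS):
        table[r][bucket(r, i)] += sign(r, i) * delta

def point_query(i):
    # Median over rows cancels collision noise from other items.
    return statistics.median(sign(r, i) * table[r][bucket(r, i)] for r in range(ROWS))

update(42, 150)                # one heavy item...
for i in range(1000, 1060):
    update(i, 3)               # ...plus light items,
    update(i, -2)              # including deletions
print(point_query(42))
```

Each `update` reads and writes exactly `ROWS` cells chosen by `bucket(r, i)` alone, which is the structure the lower bounds in this paper apply to.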

    Tight Bounds for Set Disjointness in the Message Passing Model

    In a multiparty message-passing model of communication, there are $k$ players. Each player has a private input, and they communicate by sending messages to one another over private channels. While this model has been used extensively in distributed computing and in multiparty computation, lower bounds on communication complexity in this model and related models have been somewhat scarce. In recent work \cite{phillips12,woodruff12,woodruff13}, strong lower bounds of the form $\Omega(n \cdot k)$ were obtained for several functions in the message-passing model; however, a lower bound on the classical Set Disjointness problem remained elusive. In this paper, we prove tight lower bounds of the form $\Omega(n \cdot k)$ for the Set Disjointness problem in the message-passing model. Our bounds are obtained by developing information complexity tools in the message-passing model, and then proving an information complexity lower bound for Set Disjointness. As a corollary, we show a tight lower bound for the task allocation problem \cite{DruckerKuhnOshman} via a reduction from Set Disjointness.
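For intuition on why $\Omega(n \cdot k)$ is tight, note the trivial matching upper bound: every player ships its $n$-bit characteristic vector to one designated player, costing about $n \cdot k$ bits. The coordinator framing and helper names below are an illustrative assumption, not the paper's construction.

```python
def coordinator_disjointness(sets, universe_size):
    """Each of k players sends universe_size bits; returns (disjoint?, bits sent)."""
    k = len(sets)
    # Each player encodes its set as an n-bit characteristic vector.
    messages = [[1 if e in s else 0 for e in range(universe_size)] for s in sets]
    bits_sent = k * universe_size  # total communication: n bits per player
    # Coordinator checks whether some element lies in every player's set.
    intersect = any(all(msg[e] for msg in messages) for e in range(universe_size))
    return (not intersect), bits_sent

disjoint, cost = coordinator_disjointness([{0, 1}, {2, 3}, {4}], universe_size=8)
print(disjoint, cost)
```

The paper's contribution is the other direction: no protocol, however clever, can beat this naive $O(n \cdot k)$ cost by more than a constant factor.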

    For-all Sparse Recovery in Near-optimal Time

    An approximate sparse recovery system in the $\ell_1$ norm consists of parameters $k$, $\epsilon$, $N$, an $m$-by-$N$ measurement matrix $\Phi$, and a recovery algorithm $\mathcal{R}$. Given a vector $\mathbf{x}$, the system approximates $\mathbf{x}$ by $\widehat{\mathbf{x}} = \mathcal{R}(\Phi\mathbf{x})$, which must satisfy $\|\widehat{\mathbf{x}} - \mathbf{x}\|_1 \leq (1+\epsilon)\|\mathbf{x} - \mathbf{x}_k\|_1$, where $\mathbf{x}_k$ is the best $k$-term approximation to $\mathbf{x}$. We consider the "for all" model, in which a single matrix $\Phi$, possibly "constructed" non-explicitly using the probabilistic method, is used for all signals $\mathbf{x}$. The best existing sublinear algorithm by Porat and Strauss (SODA'12) uses $O(\epsilon^{-3} k \log(N/k))$ measurements and runs in time $O(k^{1-\alpha} N^{\alpha})$ for any constant $\alpha > 0$. In this paper, we improve the number of measurements to $O(\epsilon^{-2} k \log(N/k))$, matching the best existing upper bound (attained by super-linear algorithms), and the runtime to $O(k^{1+\beta}\,\mathrm{poly}(\log N, 1/\epsilon))$, with a modest restriction that $\epsilon \leq (\log k/\log N)^{\gamma}$, for any constants $\beta, \gamma > 0$. When $k \leq \log^c N$ for some $c > 0$, the runtime is reduced to $O(k\,\mathrm{poly}(\log N, 1/\epsilon))$. With no restrictions on $\epsilon$, we have an approximate recovery system with $m = O((k/\epsilon)\log(N/k)((\log N/\log k)^{\gamma} + 1/\epsilon))$ measurements.
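The benchmark in the $\ell_1/\ell_1$ guarantee is the tail mass $\|\mathbf{x} - \mathbf{x}_k\|_1$, where $\mathbf{x}_k$ keeps only the $k$ largest-magnitude coordinates. The snippet below just computes that benchmark and checks the guarantee for the trivial candidate $\widehat{\mathbf{x}} = \mathbf{x}_k$ (whose error is exactly the tail mass); the vector and $\epsilon$ are illustrative assumptions, and no actual sketching is performed.

```python
def best_k_term(x, k):
    """Zero out all but the k largest-magnitude coordinates of x."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]

def l1(v):
    return sum(abs(t) for t in v)

x = [9.0, -7.5, 0.3, 0.1, -0.2, 0.05, 6.0, 0.0]
k, eps = 3, 0.1

xk = best_k_term(x, k)
tail = l1([xi - xki for xi, xki in zip(x, xk)])  # ||x - x_k||_1, the benchmark

# A candidate xhat meets the guarantee iff ||xhat - x||_1 <= (1+eps) * tail;
# x_k itself always qualifies, since its error equals the tail mass exactly.
err = l1([xi - xki for xi, xki in zip(x, xk)])
print(tail, err, err <= (1 + eps) * tail)
```

The hard part, which the paper addresses, is achieving this guarantee from only $m \ll N$ linear measurements, with one fixed $\Phi$ working for all signals and with decoding time near-linear in $k$.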