    On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation

    We study classic streaming and sparse recovery problems using deterministic linear sketches, including $\ell_1/\ell_1$ and $\ell_\infty/\ell_1$ sparse recovery (the latter also known as $\ell_1$ heavy hitters), norm estimation, and approximate inner product. We focus on devising a fixed matrix $A \in \mathbb{R}^{m \times n}$ and a deterministic recovery/estimation procedure that work for all possible input vectors simultaneously. Our results improve upon existing work; our main contributions are the following:
• A proof that $\ell_\infty/\ell_1$ sparse recovery and inner product estimation are equivalent, and that incoherent matrices can be used to solve both problems. Our upper bound on the number of measurements is $m = O(\varepsilon^{-2}\min\{\log n, (\log n/\log(1/\varepsilon))^2\})$. We also obtain fast sketching and recovery algorithms by using the Fast Johnson–Lindenstrauss transform. Both our running times and our numbers of measurements improve upon previous work. We also obtain better error guarantees than previous work, stated in terms of a smaller tail of the input vector.
• A new lower bound on the number of linear measurements required to solve $\ell_1/\ell_1$ sparse recovery. We show that $\Omega(k/\varepsilon^2 + k\log(n/k)/\varepsilon)$ measurements are required to recover an $x'$ with $\|x - x'\|_1 \le (1+\varepsilon)\|x_{\mathrm{tail}(k)}\|_1$, where $x_{\mathrm{tail}(k)}$ is $x$ projected onto all but its $k$ largest coordinates in magnitude.
• A tight bound of $m = \Theta(\varepsilon^{-2}\log(\varepsilon^2 n))$ on the number of measurements required to solve deterministic norm estimation, i.e., to recover $\|x\|_2 \pm \varepsilon\|x\|_1$.
For all the problems we study, tight bounds on the randomized complexity are already known from previous work, except for $\ell_1/\ell_1$ sparse recovery, where a nearly tight bound is known. Our work thus aims to study the deterministic complexities of these problems. We remark that some of the matrices used in our algorithms, although known to exist, are not yet explicit, in the sense that no deterministic polynomial-time constructions are known; in all cases, however, polynomial-time Monte Carlo algorithms are known.
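To make the role of incoherence concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's construction, and all function names are hypothetical. It uses a random sign matrix with unit-norm columns (a Monte Carlo choice; such matrices with $O(\varepsilon^{-2}\log n)$ rows are $\varepsilon$-incoherent with high probability) and measures its coherence $\mu = \max_{i \ne j} |\langle A_i, A_j\rangle|$ exactly. Estimating $x_i$ by $\langle A_i, Ax\rangle$ then has error $\sum_{j \ne i} x_j \langle A_i, A_j\rangle$, which is at most $\mu\|x\|_1$: an $\ell_\infty/\ell_1$ guarantee. The same sketch gives $\|Ax\|_2 = \|x\|_2 \pm \sqrt{\mu}\,\|x\|_1$, which is far weaker than the tight norm-estimation bound above and is shown only to illustrate the inner-product view.

```python
import math
import random


def random_sign_columns(m, n, seed=0):
    """Columns A_1..A_n of an m x n matrix with i.i.d. entries +-1/sqrt(m).

    Assumption for illustration: a random (Monte Carlo) matrix, not an explicit one.
    Every column has unit l2 norm up to floating-point rounding.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.choice((-scale, scale)) for _ in range(m)] for _ in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def coherence(cols):
    """mu = max |<A_i, A_j>| over distinct columns i != j."""
    n = len(cols)
    return max(abs(dot(cols[i], cols[j]))
               for i in range(n) for j in range(i + 1, n))


def sketch(cols, x):
    """Linear sketch y = Ax, accumulated column by column."""
    y = [0.0] * len(cols[0])
    for col, xi in zip(cols, x):
        for r, a in enumerate(col):
            y[r] += xi * a
    return y


def point_query(cols, y, i):
    """Estimate x_i by <A_i, Ax> = (A^T A x)_i.

    With unit-norm columns the error is sum_{j != i} x_j <A_i, A_j>,
    so its absolute value is at most mu * ||x||_1.
    """
    return dot(cols[i], y)


def norm_estimate(y):
    """Estimate ||x||_2 by ||Ax||_2.

    | ||Ax||^2 - ||x||^2 | <= mu * ||x||_1^2, hence the error is at most sqrt(mu) * ||x||_1.
    """
    return math.sqrt(dot(y, y))


if __name__ == "__main__":
    n, m = 60, 32
    cols = random_sign_columns(m, n, seed=1)
    mu = coherence(cols)
    x = [0.0] * n
    x[3], x[17], x[40] = 5.0, -2.0, 0.5
    y = sketch(cols, x)
    l1 = sum(abs(v) for v in x)
    worst = max(abs(point_query(cols, y, i) - x[i]) for i in range(n))
    print(f"coherence {mu:.3f}, worst point-query error {worst:.3f}, bound {mu * l1:.3f}")
```

The point-query bound holds for every input vector once $\mu$ is fixed, which is the deterministic "for all $x$ simultaneously" property; randomness is used here only to pick the matrix.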

    Pseudo-Deterministic Streaming

    A pseudo-deterministic algorithm is a (randomized) algorithm that, when run multiple times on the same input, with high probability outputs the same result on all executions. Classic streaming algorithms, such as those for finding heavy hitters, approximate counting, $\ell_2$ approximation, and finding a nonzero entry in a vector (for turnstile algorithms), are not pseudo-deterministic. For example, in the case of finding a nonzero entry in a vector, for any known low-space algorithm A there exists a stream x such that running A twice on x (using different randomness) would with high probability output two different entries. In this work, we study whether it is inherent that these algorithms output different values on different executions; that is, we ask whether these problems have low-memory pseudo-deterministic algorithms. For instance, we show that there is no low-memory pseudo-deterministic algorithm for finding a nonzero entry in a vector (given in a turnstile fashion), and also that there is no low-dimensional pseudo-deterministic sketching algorithm for $\ell_2$ norm estimation. We also exhibit problems that do have low-memory pseudo-deterministic algorithms but no low-memory deterministic algorithm, such as outputting a nonzero row of a matrix or outputting a basis for the row-span of a matrix. We also investigate multi-pseudo-deterministic algorithms: algorithms that with high probability output one of a few options. We show the first lower bounds for such algorithms. This implies that there are streaming problems such that every low-space algorithm for the problem must have inputs with many valid outputs, each of which is output with significant probability.
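The definition of pseudo-determinism can be made concrete with a small test harness. The sketch below, with hypothetical names `mode_frequency`, `random_nonzero_index`, and `first_nonzero_index`, runs an algorithm on the same input under many seeds and reports how often the most common output occurs; a pseudo-deterministic algorithm should score close to 1. Both toy finders read the whole vector and therefore use linear memory, so they are not streaming algorithms; they only illustrate the output behavior contrasted in the abstract.

```python
import random
from collections import Counter


def mode_frequency(algorithm, x, seeds):
    """Fraction of runs (one per seed) that return the most common output."""
    outputs = [algorithm(x, random.Random(s)) for s in seeds]
    _, count = Counter(outputs).most_common(1)[0]
    return count / len(outputs)


def random_nonzero_index(x, rng):
    """Toy randomized finder: a uniformly random nonzero index, or None if x = 0."""
    support = [i for i, v in enumerate(x) if v != 0]
    return rng.choice(support) if support else None


def first_nonzero_index(x, rng):
    """Toy canonical finder: ignores its randomness, returns the smallest nonzero index."""
    for i, v in enumerate(x):
        if v != 0:
            return i
    return None


if __name__ == "__main__":
    x = [1 if i % 2 == 0 else 0 for i in range(100)]
    print(mode_frequency(random_nonzero_index, x, range(100)))
    print(mode_frequency(first_nonzero_index, x, range(100)))
```

On a vector with many nonzeros, the random finder usually disagrees with itself across seeds, while the canonical finder always agrees; the lower bound in the abstract says that no low-memory algorithm can behave like the second one for this problem.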

    Lower Bounds for Sparse Recovery

    We consider the following k-sparse recovery problem: design an $m \times n$ matrix A such that, for any signal x, given Ax we can efficiently recover x' satisfying $\|x - x'\|_1 \le C \min_{k\text{-sparse } x''} \|x - x''\|_1$. It is known that there exist matrices A with this property that have only $O(k\log(n/k))$ rows. In this paper we show that this bound is tight. Our bound holds even for the more general randomized version of the problem, where A is a random variable and the recovery algorithm is required to work for any fixed x with constant probability (over A).
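The right-hand side of this guarantee has a simple closed form. For any k-sparse $x''$ with support $S$, $\|x - x''\|_1 \ge \sum_{i \notin S} |x_i|$, which is at least the sum of the $n - k$ smallest magnitudes of $x$, and keeping the $k$ largest-magnitude entries attains this. A minimal Python sketch (function names are hypothetical) that computes this quantity and checks a candidate $x'$ against the guarantee:

```python
def best_k_term_error(x, k):
    """min over k-sparse x'' of ||x - x''||_1.

    Attained by keeping the k largest-magnitude entries, so it equals
    the l1 mass of the remaining n - k entries.
    """
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(mags[k:])


def l1_distance(x, z):
    return sum(abs(a - b) for a, b in zip(x, z))


def satisfies_guarantee(x, x_rec, k, C):
    """Check ||x - x'||_1 <= C * min_{k-sparse x''} ||x - x''||_1."""
    return l1_distance(x, x_rec) <= C * best_k_term_error(x, k)


if __name__ == "__main__":
    x = [10.0, -7.0, 0.5, 0.25, -0.25]
    print(best_k_term_error(x, 2))
    print(satisfies_guarantee(x, [10.0, -7.0, 0.0, 0.0, 0.0], k=2, C=1))
    print(satisfies_guarantee(x, [10.0, 0.0, 0.0, 0.0, 0.0], k=2, C=2))
```

Note that the best k-term approximation itself meets the guarantee with $C = 1$; the difficulty, and the subject of the lower bound, is achieving it from only the $m$ measurements in Ax.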

    Improved Algorithms for White-Box Adversarial Streams

    We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small; otherwise, the algorithm declares that the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank.

    Deterministic Heavy Hitters with Sublinear Query Time

    We study the classic problem of finding $\ell_1$ heavy hitters in the streaming model. In the general turnstile model, we give the first deterministic sublinear-time sketching algorithm, which takes a linear sketch of length $O(\varepsilon^{-2}\log n \cdot \log^*(\varepsilon^{-1}))$; this is only a factor of $\log^*(\varepsilon^{-1})$ more than the best existing polynomial-time sketching algorithm (Nelson et al., RANDOM '12). Our approach is based on an iterative procedure in which most of the still-unrecovered heavy hitters are identified in each iteration. Although this technique has been extensively employed in the related problem of sparse recovery, to the best of our knowledge this is the first time it has been used in the context of heavy hitters. Along the way, we also obtain a sublinear-time algorithm for the closely related problem of $\ell_1/\ell_1$ compressed sensing, matching the space usage of previous (super-)linear-time algorithms. In the strict turnstile model, we show that the runtime can be improved and that the sketching matrix can be made strongly explicit with $O(\varepsilon^{-2}\log^3 n/\log^3(1/\varepsilon))$ rows.
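For context, the slower query baseline can be sketched as a reduction from point queries, under one common formulation of $\ell_1$ heavy hitters (report every $i$ with $|x_i| \ge \varepsilon\|x\|_1$ and no $i$ with $|x_i| < (\varepsilon/2)\|x\|_1$). If every estimate satisfies $|y_i - x_i| \le (\varepsilon/4)\|x\|_1$, then thresholding at $(3\varepsilon/4)\|x\|_1$ meets both conditions. The Python below assumes the estimates and $\|x\|_1$ are simply given; the function names and the $\varepsilon/4$, $3\varepsilon/4$ constants are illustrative choices, not taken from the paper. It scans all $n$ coordinates, which is the linear query time that a sublinear-time algorithm avoids.

```python
def heavy_hitters(estimates, l1_norm, eps):
    """Report indices whose estimate has magnitude at least (3/4) * eps * ||x||_1.

    Assumes |estimates[i] - x_i| <= (eps / 4) * ||x||_1 for every i. Then:
      |x_i| >= eps * ||x||_1      implies |estimates[i]| >= (3/4) eps ||x||_1 (reported);
      |x_i| <  (eps/2) * ||x||_1  implies |estimates[i]| <  (3/4) eps ||x||_1 (not reported).
    Runs in time linear in n, since every coordinate is examined.
    """
    threshold = 0.75 * eps * l1_norm
    return [i for i, y in enumerate(estimates) if abs(y) >= threshold]


def meets_heavy_hitter_guarantee(x, reported, eps):
    """Check the guarantee against the true vector x."""
    l1 = sum(abs(v) for v in x)
    rep = set(reported)
    must = {i for i, v in enumerate(x) if abs(v) >= eps * l1}
    forbidden = {i for i, v in enumerate(x) if abs(v) < 0.5 * eps * l1}
    return must <= rep and not (rep & forbidden)


if __name__ == "__main__":
    x = [50, -30, 5, 1, 1, 1, 1, 1, 5, 5]           # ||x||_1 = 100
    y = [45, -25, 11, -5, 7, 1, 1, 1, 11, 0]        # each error <= 6.25 = (0.25 / 4) * 100
    out = heavy_hitters(y, 100, 0.25)
    print(out, meets_heavy_hitter_guarantee(x, out, 0.25))
```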
