93 research outputs found

    Low-distortion Subspace Embeddings in Input-sparsity Time and Applications to Robust Linear Regression

    Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for linear algebra problems. We show that, given a matrix $A \in \mathbb{R}^{n \times d}$ with $n \gg d$ and a $p \in [1, 2)$, with a constant probability, we can construct a low-distortion embedding matrix $\Pi \in \mathbb{R}^{O(\mathrm{poly}(d)) \times n}$ that embeds $\mathcal{A}_p$, the $\ell_p$ subspace spanned by $A$'s columns, into $(\mathbb{R}^{O(\mathrm{poly}(d))}, \|\cdot\|_p)$; the distortion of our embeddings is only $O(\mathrm{poly}(d))$, and we can compute $\Pi A$ in $O(\mathrm{nnz}(A))$ time, i.e., input-sparsity time. Our result generalizes the input-sparsity time $\ell_2$ subspace embedding by Clarkson and Woodruff [STOC'13]; and for completeness, we present a simpler and improved analysis of their construction for $\ell_2$. These input-sparsity time $\ell_p$ embeddings are optimal, up to constants, in terms of their running time; and the improved running time propagates to applications such as $(1 \pm \epsilon)$-distortion $\ell_p$ subspace embedding and relative-error $\ell_p$ regression. For $\ell_2$, we show that a $(1+\epsilon)$-approximate solution to the $\ell_2$ regression problem specified by the matrix $A$ and a vector $b \in \mathbb{R}^n$ can be computed in $O(\mathrm{nnz}(A) + d^3 \log(d/\epsilon)/\epsilon^2)$ time; and for $\ell_p$, via a subspace-preserving sampling procedure, we show that a $(1 \pm \epsilon)$-distortion embedding of $\mathcal{A}_p$ into $\mathbb{R}^{O(\mathrm{poly}(d))}$ can be computed in $O(\mathrm{nnz}(A) \cdot \log n)$ time, and we also show that a $(1+\epsilon)$-approximate solution to the $\ell_p$ regression problem $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p$ can be computed in $O(\mathrm{nnz}(A) \cdot \log n + \mathrm{poly}(d) \log(1/\epsilon)/\epsilon^2)$ time. Moreover, we can improve the embedding dimension, or equivalently the sample size, to $O(d^{3+p/2} \log(1/\epsilon)/\epsilon^2)$ without increasing the complexity. (Comment: 22 pages)
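    As a rough illustration of the input-sparsity-time idea for $\ell_2$ (a minimal sketch, not the paper's exact construction), the snippet below applies a CountSketch-style embedding $\Pi$ with a single random $\pm 1$ entry per column, so forming $\Pi A$ costs time proportional to $\mathrm{nnz}(A)$, and then solves the much smaller sketched least-squares problem. The sketch size `m` and the problem dimensions are arbitrary demo choices, not the paper's bounds.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

n, d = 100_000, 20
A = sparse.random(n, d, density=1e-3, format="csr", random_state=0)
b = rng.standard_normal(n)

m = 4 * d * d  # illustrative sketch size (theory: poly(d)/eps^2 rows for l2)

# CountSketch-style Pi: each of the n coordinates is hashed to one of m
# buckets and given a random sign, so Pi has one nonzero per column and
# Pi @ A can be formed in time proportional to nnz(A).
buckets = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
Pi = sparse.csr_matrix((signs, (buckets, np.arange(n))), shape=(m, n))

SA = (Pi @ A).toarray()
Sb = Pi @ b

# Solve the small sketched regression instead of the full one.
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
x_exact, *_ = np.linalg.lstsq(A.toarray(), b, rcond=None)
print("sketched residual:", np.linalg.norm(A @ x_sketch - b))
print("exact residual:   ", np.linalg.norm(A @ x_exact - b))
```

    A CountSketch-type embedding with $O(d^2)$ rows is known to preserve the column span up to constant distortion with constant probability, which already gives a constant-factor regression approximation; the $(1+\epsilon)$ running times quoted above come from combining such an embedding with the further steps described in the paper.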

    Sketching via hashing: from heavy hitters to compressed sensing to sparse Fourier transform

    Sketching via hashing is a popular and useful method for processing large data sets. Its basic idea is as follows. Suppose that we have a large multi-set of elements S = [formula], and we would like to identify the elements that occur "frequently" in S. The algorithm starts by selecting a hash function h that maps the elements into an array c[1…m]. The array entries are initialized to 0. Then, for each element a ∈ S, the algorithm increments c[h(a)]. At the end of the process, each array entry c[j] contains the count of all data elements a ∈ S mapped to j.
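    A minimal, self-contained Python sketch of the counting step just described (the bucket count `m`, the seed, and the use of Python's built-in `hash` are illustrative assumptions; a practical heavy-hitters sketch such as Count-Min would combine several independent hash rows):

```python
from collections import Counter

def build_count_array(stream, m, seed=0):
    """Hash each element into one of m buckets and keep per-bucket counts.
    c[j] ends up holding the total count of all elements hashed to j."""
    c = [0] * m
    for a in stream:
        j = hash((seed, a)) % m   # h(a): bucket index in [0, m)
        c[j] += 1
    return c

def estimate_count(a, c, m, seed=0):
    """Estimated frequency of a: an overestimate inflated by colliding elements."""
    return c[hash((seed, a)) % m]

stream = ["x"] * 50 + ["y"] * 30 + list("abcdefghij")
m = 16
c = build_count_array(stream, m)
print(estimate_count("x", c, m), "vs true", Counter(stream)["x"])
```

    Because colliding elements add their counts together, c[h(a)] can only overestimate the true frequency of a, which is why the frequent elements can still be identified when m is large enough relative to the number of distinct light elements.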

    Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings

    We consider the following oblivious sketching problem: given $\epsilon \in (0,1/3)$ and $n \ge d/\epsilon^2$, design a distribution $D$ over $\mathbb{R}^{k \times nd}$ and a function $f: \mathbb{R}^k \times \mathbb{R}^{nd} \to \mathbb{R}$, so that for any $n \times d$ matrix $A$, $\Pr_{S \sim D}[(1-\epsilon)\|A\|_{op} \le f(S(A), S) \le (1+\epsilon)\|A\|_{op}] \ge 2/3$, where $\|A\|_{op} = \sup_{x:\|x\|_2 = 1} \|Ax\|_2$ is the operator norm of $A$ and $S(A)$ denotes $S \cdot A$, interpreting $A$ as a vector in $\mathbb{R}^{nd}$. We show a tight lower bound of $k = \Omega(d^2/\epsilon^2)$ for this problem. Previously, Nelson and Nguyen (ICALP, 2014) considered the problem of finding a distribution $D$ over $\mathbb{R}^{k \times n}$ such that for any $n \times d$ matrix $A$, $\Pr_{S \sim D}[\forall x, (1-\epsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1+\epsilon)\|Ax\|_2] \ge 2/3$, which is called an oblivious subspace embedding (OSE). Our result considerably strengthens theirs, as it (1) applies only to estimating the operator norm, which can be estimated given any OSE, and (2) applies to distributions over general linear operators $S$ which treat $A$ as a vector and compute $S(A)$, rather than the restricted class of linear operators corresponding to matrix multiplication. Our technique also implies the first tight bounds for approximating the Schatten $p$-norm for even integers $p$ via general linear sketches, improving the previous lower bound from $k = \Omega(n^{2-6/p})$ [Regev, 2014] to $k = \Omega(n^{2-4/p})$. Importantly, for sketching the operator norm up to a factor of $\alpha$, where $\alpha - 1 = \Omega(1)$, we obtain a tight $k = \Omega(n^2/\alpha^4)$ bound, matching the upper bound of Andoni and Nguyen (SODA, 2013), and improving the previous $k = \Omega(n^2/\alpha^6)$ lower bound. Finally, we also obtain the first lower bounds for approximating Ky Fan norms.
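    To make the two sketching models concrete, here is a small NumPy illustration (the dimensions, Gaussian entries, and scaling are demo assumptions, not constructions from the paper) contrasting a general linear sketch, which acts on $A$ flattened to a vector in $\mathbb{R}^{nd}$, with an OSE-style sketch restricted to left matrix multiplication $SA$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 100
A = rng.standard_normal((n, d))

# General linear sketch: S acts on A viewed as a vector in R^{nd},
# producing k linear measurements S(A) = S @ vec(A).
S_general = rng.standard_normal((k, n * d)) / np.sqrt(k)
sketch_general = S_general @ A.reshape(-1)      # shape (k,)

# OSE-style sketch: S is k x n and is applied by matrix multiplication,
# so the sketch S @ A keeps the column-space structure of A.
S_ose = rng.standard_normal((k, n)) / np.sqrt(k)
sketch_ose = S_ose @ A                          # shape (k, d)

# From an OSE one can read off a rough estimate of the operator norm;
# the lower bound above says that even the fully general model needs
# k = Omega(d^2 / epsilon^2) measurements for a (1 +/- epsilon) estimate.
print(np.linalg.norm(A, 2), np.linalg.norm(sketch_ose, 2))
```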

    Online Row Sampling

    Finding a small spectral approximation for a tall $n \times d$ matrix $A$ is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of $A$. Row sampling improves interpretability, saves space when $A$ is sparse, and preserves row structure, which is especially important, for example, when $A$ represents a graph. However, correctly sampling rows from $A$ can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the outputted approximation [KL13, KLM+14]. Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of $A$ one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses $O(d^2)$ memory but only requires $O(d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ samples, which we show is optimal. Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.
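    The following Python sketch is in the spirit of the online scheme described above, though it is only an illustrative simplification (the constant `c`, the exact sampling probability, and the ridge term `delta` are assumptions, not the paper's algorithm): each arriving row is kept, irrevocably, with probability proportional to its online ridge leverage score measured against the reweighted rows already kept.

```python
import numpy as np

def online_row_sample(rows, eps, delta, c=2.0, rng=None):
    """Stream rows a_i one by one; keep row i with probability proportional to
    its online ridge leverage score a_i^T (M + delta*I)^{-1} a_i, where M
    accumulates the reweighted rows kept so far. Kept rows are rescaled by
    1/sqrt(p_i) so that B^T B is an unbiased estimator of A^T A."""
    rng = rng if rng is not None else np.random.default_rng(0)
    d = rows.shape[1]
    M = np.zeros((d, d))                 # running approximation of A^T A
    kept = []
    for a in rows:
        score = a @ np.linalg.solve(M + delta * np.eye(d), a)  # online leverage score
        p = min(1.0, c * np.log(d) * score / eps**2)           # sampling probability
        if rng.random() < p:
            w = a / np.sqrt(p)                                 # reweight the kept row
            kept.append(w)
            M += np.outer(w, w)
    return np.array(kept)

rng = np.random.default_rng(1)
A = rng.standard_normal((20_000, 20))
B = online_row_sample(A, eps=0.5, delta=1e-3, rng=rng)
print(f"kept {B.shape[0]} of {A.shape[0]} rows")
print("spectral ratio:", np.linalg.norm(B.T @ B, 2) / np.linalg.norm(A.T @ A, 2))
```

    The decision for each row is made immediately and never revisited, matching the online constraint; the working memory is just the $d \times d$ matrix M plus the rows that are kept.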

    Lower Bounds for Oblivious Subspace Embeddings

    An oblivious subspace embedding (OSE) for some $\epsilon, \delta \in (0,1/3)$ and $d \le m \le n$ is a distribution $\mathcal{D}$ over $\mathbb{R}^{m \times n}$ such that $\Pr_{\Pi \sim \mathcal{D}}(\forall x \in W, (1-\epsilon)\|x\|_2 \le \|\Pi x\|_2 \le (1+\epsilon)\|x\|_2) \ge 1-\delta$ for any linear subspace $W \subset \mathbb{R}^n$ of dimension $d$. We prove any OSE with $\delta < 1/3$ has $m = \Omega((d + \log(1/\delta))/\epsilon^2)$, which is optimal. Furthermore, if every $\Pi$ in the support of $\mathcal{D}$ is sparse, having at most $s$ non-zero entries per column, we show tradeoff lower bounds between $m$ and $s$.
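    As a concrete, hedged illustration of the definition (a numerical check, not a construction from the paper): the snippet below draws a dense Gaussian $\Pi$ with $m$ on the order of $(d + \log(1/\delta))/\epsilon^2$ rows, the regime matched by the lower bound, and empirically checks the $(1\pm\epsilon)$ condition on random unit vectors from a $d$-dimensional subspace $W$; the constant `c` is an arbitrary demo choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 15
eps, delta = 0.25, 0.05

# Embedding dimension in the m = Theta((d + log(1/delta))/eps^2) regime,
# with an arbitrary demo constant c.
c = 2
m = int(c * (d + np.log(1 / delta)) / eps**2)

# A random d-dimensional subspace W of R^n, given by an orthonormal basis U.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))

# Dense Gaussian Pi, scaled so that E ||Pi x||_2^2 = ||x||_2^2.
Pi = rng.standard_normal((m, n)) / np.sqrt(m)

# Check the distortion on random unit vectors x = U y in W;
# the sampled norms should land in [1 - eps, 1 + eps].
norms = []
for _ in range(1000):
    y = rng.standard_normal(d)
    x = U @ (y / np.linalg.norm(y))      # unit vector in W
    norms.append(np.linalg.norm(Pi @ x))
print(f"m = {m}, min = {min(norms):.3f}, max = {max(norms):.3f}")
```

    Checking random vectors only samples the guarantee; the definition quantifies over all of $W$, which a Gaussian $\Pi$ of this size satisfies with high probability for a large enough constant $c$.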