
Fast Regression with an $\ell_\infty$ Guarantee

Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an $n \times d$ matrix $A$, with $n \gg d$, as well as an $n \times 1$ vector $b$, and one wants to find a vector $\hat{x}$ so as to minimize the residual error $\|Ax - b\|_2$. Using the sketch-and-solve paradigm, one first computes $S \cdot A$ and $S \cdot b$ for a randomly chosen matrix $S$, then outputs $x' = (SA)^{\dagger} Sb$ so as to minimize $\|SAx' - Sb\|_2$. The sketch-and-solve paradigm gives a bound on $\|x' - x^*\|_2$ when $A$ is well-conditioned. Our main result is that, when $S$ is the subsampled randomized Fourier/Hadamard transform, the error $x' - x^*$ behaves as if it lies in a "random" direction within this bound: for any fixed direction $a \in \mathbb{R}^d$, we have with $1 - d^{-c}$ probability that

  $\langle a, x' - x^* \rangle \lesssim \frac{\|a\|_2 \|x' - x^*\|_2}{d^{1/2 - \gamma}}$,   (1)

where $c, \gamma > 0$ are arbitrary constants. This implies $\|x' - x^*\|_\infty$ is a factor $d^{1/2 - \gamma}$ smaller than $\|x' - x^*\|_2$. It also gives a better bound on the generalization of $x'$ to new examples: if the rows of $A$ correspond to examples and the columns to features, then our result gives a better bound on the error introduced by sketch-and-solve when classifying fresh examples. We show that not all oblivious subspace embeddings $S$ satisfy these properties. In particular, we give counterexamples showing that matrices based on Count-Sketch or leverage score sampling do not satisfy them. We also provide lower bounds, both on how small $\|x' - x^*\|_2$ can be and for our new guarantee (1), showing that the subsampled randomized Fourier/Hadamard transform is nearly optimal. Our lower bound on $\|x' - x^*\|_2$ shows that there is an $O(1/\epsilon)$ separation in the dimension of the optimal oblivious subspace embedding required for outputting an $x'$ for which $\|x' - x^*\|_2 \leq \epsilon \|Ax^* - b\|_2 \cdot \|A^{\dagger}\|_2$, compared to the dimension of the optimal oblivious subspace embedding required for outputting an $x'$ for which $\|Ax' - b\|_2 \leq (1+\epsilon)\|Ax^* - b\|_2$; that is, the former problem requires dimension $\Omega(d/\epsilon^2)$ while the latter can be solved with dimension $O(d/\epsilon)$. This explains why known upper bounds on the dimensions of these two variants of regression have differed in prior work.
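To make the sketch-and-solve recipe above concrete, here is a minimal numerical sketch of SRHT-based least squares in Python (NumPy/SciPy). The problem sizes, the sketch dimension `m`, and the helper `srht_sketch` are illustrative assumptions, not parameters from the paper; a production implementation would replace the dense Hadamard matrix with a fast Walsh-Hadamard transform.

```python
# Minimal sketch-and-solve with a subsampled randomized Hadamard transform (SRHT).
# All sizes (n, d, m) and the helper name are illustrative assumptions.
import numpy as np
from scipy.linalg import hadamard

def srht_sketch(M, m, rng):
    """Return S @ M for an SRHT S = sqrt(n_pad/m) * P * H * D, where D is a random
    +/-1 diagonal, H is the normalized n_pad x n_pad Hadamard matrix (rows of M are
    zero-padded to a power of two), and P samples m rows uniformly without replacement.
    A fast Walsh-Hadamard transform would avoid forming H explicitly."""
    n = M.shape[0]
    n_pad = 1 << (n - 1).bit_length()              # next power of two
    M_pad = np.zeros((n_pad, M.shape[1]))
    M_pad[:n] = M
    D = rng.choice([-1.0, 1.0], size=n_pad)        # random sign flips
    H = hadamard(n_pad) / np.sqrt(n_pad)           # orthonormal Hadamard matrix
    HDM = H @ (D[:, None] * M_pad)
    rows = rng.choice(n_pad, size=m, replace=False)
    return np.sqrt(n_pad / m) * HDM[rows]

rng = np.random.default_rng(0)
n, d, m = 1024, 20, 200                            # overconstrained: n >> d

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)     # exact minimizer x^*

# Sketch [A | b] with the same S, then solve the small m x (d+1) problem.
SAb = srht_sketch(np.column_stack([A, b]), m, rng)
SA, Sb = SAb[:, :d], SAb[:, d]
x_prime, *_ = np.linalg.lstsq(SA, Sb, rcond=None)  # sketched solution x'

err = x_prime - x_star
print("||x' - x*||_2   =", np.linalg.norm(err))
print("||x' - x*||_inf =", np.linalg.norm(err, np.inf))
```

On such a run one typically observes $\|x' - x^*\|_\infty$ noticeably smaller than $\|x' - x^*\|_2$, which is the qualitative behavior that guarantee (1) formalizes.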