
    Lecture 14: Randomized Algorithms for Least Squares Problems

    The emergence of massive data sets over the past twenty or so years has led to the development of Randomized Numerical Linear Algebra. Randomized matrix algorithms perform random sketching and sampling of rows or columns in order to reduce the problem dimension or to compute low-rank approximations. We review randomized algorithms for the solution of least squares/regression problems based on row sketching from the left or column sketching from the right. These algorithms tend to be efficient and accurate on matrices that have many more rows than columns. We present probabilistic bounds for the amount of sampling required to achieve a user-specified error tolerance. Along the way we illustrate important concepts from numerical analysis (conditioning and preconditioning), probability (coherence, concentration inequalities), and statistics (sampling and leverage scores). Numerical experiments illustrate that the bounds are informative even for small problem dimensions and stringent success probabilities. To stress-test the bounds, we present algorithms that generate 'adversarial' matrices for user-specified coherence and leverage scores. If time permits, we discuss the additional effect of uncertainties from the underlying Gaussian linear model in a regression problem.
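    Row sampling from the left can be illustrated with a minimal NumPy sketch. It assumes leverage-score sampling with replacement as the sketching operator; a Gaussian or structured random projection would be another choice. The helper names `leverage_scores` and `sampled_least_squares`, the sample size `c`, and the test problem are hypothetical and chosen only for illustration. The sketch is not presented as the lecture's algorithm or as an instance of its error bounds.

```python
import numpy as np


def leverage_scores(A):
    """Exact leverage scores of a full-column-rank A via a thin QR decomposition."""
    Q, _ = np.linalg.qr(A, mode="reduced")  # Q: m x n, orthonormal basis for range(A)
    return np.sum(Q**2, axis=1)  # squared row norms of Q


def sampled_least_squares(A, b, c, rng):
    """Hypothetical helper: leverage-score row sampling (a left sketch) followed by
    solving the smaller c x n least-squares problem."""
    m, _ = A.shape
    lev = leverage_scores(A)
    p = lev / lev.sum()  # sampling probabilities; lev.sum() == n for full column rank
    idx = rng.choice(m, size=c, replace=True, p=p)  # c row indices, with replacement
    # Rescale each sampled row by 1/sqrt(c * p_i), so that E[S^T S] = I for the sketch S
    w = 1.0 / np.sqrt(c * p[idx])
    SA = w[:, None] * A[idx]
    Sb = w * b[idx]
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x


# Hypothetical tall test problem: m >> n, Gaussian entries, small additive noise
rng = np.random.default_rng(0)
m, n, c = 2000, 10, 200
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 1e-3 * rng.standard_normal(m)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sampled_least_squares(A, b, c, rng)

# Ratio of residual norms; it is >= 1 because x_exact minimizes ||Ax - b||
rel_res = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
print(f"residual ratio: {rel_res:.4f}")
```

    Because the full least-squares solution minimizes the residual, `rel_res` is at least 1. Values close to 1 indicate that the sampled rows captured the relevant structure of `A`. Computing exact leverage scores by QR costs as much as solving the full problem, so practical methods approximate them. The exact computation is used here only to keep the example short.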

    Conditioning of Leverage Scores and Computation by QR Decomposition

    The leverage scores of a full-column-rank matrix A are the squared row norms of any orthonormal basis for range(A). We show that corresponding leverage scores of two matrices A and A + ΔA are close in the relative sense if they have large magnitude and if all principal angles between the column spaces of A and A + ΔA are small. We also present three classes of bounds that are based on perturbation results for QR decompositions. They demonstrate that relative differences between individual leverage scores strongly depend on the particular type of perturbation ΔA. The bounds imply that the relative accuracy of an individual leverage score depends on: its magnitude and the two-norm condition number of A, if ΔA is a general perturbation; the two-norm condition number of A, if ΔA is a perturbation with the same norm-wise row scaling as A; and (to first order) neither the condition number nor the leverage score magnitude, if ΔA is a component-wise row-scaled perturbation. Numerical experiments confirm the qualitative and quantitative accuracy of our bounds.
    Comment: This version has been accepted to SIMAX but has not yet gone through copy editing.
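    The definition of leverage scores translates directly into a computation from a thin QR decomposition. The following NumPy sketch assumes a full-column-rank matrix. It computes the scores with `numpy.linalg.qr`, checks that they agree with the scores from an SVD basis and sum to the number of columns, and then applies a component-wise row-scaled perturbation to measure the relative change in each score. The function name `leverage_scores_qr`, the row-scaled test matrix, and the perturbation size `eps` are hypothetical choices. The experiment is illustrative only and does not reproduce the paper's bounds.

```python
import numpy as np


def leverage_scores_qr(A):
    """Leverage scores of a full-column-rank A: squared row norms of Q from a thin QR."""
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.einsum("ij,ij->i", Q, Q)  # row-wise sum of Q_ij^2


rng = np.random.default_rng(1)
m, n = 500, 5
# Hypothetical test matrix A = D B: Gaussian B with rows scaled by D in [1, 1e3]
D = np.logspace(0, 3, m)
A = D[:, None] * rng.standard_normal((m, n))

lev_qr = leverage_scores_qr(A)

# Any orthonormal basis for range(A) gives the same scores; compare with SVD's U
U, _, _ = np.linalg.svd(A, full_matrices=False)
lev_svd = np.sum(U**2, axis=1)

# Hypothetical component-wise row-scaled perturbation: |dA_ij| <= eps * |A_ij|
eps = 1e-8
dA = eps * A * rng.uniform(-1.0, 1.0, size=A.shape)
lev_pert = leverage_scores_qr(A + dA)

# Relative change of each individual leverage score
rel_diff = np.abs(lev_pert - lev_qr) / lev_qr
print(f"sum of scores: {lev_qr.sum():.6f}, max relative change: {rel_diff.max():.2e}")
```

    Replacing `dA` with a dense random matrix of comparable two-norm gives a general perturbation. For that case the abstract states that the relative accuracy depends on the score's magnitude and on the condition number of A.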