Lecture 14: Randomized Algorithms for Least Squares Problems
The emergence of massive data sets over the past twenty or so years has led to the development of Randomized Numerical Linear Algebra. Randomized matrix algorithms perform random sketching and sampling of rows or columns in order to reduce the problem dimension or compute low-rank approximations. We review randomized algorithms for the solution of least squares/regression problems, based on row sketching from the left or column sketching from the right. These algorithms tend to be efficient and accurate on matrices that have many more rows than columns. We present probabilistic bounds for the amount of sampling required to achieve a user-specified error tolerance. Along the way we illustrate important concepts from numerical analysis (conditioning and pre-conditioning), probability (coherence, concentration inequalities), and statistics (sampling and leverage scores). Numerical experiments illustrate that the bounds are informative even for small problem dimensions and stringent success probabilities. To stress-test the bounds, we present algorithms that generate 'adversarial' matrices for user-specified coherence and leverage scores. If time permits, we discuss the additional effect of uncertainties from the underlying Gaussian linear model in a regression problem.
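The idea of row sketching from the left can be illustrated with a minimal sketch. It assumes a dense Gaussian sketching matrix, which is one common choice and not necessarily the scheme used in the lecture. The function name `sketched_least_squares`, the problem sizes, and the seeds are hypothetical and chosen only for illustration.

```python
import numpy as np

def sketched_least_squares(A, b, sketch_rows, seed=0):
    """Approximately solve min_x ||A x - b||_2 by sketching rows from the left.

    Assumption: S is a Gaussian sketching matrix; sampling-based
    sketches (e.g. by leverage scores) are another option.
    """
    rng = np.random.default_rng(seed)
    m, _ = A.shape
    # S has sketch_rows rows and m columns, scaled so that E[S^T S] = I.
    S = rng.standard_normal((sketch_rows, m)) / np.sqrt(sketch_rows)
    # Solve the much smaller problem min_x ||(S A) x - S b||_2.
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_sketch

# Tall-and-skinny test problem: many more rows than columns.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 10))
b = A @ rng.standard_normal(10) + 0.01 * rng.standard_normal(2000)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, sketch_rows=200)

# Compare residuals of the exact and the sketched solutions.
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
print(res_exact, res_sketch)
```

The sketched residual can never be smaller than the exact one, since the exact solution minimizes the residual. How close the two are depends on the number of sketch rows relative to the number of columns.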
Conditioning of Leverage Scores and Computation by QR Decomposition
The leverage scores of a full-column rank matrix A are the squared row norms
of any orthonormal basis for range(A). We show that corresponding leverage
scores of two matrices A and A + \Delta A are close in the relative sense, if
they have large magnitude and if all principal angles between the column spaces
of A and A + \Delta A are small. We also show three classes of bounds that are
based on perturbation results of QR decompositions. They demonstrate that
relative differences between individual leverage scores strongly depend on the
particular type of perturbation \Delta A. The bounds imply that the relative
accuracy of an individual leverage score depends on: its magnitude and the
two-norm condition number of A, if \Delta A is a general perturbation; the two-norm
condition number of A, if \Delta A is a perturbation with the same norm-wise
row-scaling as A; (to first order) neither condition number nor leverage score
magnitude, if \Delta A is a component-wise row-scaled perturbation. Numerical
experiments confirm the qualitative and quantitative accuracy of our bounds.
Comment: This version has been accepted to SIMAX but has not yet gone through
copy editin
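The definition of leverage scores given in this abstract can be made concrete with a short sketch, assuming NumPy's reduced QR decomposition as the source of the orthonormal basis. The function name `leverage_scores` and the test matrix are hypothetical. The last line uses the common convention that coherence is the largest leverage score.

```python
import numpy as np

def leverage_scores(A):
    """Leverage scores of a full-column-rank matrix A.

    They are the squared row norms of any orthonormal basis for
    range(A); here the basis is the Q factor of a reduced QR.
    """
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q**2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))
scores = leverage_scores(A)

# A different orthonormal basis (left singular vectors) gives the same scores.
U, _, _ = np.linalg.svd(A, full_matrices=False)
scores_svd = np.sum(U**2, axis=1)

# Each score lies in [0, 1], and the scores sum to the number of columns.
total = scores.sum()
# Coherence, taken here as the largest leverage score.
coherence = scores.max()
```

Agreement between `scores` and `scores_svd` reflects the phrase "any orthonormal basis" in the definition: the squared row norms do not depend on which basis is chosen.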