Standard rank-revealing factorizations such as the singular value
decomposition and the column-pivoted QR factorization are challenging to implement
efficiently on a GPU. A major difficulty in this regard is the inability of
standard algorithms to cast most operations in terms of the Level-3 BLAS. This
paper presents two alternative algorithms for computing a rank-revealing
factorization of the form A = UTV^*, where U and V are orthogonal and
T is triangular. Both algorithms use randomized projection techniques to cast
most of the flops in terms of matrix-matrix multiplication, which is
exceptionally efficient on the GPU. Numerical experiments illustrate that these
algorithms achieve an order of magnitude acceleration over finely tuned GPU
implementations of the SVD while providing low-rank approximation errors close
to those of the SVD.
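
To make the idea concrete, the following is a minimal NumPy sketch of one well-known randomized way to build a factorization of this form (a Gaussian test matrix, a few power iterations, then two unpivoted QR factorizations). It is an illustration under stated assumptions and is not claimed to be either of the two algorithms presented in the paper. The function name randomized_utv, the parameter q, the test matrix, and the truncation rank k are all hypothetical choices made for this example. It runs on the CPU; the point is only that nearly all the work is matrix-matrix products and unpivoted QR.

```python
import numpy as np

def randomized_utv(A, q=1, seed=0):
    """Sketch: return U, T, V with A = U @ T @ V.T and T upper triangular.

    Assumes A is real with m >= n. q is the number of power iterations
    (a hypothetical tuning parameter for this illustration).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Y = rng.standard_normal((n, n))      # Gaussian random test matrix
    for _ in range(q):
        Y = A.T @ (A @ Y)                # matrix-matrix products only
        Y, _ = np.linalg.qr(Y)           # re-orthonormalize for stability
    V, _ = np.linalg.qr(Y)               # n x n orthogonal factor
    U, T = np.linalg.qr(A @ V)           # unpivoted QR of the rotated matrix
    return U, T, V

# Hypothetical test matrix with singular values 1, 1/2, 1/4, ...
rng = np.random.default_rng(1)
m, n = 60, 40
Q1, _ = np.linalg.qr(rng.standard_normal((m, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 2.0 ** -np.arange(n)
A = (Q1 * s) @ Q2.T

U, T, V = randomized_utv(A, q=2)

# Rank-k approximation from the leading k rows of T, compared with the
# optimal rank-k error, which equals the (k+1)-th singular value.
k = 10
A_k = U[:, :k] @ T[:k, :] @ V.T
err_utv = np.linalg.norm(A - A_k, 2)
err_svd = s[k]
print(f"rank-{k} error: UTV sketch {err_utv:.3e}, optimal {err_svd:.3e}")
```

In this sketch no column pivoting is performed, so no step forces the computation back to matrix-vector operations; the random rotation V is what makes the unpivoted QR factorization of A V tend to reveal rank.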