Randomized algorithms provide solutions to two ubiquitous problems: (1) the
distributed calculation of a principal component analysis or singular value
decomposition of a highly rectangular matrix, and (2) the distributed
calculation of a low-rank approximation (in the form of a singular value
decomposition) to an arbitrary matrix. Carefully honed algorithms yield results
that are uniformly superior to those of the stock, deterministic
implementations in Spark (the popular platform for distributed computation); in
particular, whereas the stock software will without warning return left
singular vectors that are far from numerically orthonormal, a significantly
burnished randomized implementation generates left singular vectors that are
numerically orthonormal to nearly the machine precision.

Comment: 21 pages, 29 tables, 1 figure, 8 algorithms in pseudocode
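
To make the second problem concrete, the following is a minimal single-machine sketch of a randomized low-rank singular value decomposition, written with NumPy. It is not the distributed Spark implementation described above, nor a transcription of the authors' algorithms. It is a generic randomized range finder with a few re-orthonormalized power iterations, under the assumption that the matrix fits in memory. The function name `randomized_svd` and its parameters (`rank`, `oversample`, `power_iters`, `seed`) are hypothetical names chosen for illustration.

```python
import numpy as np


def randomized_svd(a, rank, oversample=10, power_iters=2, seed=0):
    """Sketch of a randomized low-rank SVD (illustrative; names are hypothetical)."""
    rng = np.random.default_rng(seed)
    n = a.shape[1]
    k = min(rank + oversample, n)  # sketch width, capped at the column count

    # Apply the matrix to a Gaussian test matrix and orthonormalize the result.
    q, _ = np.linalg.qr(a @ rng.standard_normal((n, k)))

    # Power iterations, re-orthonormalizing after every multiplication.
    for _ in range(power_iters):
        q, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ q)

    # Project onto the captured range and take a small dense SVD.
    u_small, s, vt = np.linalg.svd(q.T @ a, full_matrices=False)
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank]


# Synthetic tall-and-skinny test matrix of exact rank 20 (assumed data).
rng = np.random.default_rng(1)
a = rng.standard_normal((2000, 20)) @ rng.standard_normal((20, 50))

u, s, vt = randomized_svd(a, rank=20)

# Deviation of the left singular vectors from orthonormality (spectral norm).
orth_err = np.linalg.norm(u.T @ u - np.eye(20), 2)
# Relative spectral-norm error of the reconstruction.
rel_err = np.linalg.norm(a - (u * s) @ vt, 2) / np.linalg.norm(a, 2)
print(orth_err, rel_err)
```

The quantity `orth_err` measures how far the computed left singular vectors are from numerically orthonormal, which is the property the abstract highlights. In this sketch the left singular vectors are formed as the product of two matrices with orthonormal columns, so the deviation is expected to stay small.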