Fast Monte Carlo Algorithms for Computing a Low-Rank Approximation to a Matrix

Abstract

Many of today's applications deal with large quantities of data, from DNA analysis and image processing to movie recommendation systems. Most of these systems store the data in very large matrices. In order to analyze the collected data, these matrices have to be loaded into the RAM (random-access memory) of the computing system. This is expensive, since RAM is a scarce computational resource. Ideally, one would like to store most of the data on disk (hard disk drive) while loading only the necessary parts into RAM. To do so, the data matrix has to be decomposed into smaller matrices. The singular value decomposition (SVD) can be used to find a low-rank approximation of the input matrix, yielding a representation of much smaller size. However, computing the SVD requires memory and time that are super-linear (growing faster than linearly) in the dimensions of the input matrix, a burden for many applications that analyze large quantities of data. In this thesis we present a more efficient algorithm based on Monte Carlo methods, LinearTimeSVD, that computes a low-rank approximation of the input matrix using memory and time that are only linear in the dimensions of the original matrix. Moreover, we prove that the errors introduced by this construction are bounded in terms of properties of the input matrix. The main idea behind the algorithm is a sampling step that constructs a smaller matrix from a subset of the columns of the input matrix A. Applying SVD to this new matrix, which has a number of columns that is constant with respect to the dimensions of the input matrix, the method generates approximations of the top k singular values and corresponding singular vectors of A, where k denotes the rank of the approximation. By sampling enough columns, it can be shown that the additional approximation error can be made arbitrarily small.
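
To make the sampling step concrete, the following is a minimal NumPy sketch of the column-sampling idea described above. It is an illustration under stated assumptions, not the thesis's exact implementation: the function name linear_time_svd and the direct call to np.linalg.svd on the sampled matrix are choices made here for brevity (a strictly linear-memory variant would instead form the small c-by-c matrix C^T C and diagonalize that).

    import numpy as np

    def linear_time_svd(A, c, k, rng=None):
        """Sketch of a Monte Carlo low-rank approximation (not the thesis's exact code).

        Samples c columns of A with probabilities proportional to their
        squared Euclidean norms, rescales them, and uses the SVD of the
        small sampled matrix C to approximate the top k left singular
        vectors and singular values of A.
        """
        rng = np.random.default_rng() if rng is None else rng
        m, n = A.shape
        # Column-norm sampling probabilities: p_i = |A[:, i]|^2 / |A|_F^2.
        col_norms_sq = np.sum(A**2, axis=0)
        p = col_norms_sq / col_norms_sq.sum()
        # Sample c column indices i.i.d. with replacement according to p.
        idx = rng.choice(n, size=c, replace=True, p=p)
        # Rescale each sampled column by 1/sqrt(c * p_i) so that
        # C @ C.T is an unbiased estimator of A @ A.T.
        C = A[:, idx] / np.sqrt(c * p[idx])
        # SVD of the small m-by-c matrix C; its top k left singular
        # vectors approximate those of A.
        H, sigma, _ = np.linalg.svd(C, full_matrices=False)
        return H[:, :k], sigma[:k]

For example, given an m-by-n matrix A, calling H_k, sigma_k = linear_time_svd(A, c=100, k=10) returns an approximate rank-10 basis, and H_k @ (H_k.T @ A) is the corresponding low-rank approximation of A; increasing c tightens the approximation at linear extra cost.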
