Principal components analysis (PCA) is a well-known technique for
approximating a tabular data set by a low rank matrix. Here, we extend the idea
of PCA to handle arbitrary data sets consisting of numerical, Boolean,
categorical, ordinal, and other data types. This framework encompasses many
well known techniques in data analysis, such as nonnegative matrix
factorization, matrix completion, sparse and robust PCA, k-means, k-SVD,
and maximum margin matrix factorization. The method handles heterogeneous data
sets, and leads to coherent schemes for compressing, denoising, and imputing
missing entries across all data types simultaneously. It also admits a number
of interesting interpretations of the low rank factors, which allow clustering
of examples or of features. We propose several parallel algorithms for fitting
generalized low rank models, and describe implementations and numerical
results.Comment: 84 pages, 19 figure