Principal Component Analysis and Higher Correlations for Distributed Data
We consider algorithmic problems in the setting in which the input data has
been partitioned arbitrarily on many servers. The goal is to compute a function
of all the data, and the bottleneck is the communication used by the algorithm.
We present algorithms for two illustrative problems on massive data sets: (1)
computing a low-rank approximation of a matrix $A = A^1 + A^2 + \ldots + A^s$,
with matrix $A^t$ stored on server $t$, and (2) computing a function of a vector
$a_1 + a_2 + \ldots + a_s$, where server $t$ has the vector $a_t$; this
includes the well-studied special case of computing frequency moments and
separable functions, as well as higher-order correlations such as the number of
subgraphs of a specified type occurring in a graph. For both problems we give
algorithms with nearly optimal communication, and in particular the only
dependence on $n$, the size of the data, is in the number of bits needed to
represent indices and words ($O(\log n)$).
Comment: rewritten with focus on two main results (distributed PCA,
higher-order moments and correlations) in the arbitrary partition model
A Tutorial on Independent Component Analysis
Independent component analysis (ICA) has become a standard data analysis
technique applied to an array of problems in signal processing and machine
learning. This tutorial provides an introduction to ICA based on linear algebra,
formulating an intuition for ICA from first principles. The goal of this
tutorial is to provide a solid foundation on this advanced topic so that one
might learn the motivation behind ICA, learn why and when to apply this
technique and in the process gain an introduction to this exciting field of
active research.