Overlearning in marginal distribution-based ICA: analysis and solutions
The present paper is written as a word of caution to users of independent component analysis (ICA), concerning overlearning phenomena that are often observed.
We consider two types of overlearning, typical of ICA based on higher-order statistics. These algorithms can be seen to maximise the negentropy of the source estimates. The first kind of overlearning results in the generation of spike-like signals if there are not enough samples in the data or a considerable amount of noise is present. It is argued that, if the data has a power spectrum characterised by a 1/f curve, we face a more severe problem, which cannot be solved inside the strict ICA model. This overlearning is better characterised by bumps instead of spikes. Both overlearning types are demonstrated on artificial signals as well as magnetoencephalograms (MEG). Several methods are suggested to circumvent both types, either by making the estimation of the ICA model more robust or by including further modelling of the data.
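The spike-type overlearning is easy to reproduce outside the paper's MEG setting. The sketch below is a minimal illustration, not the authors' experiments: it uses scikit-learn's FastICA as a stand-in for a generic negentropy-maximising ICA, with arbitrarily chosen sample and channel counts, and runs it on pure Gaussian noise so that any structure in the estimates is, by construction, overlearning.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Pure Gaussian noise: there are no true independent components,
# so any structure ICA finds is an artefact of overlearning.
n_samples, n_channels = 120, 100
X = rng.normal(size=(n_samples, n_channels))

ica = FastICA(n_components=20, random_state=0, max_iter=1000)
S_hat = ica.fit_transform(X)  # estimated "sources", shape (120, 20)

# With this few samples, negentropy maximisation concentrates each
# estimate's energy in a handful of time points: spike-like signals,
# typically visible as strongly positive excess kurtosis.
print(kurtosis(S_hat, axis=0))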
Rethinking LDA: moment matching for discrete ICA
We consider moment matching techniques for estimation in Latent Dirichlet
Allocation (LDA). By drawing explicit links between LDA and discrete versions
of independent component analysis (ICA), we first derive a new set of
cumulant-based tensors, with an improved sample complexity. Moreover, we reuse
standard ICA techniques such as joint diagonalization of tensors to improve
over existing methods based on the tensor power method. In an extensive set of
experiments on both synthetic and real datasets, we show that our new
combination of tensors and orthogonal joint diagonalization techniques
outperforms existing moment matching methods. (In Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS), 2015.)
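The joint diagonalization step this abstract relies on can be illustrated in isolation. The sketch below is a generic NumPy illustration, not the paper's cumulant tensors or its Jacobi-style algorithm: in the exact, noiseless case, symmetric matrices sharing one orthogonal eigenbasis are simultaneously diagonalized by the eigenvectors of a random linear combination, which generically has distinct eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5, 3

# Symmetric matrices M_k = U diag(d_k) U^T sharing an orthogonal basis U.
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
Ms = [U @ np.diag(rng.normal(size=n)) @ U.T for _ in range(K)]

# A random linear combination generically has distinct eigenvalues,
# so its eigenvectors recover U up to column order and sign.
w = rng.normal(size=K)
V = np.linalg.eigh(sum(wi * Mi for wi, Mi in zip(w, Ms)))[1]

# Check: V diagonalizes every M_k simultaneously.
for Mk in Ms:
    D = V.T @ Mk @ V
    print(np.max(np.abs(D - np.diag(np.diag(D)))))  # ~1e-15
```

With noisy moment estimates the matrices only approximately share a basis, which is where robust orthogonal joint diagonalization methods of the kind the paper uses become necessary.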
Heavy-tailed Independent Component Analysis
Independent component analysis (ICA) is the problem of efficiently recovering a matrix $A \in \mathbb{R}^{n \times n}$ from i.i.d. observations of $X = AS$, where $S \in \mathbb{R}^n$ is a random vector with mutually independent coordinates. This problem has been intensively studied, but all existing efficient algorithms with provable guarantees require that the coordinates $S_i$ have finite fourth moments. We consider the heavy-tailed ICA problem, where we make no such assumption, not even about the second moment. This problem has also received considerable attention in the applied literature. In the present work, we first give a provably efficient algorithm that works under the assumption that, for some constant $\gamma > 0$, each $S_i$ has a finite $(1+\gamma)$-moment, thus substantially weakening the moment requirement for the ICA problem to be solvable. We then give an algorithm that works under the assumption that the matrix $A$ has orthogonal columns but requires no moment assumptions. Our techniques draw ideas from convex geometry and exploit standard properties of the multivariate spherical Gaussian distribution in a novel way.
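To see why the finite-fourth-moment requirement is restrictive, the snippet below (an illustration, not the paper's algorithm) samples a Student-t source with 1.5 degrees of freedom: its $(1+\gamma)$-moment is finite for $\gamma < 0.5$, but its variance is infinite, so the empirical second and fourth moments that cumulant-based ICA relies on never stabilise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Student-t with 1.5 degrees of freedom: E|S|^p is finite only for
# p < 1.5, so a (1 + gamma)-moment exists for gamma < 0.5, but the
# second and fourth moments do not.
for n in (10**3, 10**5, 10**7):
    s = rng.standard_t(df=1.5, size=n)
    print(n, np.mean(s**2), np.mean(s**4))  # diverges as n grows
```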
Fourier PCA and Robust Tensor Decomposition
Fourier PCA is Principal Component Analysis of a matrix obtained from higher-order derivatives of the logarithm of the Fourier transform of a distribution. We make this method algorithmic by developing a tensor decomposition method for a pair of tensors sharing the same vectors in their rank-$1$ decompositions. Our main application is the first provably polynomial-time algorithm for underdetermined ICA, i.e., learning an $n \times m$ matrix $A$ from observations $y = Ax$, where $x$ is drawn from an unknown product distribution with arbitrary non-Gaussian components. The number of component distributions $m$ can be arbitrarily higher than the dimension $n$, and the columns of $A$ only need to satisfy a natural and efficiently verifiable nondegeneracy condition. As a second application, we give an alternative algorithm for learning mixtures of spherical Gaussians with linearly independent means. These results also hold in the presence of Gaussian noise.
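The construction behind Fourier PCA can be sketched for the fully determined case. This is a toy NumPy version under assumed unit-variance uniform sources, not the paper's higher-derivative tensor method for the underdetermined setting: for whitened data $y = Qs$ with orthogonal $Q$, the Hessian of $\log \mathbb{E}[e^{i u^\top y}]$ equals $Q D(u) Q^\top$ with $D(u)$ diagonal, so the eigenvectors of an empirical Hessian at a random $u$ recover the mixing directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 200_000

# Mix non-Gaussian (uniform, unit-variance) sources, then whiten.
S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, N))
A = rng.normal(size=(n, n))
X = A @ S
E, V = np.linalg.eigh(np.cov(X))
W = V @ np.diag(E**-0.5) @ V.T  # whitening matrix; W @ A is ~orthogonal
Y = W @ X

# Empirical Hessian of log E[exp(i u.y)] at a random u:
# H = -(E_w[y y^T] - E_w[y] E_w[y]^T) with complex weights
# w = exp(i u.y) / E[exp(i u.y)].
u = rng.normal(size=n)
w = np.exp(1j * (u @ Y))
w /= w.mean()
m = (Y * w).mean(axis=1)
H = -((Y * w) @ Y.T / N - np.outer(m, m))

# For symmetric sources H is (essentially) real and equals
# Q diag(.) Q^T with Q = W @ A, so its eigenvectors recover Q
# up to column order and sign.
Q = np.linalg.eigh(H.real)[1]
print(np.round(Q.T @ W @ A, 2))  # approximately a signed permutation
```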
- …