
    Fourier PCA and Robust Tensor Decomposition

    Fourier PCA is Principal Component Analysis of a matrix obtained from higher order derivatives of the logarithm of the Fourier transform of a distribution. We make this method algorithmic by developing a tensor decomposition method for a pair of tensors sharing the same vectors in rank-$1$ decompositions. Our main application is the first provably polynomial-time algorithm for underdetermined ICA, i.e., learning an $n \times m$ matrix $A$ from observations $y = Ax$ where $x$ is drawn from an unknown product distribution with arbitrary non-Gaussian components. The number of component distributions $m$ can be arbitrarily higher than the dimension $n$ and the columns of $A$ only need to satisfy a natural and efficiently verifiable nondegeneracy condition. As a second application, we give an alternative algorithm for learning mixtures of spherical Gaussians with linearly independent means. These results also hold in the presence of Gaussian noise. Comment: Extensively revised; details added; minor errors corrected; exposition improved.
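
The reason derivatives of the log-Fourier transform expose the columns of $A$ can be written out in one line. This is a sketch in notation the abstract does not fix: $a_j$ is assumed to denote the $j$-th column of $A$, $\psi_j(s) = \log \mathbb{E}\, e^{i s x_j}$ the log characteristic function of the $j$-th component, and $u$ a point near the origin where the characteristic function of $y$ is nonzero. Independence of the components of $x$ turns the log into a sum, and differentiating twice gives a matrix with the columns of $A$ as its rank-one factors:

```latex
\psi_y(u) = \log \mathbb{E}\, e^{i u^\top y}
          = \sum_{j=1}^{m} \psi_j\!\left(a_j^\top u\right),
\qquad
D^2 \psi_y(u) = \sum_{j=1}^{m} \psi_j''\!\left(a_j^\top u\right) a_j a_j^\top
              = A \,\operatorname{diag}\!\big(\psi_j''(a_j^\top u)\big)\, A^\top .
```

When $m > n$ this second-derivative matrix alone cannot identify all $m$ columns, which is consistent with the abstract's use of higher order derivatives and a tensor decomposition.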

    Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity

    We present a simple, general technique for reducing the sample complexity of matrix and tensor decomposition algorithms applied to distributions. We use the technique to give a polynomial-time algorithm for standard ICA with sample complexity nearly linear in the dimension, thereby improving substantially on previous bounds. The analysis is based on properties of random polynomials, namely the spacings of an ensemble of polynomials. Our technique also applies to other applications of tensor decompositions, including spherical Gaussian mixture models.

    Heavy-tailed Independent Component Analysis

    Independent component analysis (ICA) is the problem of efficiently recovering a matrix $A \in \mathbb{R}^{n \times n}$ from i.i.d. observations of $X = AS$ where $S \in \mathbb{R}^n$ is a random vector with mutually independent coordinates. This problem has been intensively studied, but all existing efficient algorithms with provable guarantees require that the coordinates $S_i$ have finite fourth moments. We consider the heavy-tailed ICA problem where we do not make this assumption, even about the second moment. This problem also has received considerable attention in the applied literature. In the present work, we first give a provably efficient algorithm that works under the assumption that for constant $\gamma > 0$, each $S_i$ has finite $(1+\gamma)$-moment, thus substantially weakening the moment requirement for the ICA problem to be solvable. We then give an algorithm that works under the assumption that matrix $A$ has orthogonal columns but requires no moment assumptions. Our techniques draw ideas from convex geometry and exploit standard properties of the multivariate spherical Gaussian distribution in a novel way. Comment: 30 pages.
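
To make the heavy-tailed setting concrete, the following minimal sketch generates data from the model $X = AS$ with sources that have a finite $(1+\gamma)$-moment but infinite variance. It illustrates only the data model, not the paper's recovery algorithm. The function name `sample_heavy_tailed_ica`, the choice of a symmetric Pareto source distribution, and the default tail index `alpha=1.5` are all assumptions made for illustration.

```python
import random


def sample_heavy_tailed_ica(A, num_samples, alpha=1.5, seed=0):
    """Draw observations X = A S with independent symmetric Pareto sources.

    Hypothetical helper: A is an n x n list of lists. A Pareto variable with
    shape alpha has a finite p-th absolute moment only for p < alpha, so
    alpha = 1.5 gives a finite (1 + gamma)-moment for gamma < 0.5 and an
    infinite second moment.
    """
    rng = random.Random(seed)
    n = len(A)
    observations = []
    for _ in range(num_samples):
        # Independent coordinates: random sign times a Pareto magnitude (>= 1).
        s = [rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha) for _ in range(n)]
        # Mix the sources: x_i = sum_j A[i][j] * s_j.
        x = [sum(A[i][j] * s[j] for j in range(n)) for i in range(n)]
        observations.append(x)
    return observations


if __name__ == "__main__":
    mixing = [[1.0, 0.5], [0.0, 2.0]]
    for row in sample_heavy_tailed_ica(mixing, 5):
        print(row)
```

With a tail index this small, sample second and fourth moments do not stabilize as the sample grows, which is why methods that rely on fourth moments do not directly apply in this regime.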

    Probabilistic Neural Network based Approach for Handwritten Character Recognition

    In this paper, a recognition system for totally unconstrained handwritten characters of the South Indian language Kannada is proposed. The proposed feature extraction technique is based on the Fourier Transform and the well-known Principal Component Analysis (PCA). The system trains on the appropriate frequency band images, followed by a PCA feature extraction scheme. For subsequent classification, a Probabilistic Neural Network (PNN) is used. The proposed system is tested on a large database containing Kannada characters and also on the standard COIL-20 object database, and the results were found to be better than those of standard techniques.
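
The classification stage can be illustrated with a minimal sketch of a generic Probabilistic Neural Network, which scores each class by the average of Gaussian kernels centred on that class's training vectors and predicts the highest-scoring class. This is not the paper's implementation: it assumes the Fourier and PCA feature vectors have already been computed, and the function name `pnn_predict` and the smoothing parameter `sigma` are hypothetical choices.

```python
import math


def pnn_predict(train_x, train_y, query, sigma=0.5):
    """Return the label whose Gaussian Parzen-window score at `query` is largest.

    Hypothetical helper: train_x holds precomputed feature vectors, train_y
    their labels, and sigma is the kernel width (an assumed default).
    """
    sums, counts = {}, {}
    for x, label in zip(train_x, train_y):
        # Pattern layer: Gaussian kernel on the squared Euclidean distance.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        kernel = math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Summation layer: accumulate kernel values per class.
        sums[label] = sums.get(label, 0.0) + kernel
        counts[label] = counts.get(label, 0) + 1
    # Decision layer: pick the class with the largest average kernel value.
    return max(sums, key=lambda c: sums[c] / counts[c])


if __name__ == "__main__":
    features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
    labels = ["a", "a", "b", "b"]
    print(pnn_predict(features, labels, [0.05, 0.1]))
```

A PNN has no iterative training step: the training vectors themselves act as the pattern layer, so the only parameter to tune in this sketch is `sigma`.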

    Overcomplete Independent Component Analysis via SDP

    We present a novel algorithm for overcomplete independent components analysis (ICA), where the number of latent sources $k$ exceeds the dimension $p$ of observed variables. Previous algorithms either suffer from high computational complexity or make strong assumptions about the form of the mixing matrix. Our algorithm does not make any sparsity assumption yet enjoys favorable computational and theoretical properties. Our algorithm consists of two main steps: (a) estimation of the Hessians of the cumulant generating function (as opposed to the fourth and higher order cumulants used by most algorithms) and (b) a novel semi-definite programming (SDP) relaxation for recovering a mixing component. We show that this relaxation can be efficiently solved with a projected accelerated gradient descent method, which makes the whole algorithm computationally practical. Moreover, we conjecture that the proposed program recovers a mixing component at the rate $k < p^2/4$ and prove that a mixing component can be recovered with high probability when $k < (2 - \epsilon)\, p \log p$, provided the original components are sampled uniformly at random on the hypersphere. Experiments are provided on synthetic data and the CIFAR-10 dataset of real images. Comment: Appears in: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). 21 pages.
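
Step (a) can be illustrated with a plain plug-in estimator. The Hessian of the cumulant generating function $K(t) = \log \mathbb{E}\, e^{t^\top X}$ equals the covariance of $X$ under the exponentially tilted distribution with weights proportional to $e^{t^\top X}$, so it can be estimated from samples as a weighted covariance. The sketch below assumes this simple estimator, which may differ from the one used in the paper; the function name `cgf_hessian` is hypothetical.

```python
import math


def cgf_hessian(samples, t):
    """Plug-in estimate of the Hessian of K(t) = log E[exp(t . X)].

    Hypothetical helper: samples is a list of d-dimensional points and t a
    d-dimensional evaluation point. Returns a d x d list of lists.
    """
    d = len(t)
    scores = [sum(ti * xi for ti, xi in zip(t, x)) for x in samples]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Mean of the samples under the tilted (reweighted) distribution.
    mean = [sum(w * x[j] for w, x in zip(weights, samples)) for j in range(d)]
    # Covariance under the tilted distribution, which equals the CGF Hessian.
    return [
        [
            sum(w * (x[i] - mean[i]) * (x[j] - mean[j])
                for w, x in zip(weights, samples))
            for j in range(d)
        ]
        for i in range(d)
    ]


if __name__ == "__main__":
    points = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
    print(cgf_hessian(points, [0.0, 0.0]))
```

At $t = 0$ the weights are uniform and the estimate reduces to the ordinary (biased) sample covariance; evaluating at several nonzero $t$ gives different matrices that, under the ICA model, are all combinations of the same rank-one terms built from the mixing components.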