22 research outputs found
Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders
We present a new algorithm for Independent Component Analysis (ICA) which has
provable performance guarantees. In particular, suppose we are given samples
of the form $y = Ax + \eta$ where $A$ is an unknown $n \times n$ matrix and
$x$ is a random variable whose components are independent and have a fourth
moment strictly less than that of a standard Gaussian random variable, and
$\eta$ is an $n$-dimensional Gaussian random variable with unknown covariance
$\Sigma$. We give an algorithm that provably recovers $A$ and $\Sigma$ up to
an additive $\epsilon$, and whose running time and sample complexity are
polynomial in $n$ and $1/\epsilon$. To accomplish this, we introduce a novel
"quasi-whitening" step that may be useful in other contexts in which the
covariance of Gaussian noise is not known in advance. We also give a general
framework for finding all local optima of a function (given an oracle for
approximately finding just one); this is a crucial step in our algorithm, one
that has been overlooked in previous attempts, and it allows us to control
the accumulation of error when we find the columns of $A$ one by one via
local search.
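As a minimal sketch of the generative model only (not the paper's
algorithm), the snippet below draws samples $y = Ax + \eta$ with Rademacher
sources, whose fourth moment (1) is strictly below a standard Gaussian's
(3); all parameter choices here are illustrative assumptions.

```python
# Sketch of the model y = A x + eta: independent sources with sub-Gaussian
# fourth moment, plus Gaussian noise with unknown covariance Sigma.
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))                 # unknown mixing matrix
B = rng.standard_normal((n, n))
Sigma = B @ B.T                                 # unknown noise covariance

def sample(m):
    # Rademacher sources: E[x_i^4] = 1 < 3 = E[g^4] for standard Gaussian g.
    x = rng.choice([-1.0, 1.0], size=(m, n))
    eta = rng.multivariate_normal(np.zeros(n), Sigma, size=m)
    return x @ A.T + eta

y = sample(100_000)                             # samples handed to the learner
print(y.shape)
```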
Non-Gaussian Component Analysis using Entropy Methods
Non-Gaussian component analysis (NGCA) is a problem in multidimensional data
analysis which, since its formulation in 2006, has attracted considerable
attention in statistics and machine learning. In this problem, we have a
random variable $X$ in $n$-dimensional Euclidean space. There is an unknown
subspace $V$ of the $n$-dimensional Euclidean space such that the orthogonal
projection of $X$ onto $V$ is standard multidimensional Gaussian and the
orthogonal projection of $X$ onto $V^{\perp}$, the orthogonal complement of
$V$, is non-Gaussian, in the sense that all its one-dimensional marginals
are different from the Gaussian in a certain metric defined in terms of
moments. The NGCA problem is to approximate the non-Gaussian subspace
$V^{\perp}$ given samples of $X$.
Vectors in $V^{\perp}$ correspond to `interesting' directions, whereas
vectors in $V$ correspond to the directions where the data is very noisy.
The most interesting application of the NGCA model is the case when the
magnitude of the noise is comparable to that of the true signal, a setting
in which traditional noise reduction techniques such as PCA do not apply
directly. NGCA is also related to dimension reduction and to other data
analysis problems such as ICA. NGCA-like problems have been studied in
statistics for a long time using techniques such as projection pursuit.
We give an algorithm that takes polynomial time in the dimension $n$ and has
an inverse polynomial dependence on the error parameter measuring the angle
distance between the non-Gaussian subspace and the subspace output by the
algorithm. Our algorithm is based on relative entropy as the contrast
function and fits under the projection pursuit framework. The techniques we
develop for analyzing our algorithm may be of use for other related
problems.
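A minimal sketch of the NGCA model (not this paper's entropy-based
algorithm): data that is standard Gaussian on a hidden subspace and
non-Gaussian on its orthogonal complement. Even a simple fourth-moment
contrast, in the projection pursuit spirit, separates the two kinds of
directions; all parameter choices below are illustrative assumptions.

```python
# Plant a one-dimensional non-Gaussian direction inside Gaussian noise.
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 200_000

gaussian_part = rng.standard_normal((m, n - 1))         # noisy directions
non_gaussian = rng.uniform(-np.sqrt(3), np.sqrt(3), m)  # unit variance, non-Gaussian
Z = np.column_stack([gaussian_part, non_gaussian])

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))        # hide the subspace
X = Z @ Q.T                                             # samples of the model

def gaussian_gap(v):
    s = X @ v
    return abs(np.mean(s**4) - 3.0)    # ~0 for Gaussian marginals

print(gaussian_gap(Q[:, -1]))          # planted direction: ~1.2
print(gaussian_gap(Q[:, 0]))           # Gaussian direction: ~0.0
```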
Smoothed Analysis of Tensor Decompositions
Low rank tensor decompositions are a powerful tool for learning generative
models, and uniqueness results give them a significant advantage over matrix
decomposition methods. However, tensors pose significant algorithmic challenges
and tensor analogs of much of the matrix algebra toolkit are unlikely to exist
because of hardness results. Efficient decomposition in the overcomplete case
(where rank exceeds dimension) is particularly challenging. We introduce a
smoothed analysis model for studying these questions and develop an efficient
algorithm for tensor decomposition in the highly overcomplete case (rank
polynomial in the dimension). In this setting, we show that our algorithm is
robust to inverse polynomial error -- a crucial property for applications in
learning since we are only allowed a polynomial number of samples. While
algorithms are known for exact tensor decomposition in some overcomplete
settings, our main contribution is in analyzing their stability in the
framework of smoothed analysis.
Our main technical contribution is to show that tensor products of perturbed
vectors are linearly independent in a robust sense (i.e. the associated matrix
has singular values that are at least an inverse polynomial). This key result
paves the way for applying tensor methods to learning problems in the smoothed
setting. In particular, we use it to obtain results for learning multi-view
models and mixtures of axis-aligned Gaussians where there are many more
"components" than dimensions. The assumption here is that the model is not
adversarially chosen, formalized by a perturbation of model parameters. We
believe this is an appealing way to analyze realistic instances of learning
problems, since this framework allows us to overcome many of the usual
limitations of using tensor methods.
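The following numerical sketch illustrates (but does not prove) the key
lemma: after a small random perturbation, the flattened tensor products
$a_i \otimes a_i$ form a matrix whose least singular value is bounded away
from zero, even in the overcomplete regime. The dimensions and perturbation
size below are arbitrary choices of ours.

```python
# Check robust linear independence of perturbed tensor (outer) products.
import numpy as np

rng = np.random.default_rng(2)
n, R, rho = 10, 40, 0.1                     # R = 40 > n = 10: overcomplete

base = rng.standard_normal((n, R))          # arbitrary starting vectors
pert = base + rho * rng.standard_normal((n, R))   # smoothed-analysis perturbation

# Columns are the flattened outer products a_i (x) a_i: an n^2 x R matrix.
M = np.stack([np.outer(a, a).ravel() for a in pert.T], axis=1)
print(np.linalg.svd(M, compute_uv=False)[-1])     # least singular value > 0
```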
Fourier PCA and Robust Tensor Decomposition
Fourier PCA is Principal Component Analysis of a matrix obtained from higher
order derivatives of the logarithm of the Fourier transform of a
distribution. We make this method algorithmic by developing a tensor
decomposition method for a pair of tensors sharing the same vectors in
rank-$1$ decompositions. Our main application is the first provably
polynomial-time algorithm for underdetermined ICA, i.e., learning an
$n \times m$ matrix $A$ from observations $y = Ax$ where $x$ is drawn from
an unknown product distribution with arbitrary non-Gaussian components. The
number of component distributions $m$ can be arbitrarily higher than the
dimension $n$, and the columns of $A$ only need to satisfy a natural and
efficiently verifiable nondegeneracy condition. As a second application, we
give an alternative algorithm for learning mixtures of spherical Gaussians
with linearly independent means. These results also hold in the presence of
Gaussian noise.
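A hedged sketch of the basic object behind Fourier PCA (the full algorithm
uses higher-order derivative tensors and the pair decomposition above): the
second derivative matrix of the log Fourier transform, estimated from
samples as a complex reweighted covariance. The data model and parameters
here are illustrative assumptions.

```python
# Estimate D^2 log E[exp(i<u, y>)] from samples: a reweighted covariance.
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 200_000
A = rng.standard_normal((n, n))
Y = rng.choice([-1.0, 1.0], size=(m, n)) @ A.T     # ICA-style data y = A x

def hessian_log_phi(u):
    w = np.exp(1j * Y @ u)                         # Fourier weights e^{i<u,y>}
    phi = w.mean()                                 # empirical Fourier transform
    g = (1j * Y * w[:, None]).mean(axis=0)         # gradient of phi
    H = -np.einsum('mi,mj,m->ij', Y, Y, w) / m     # Hessian of phi
    return H / phi - np.outer(g, g) / phi**2       # Hessian of log phi

D2 = hessian_log_phi(0.1 * rng.standard_normal(n))
print(np.round(np.linalg.eigvals(D2), 3))          # spectrum encodes A's directions
```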
Some Algorithms and Paradigms for Big Data
The reality of big data poses both opportunities and challenges to modern
researchers. Its key features -- large sample sizes, high-dimensional
feature spaces, and structural complexity -- impose new paradigms on the
design of effective yet algorithmically efficient data analysis algorithms.
In this dissertation, we illustrate a few paradigms through the analysis of
three new algorithms.

The first two algorithms consider the problem of phase retrieval, in which
we seek to recover a signal from random rank-one quadratic measurements. We
first show that an adaptation of the randomized Kaczmarz method provably
exhibits linear convergence so long as our sample size is linear in the
signal dimension. Next, we show that the standard SDP relaxation of sparse
PCA yields an algorithm that does signal recovery for sparse,
model-misspecified phase retrieval with a sample complexity that scales
according to the square of the sparsity parameter. Finally, our third
algorithm addresses the problem of Non-Gaussian Component Analysis, in which
we are trying to identify the non-Gaussian marginals of a high-dimensional
distribution. We prove that our algorithm exhibits polynomial time
convergence with polynomial sample complexity.

PhD dissertation, Mathematics, University of Michigan, Horace H. Rackham
School of Graduate Studies.
https://deepblue.lib.umich.edu/bitstream/2027.42/145895/1/yanshuo_1.pd
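A hedged sketch of the first algorithm's main ingredient, in one standard
formulation of randomized Kaczmarz for phase retrieval (the dissertation's
exact update rule and constants may differ): given phaseless measurements
$y_i = |\langle a_i, x^* \rangle|$, each step projects the iterate onto the
closer of the two hyperplanes $\langle a_i, x \rangle = \pm y_i$. As
linear-convergence theory of this kind requires, a crude warm start is
assumed.

```python
# Randomized Kaczmarz for phase retrieval (sketch; warm start assumed).
import numpy as np

rng = np.random.default_rng(4)
n, m = 50, 400                              # sample size linear in dimension
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = np.abs(A @ x_true)                      # phaseless measurements

x = x_true + 0.3 * rng.standard_normal(n)   # crude warm start (assumption)
for _ in range(20 * m):
    i = rng.integers(m)
    a = A[i]
    r = np.sign(a @ x) * y[i] - a @ x       # signed residual for row i
    x += r * a / (a @ a)                    # project onto chosen hyperplane

print(min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)))  # small
```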
Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time
We give a polynomial-time algorithm for learning high-dimensional halfspaces
with margins in $d$-dimensional space to within desired TV distance when the
ambient distribution is an unknown affine transformation of the $d$-fold
product of an (unknown) symmetric one-dimensional logconcave distribution,
and the halfspace is introduced by deleting at least an $\epsilon$ fraction
of the data in one of the component distributions. Notably, our algorithm
does not need labels and establishes the unique (and efficient)
identifiability of the hidden halfspace under this distributional
assumption. The sample and time complexity of the algorithm are polynomial
in the dimension and $1/\epsilon$.
The algorithm uses only the first two moments of suitable re-weightings of the
empirical distribution, which we call contrastive moments; its analysis uses
classical facts about generalized Dirichlet polynomials and relies crucially on
a new monotonicity property of the moment ratio of truncations of logconcave
distributions. Such algorithms, based only on first and second moments, were
suggested in earlier work, but had hitherto eluded rigorous guarantees.
Prior work addressed the special case when the underlying distribution is
Gaussian via Non-Gaussian Component Analysis. We improve on this by providing
polytime guarantees based on Total Variation (TV) distance, in place of
existing moment-bound guarantees that can be super-polynomial. Our work is also
the first to go beyond Gaussians in this setting.
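A toy illustration of the contrastive-moments idea (not the paper's exact
reweighting or guarantees): after recentering, symmetric component
directions keep zero mean under any even reweighting of the data, while the
direction carrying the truncated, asymmetric component acquires a nonzero
reweighted mean. The Gaussian weight and all parameters below are our own
illustrative choices.

```python
# Recover the hidden halfspace normal from a reweighted first moment.
import numpy as np

rng = np.random.default_rng(5)
n, m = 8, 200_000

X = rng.standard_normal((m, n))             # symmetric logconcave components
X = X[X[:, 0] > -0.5]                       # halfspace deletes data along e_0
X = X - X.mean(axis=0)                      # recenter

w = np.exp(-0.5 * np.sum(X**2, axis=1))     # one choice of even reweighting
mu_w = (w[:, None] * X).sum(axis=0) / w.sum()   # contrastive first moment

v = mu_w / np.linalg.norm(mu_w)
print(abs(v[0]))                            # ~1: aligned with hidden normal e_0
```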
Basis Learning as an Algorithmic Primitive
A number of important problems in theoretical computer science and machine
learning can be interpreted as recovering a certain basis. These include
symmetric matrix eigendecomposition, certain tensor decompositions,
Independent Component Analysis (ICA), spectral clustering and Gaussian
mixture learning. Each of these problems reduces to an instance of our
general model, which we call a "Basis Encoding Function" (BEF). We show that
learning a basis within this model can then be provably and efficiently
achieved using a first order iteration algorithm (gradient iteration). Our
algorithm goes beyond tensor methods while generalizing a number of existing
algorithms -- e.g., the power method for symmetric matrices, the tensor
power iteration for orthogonally decomposable tensors, and cumulant-based
FastICA -- all within a broader function-based dynamical systems framework.

Our framework also unifies the unusual phenomenon observed in these domains
that they can be solved using efficient non-convex optimization.
Specifically, we describe a class of BEFs such that their local maxima on
the unit sphere are in one-to-one correspondence with the basis elements.
This description relies on a certain "hidden convexity" property of these
functions.

We provide a complete theoretical analysis of the gradient iteration even
when the BEF is perturbed. We show convergence and complexity bounds
polynomial in dimension and other relevant parameters, such as perturbation
size. Our perturbation results can be considered as a nonlinear version of
the classical Davis-Kahan theorem for perturbations of eigenvectors of
symmetric matrices. In addition, we show that our algorithm exhibits fast
(superlinear) convergence and relate the speed of convergence to the
properties of the BEF. Moreover, the gradient iteration algorithm can be
easily and efficiently implemented in practice.
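A minimal instance of the gradient iteration (one concrete BEF, not the
general framework): for $f(u) = \sum_i c_i \langle u, b_i \rangle^4$ with a
hidden orthonormal basis $b_i$, the iteration
$u \leftarrow \nabla f(u)/\|\nabla f(u)\|$ converges rapidly to a basis
element, generalizing the matrix power method (the case
$f(u) = u^{\top} M u / 2$, $\nabla f(u) = Mu$). The function and parameters
below are illustrative assumptions.

```python
# Gradient iteration on the sphere for a simple Basis Encoding Function.
import numpy as np

rng = np.random.default_rng(6)
n = 6
B, _ = np.linalg.qr(rng.standard_normal((n, n)))   # hidden orthonormal basis
c = rng.uniform(1.0, 2.0, n)                       # positive coefficients

def grad_f(u):
    # Gradient of f(u) = sum_i c_i <u, b_i>^4.
    return 4 * B @ (c * (B.T @ u) ** 3)

u = rng.standard_normal(n)
u /= np.linalg.norm(u)
for _ in range(50):
    g = grad_f(u)
    u = g / np.linalg.norm(g)                      # gradient iteration step

# The iterate lands on (a sign of) one hidden basis vector:
print(np.max(np.abs(B.T @ u)))                     # ~1.0
```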
Abstract A number of important problems in theoretical computer science and machine learning can be interpreted as recovering a certain basis. These include symmetric matrix eigendecomposition, certain tensor decompositions, Independent Component Analysis (ICA), spectral clustering and Gaussian mixture learning. Each of these problems reduces to an instance of our general model, which we call a "Basis Encoding Function" (BEF). We show that learning a basis within this model can then be provably and efficiently achieved using a first order iteration algorithm (gradient iteration). Our algorithm goes beyond tensor methods while generalizing a number of existing algorithms-e.g., the power method for symmetric matrices, the tensor power iteration for orthogonal decomposable tensors, and cumulant-based FastICA-all within a broader function-based dynamical systems framework. Our framework also unifies the unusual phenomenon observed in these domains that they can be solved using efficient non-convex optimization. Specifically, we describe a class of BEFs such that their local maxima on the unit sphere are in one-to-one correspondence with the basis elements. This description relies on a certain "hidden convexity" property of these functions. We provide a complete theoretical analysis of the gradient iteration even when the BEF is perturbed. We show convergence and complexity bounds polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a nonlinear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices. In addition we show that our algorithm exhibits fast (superlinear) convergence and relate the speed of convergence to the properties of the BEF. Moreover, the gradient iteration algorithm can be easily and efficiently implemented in practice