428 research outputs found
List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians
We study the problem of list-decodable Gaussian mean estimation and the
related problem of learning mixtures of separated spherical Gaussians. We
develop a set of techniques that yield new efficient algorithms with
significantly improved guarantees for these problems.
{\bf List-Decodable Mean Estimation.} Fix any $d \in \mathbb{Z}_+$ and $0 < \alpha < 1/2$. We design an algorithm with runtime $O_d(\mathrm{poly}(n/\alpha))$ that outputs a list of $O(1/\alpha)$ many candidate vectors such that with high probability one of the candidates is within $\ell_2$-distance $O_d(\alpha^{-1/(2d)})$ from the true mean. The only previous algorithm for this problem achieved error $\tilde{O}(\alpha^{-1/2})$ under second moment conditions. For $d = O(1/\epsilon)$, our algorithm runs in polynomial time and achieves error $O(\alpha^{\epsilon})$. We also give a Statistical Query lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible.
{\bf Learning Mixtures of Spherical Gaussians.} We give a learning algorithm
for mixtures of spherical Gaussians that succeeds under significantly weaker
separation assumptions compared to prior work. For the prototypical case of a uniform mixture of $k$ identity covariance Gaussians we obtain: For any $\epsilon > 0$, if the pairwise separation between the means is at least $\Omega(k^{\epsilon} + \sqrt{\log(1/\delta)})$, our algorithm learns the unknown parameters within accuracy $\delta$ with sample complexity and running time $\mathrm{poly}(n, 1/\delta, (k/\epsilon)^{1/\epsilon})$. The previously best known polynomial time algorithm required separation at least $k^{1/4} \mathrm{polylog}(k/\delta)$.
Our main technical contribution is a new technique, using degree-$d$ multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.
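The guarantee in the first paragraph is easiest to see through its input/output contract. The sketch below is not the paper's degree-$d$ polynomial method; it is a naive baseline (plain k-means with a list of $O(1/\alpha)$ centers, all parameters made up) that only illustrates the setting: an $\alpha$-fraction of the points are clean, and it suffices that one of the returned candidates lands near the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# alpha-fraction of the points are inliers from N(mu, I); the rest are
# arbitrary background points (here: a wide Gaussian for simplicity).
n, dim, alpha = 5000, 20, 0.1
mu = rng.normal(size=dim)
n_in = int(alpha * n)
inliers = mu + rng.normal(size=(n_in, dim))
outliers = rng.normal(scale=8.0, size=(n - n_in, dim))
X = np.vstack([inliers, outliers])

def list_decode_means(X, alpha, iters=50):
    """Output O(1/alpha) candidate means via plain k-means; one cluster
    tends to capture the inlier component, so some candidate lands near
    the true mean even though most points are corrupted."""
    m = int(2 / alpha)                       # list size O(1/alpha)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)           # nearest-center assignment
        for j in range(m):
            pts = X[labels == j]
            if len(pts) > 0:
                centers[j] = pts.mean(axis=0)
    return centers

candidates = list_decode_means(X, alpha)
errors = np.linalg.norm(candidates - mu, axis=1)
print(f"list size {len(candidates)}, best candidate error {errors.min():.2f}")
```

Such a baseline only achieves error on the order of the inlier spread; the point of the paper's degree-$d$ polynomial outlier-removal technique is to drive the error down to $O_d(\alpha^{-1/(2d)})$.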
Sample Complexity Analysis for Learning Overcomplete Latent Variable Models through Tensor Methods
We provide guarantees for learning latent variable models, with an emphasis on the overcomplete regime, where the dimensionality of the latent space can exceed
the observed dimensionality. In particular, we consider multiview mixtures,
spherical Gaussian mixtures, ICA, and sparse coding models. We provide tight
concentration bounds for empirical moments through novel covering arguments. We
analyze parameter recovery through a simple tensor power update algorithm. In
the semi-supervised setting, we exploit the label or prior information to get a
rough estimate of the model parameters, and then refine it using the tensor
method on unlabeled samples. We establish that learning is possible when the number of components scales as $k = o(d^{p/2})$, where $d$ is the observed dimension and $p$ is the order of the observed moment employed in the tensor
method. Our concentration bound analysis also leads to minimax sample
complexity for semi-supervised learning of spherical Gaussian mixtures. In the
unsupervised setting, we use a simple initialization algorithm based on SVD of
the tensor slices, and provide guarantees under the stricter condition that $k \le \beta d$ (where the constant $\beta$ can be larger than $1$), where the tensor method recovers the components in running time that is polynomial in the dimension (but exponential in $\beta$). Our analysis establishes that a wide range of
overcomplete latent variable models can be learned efficiently with low
computational and sample complexity through tensor decomposition methods.
Comment: Title change
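The "simple tensor power update algorithm" referenced above can be sketched compactly. The snippet below plants a synthetic orthogonal rank-$k$ tensor instead of estimating an empirical moment tensor, and all dimensions are made up; it shows the update $u \leftarrow T(I, u, u) / \|T(I, u, u)\|$ with deflation, not the paper's overcomplete analysis (which is the regime $k > d$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Plant a rank-k symmetric tensor T = sum_i w_i * a_i^(x3). In the paper
# this tensor is an empirical moment of the latent variable model; here
# it is planted directly, with orthonormal components for simplicity.
d, k = 15, 5
A = np.linalg.qr(rng.normal(size=(d, k)))[0]    # d x k, orthonormal columns
w = rng.uniform(1.0, 2.0, size=k)
T = np.einsum('i,ai,bi,ci->abc', w, A, A, A)
T += 1e-3 * rng.normal(size=(d, d, d))          # stand-in for sampling noise

def tensor_power(T, iters=100):
    """Repeated tensor power update: u <- T(I, u, u), normalized."""
    u = rng.normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = np.einsum('abc,b,c->a', T, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('abc,a,b,c->', T, u, u, u)  # recovered weight
    return lam, u

# Deflation: subtract each recovered rank-one term and repeat.
for _ in range(k):
    lam, u = tensor_power(T)
    T = T - lam * np.einsum('a,b,c->abc', u, u, u)
    print(f"weight ~ {lam:.3f}, alignment with closest true component:"
          f" {np.abs(A.T @ u).max():.3f}")
```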
Score Function Features for Discriminative Learning: Matrix and Tensor Framework
Feature learning forms the cornerstone for tackling challenging learning
problems in domains such as speech, computer vision and natural language
processing. In this paper, we consider a novel class of matrix and
tensor-valued features, which can be pre-trained using unlabeled samples. We
present efficient algorithms for extracting discriminative information, given
these pre-trained features and labeled samples for any related task. Our class of features is based on higher-order score functions, which capture local
variations in the probability density function of the input. We establish a
theoretical framework to characterize the nature of discriminative information
that can be extracted from score-function features, when used in conjunction
with labeled samples. We employ efficient spectral decomposition algorithms (on
matrices and tensors) for extracting discriminative components. The advantage
of employing tensor-valued features is that we can extract richer discriminative information in the form of overcomplete representations.
Thus, we present a novel framework for employing generative models of the input
for discriminative learning.
Comment: 29 pages
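Concretely, for standard Gaussian input the higher-order score functions are Hermite polynomials, e.g. $S_2(x) = x x^\top - I$, and Stein's identity turns the labeled cross moment $E[y \cdot S_2(x)]$ into the expected Hessian of the regression function, whose spectral decomposition exposes the discriminative directions. A minimal sketch under a made-up single-index label model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up label model: y depends on x ~ N(0, I) only through a hidden
# direction u, via y = (u . x)^2.
d, n = 30, 100_000
u = rng.normal(size=d)
u /= np.linalg.norm(u)
X = rng.normal(size=(n, d))
y = (X @ u) ** 2

# Second-order score feature of a standard Gaussian: S_2(x) = x x^T - I.
# Stein's identity gives E[y * S_2(x)] = E[Hessian_x E[y|x]] = 2 u u^T,
# so the top eigenvector of the empirical cross moment recovers u.
M2 = X.T @ (X * y[:, None]) / n - y.mean() * np.eye(d)
eigvals, eigvecs = np.linalg.eigh(M2)
u_hat = eigvecs[:, -1]                      # top eigenvector
print("alignment |<u_hat, u>| =", round(abs(u_hat @ u), 4))
```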
Beyond the Low-Degree Algorithm: Mixtures of Subcubes and Their Applications
We introduce the problem of learning mixtures of $k$ subcubes over $\{0,1\}^n$, which contains many classic learning theory problems as a special case (and is itself a special case of others). We give a surprising $n^{O(\log k)}$-time learning algorithm based on higher-order multilinear moments. It is
not possible to learn the parameters because the same distribution can be
represented by quite different models. Instead, we develop a framework for
reasoning about how multilinear moments can pinpoint essential features of the
mixture, like the number of components.
We also give applications of our algorithm to learning decision trees with
stochastic transitions (which also capture interesting scenarios where the
transitions are deterministic but there are latent variables). Using our
algorithm for learning mixtures of subcubes, we can approximate the Bayes
optimal classifier within additive error $\epsilon$ on $k$-leaf decision trees with at most $s$ stochastic transitions on any root-to-leaf path in time $n^{O(s + \log k)} \cdot \mathrm{poly}(1/\epsilon)$. In this stochastic setting, the
classic Occam algorithms for learning decision trees with zero stochastic
transitions break down, while the low-degree algorithm of Linial et al.
inherently has a quasipolynomial dependence on $1/\epsilon$.
In contrast, as we will show, mixtures of $k$ subcubes are uniquely determined by their degree $2 \log k$ moments and hence provide a useful abstraction for simultaneously achieving the polynomial dependence on $1/\epsilon$ of the classic Occam algorithms for decision trees and the
flexibility of the low-degree algorithm in being able to accommodate stochastic
transitions. Using our multilinear moment techniques, we also give the first
improved upper and lower bounds since the work of Feldman et al. for the
related but harder problem of learning mixtures of binary product
distributions.
Comment: 62 pages; to appear in STOC 2019
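For intuition, the multilinear moments in question are the quantities $E[\prod_{i \in S} x_i]$, and for a mixture of subcubes they factor through the coordinate-wise marginals of the components. The sketch below (made-up parameters) only verifies this moment identity empirically; it is not the learning algorithm itself:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# A mixture of k subcubes over {0,1}^n: each component fixes some
# coordinates and leaves the rest uniform, so its coordinate-wise
# marginals p[j, i] lie in {0, 1/2, 1}. All sizes here are made up.
n, k = 6, 3
p = rng.choice([0.0, 0.5, 1.0], size=(k, n))
mix = np.ones(k) / k                        # uniform mixing weights

comps = rng.choice(k, size=300_000, p=mix)
X = (rng.random((300_000, n)) < p[comps]).astype(float)

# Multilinear moments factor through the mixture:
#   E[prod_{i in S} x_i] = sum_j mix_j * prod_{i in S} p[j, i].
for S in list(itertools.combinations(range(n), 3))[:5]:
    idx = list(S)
    empirical = X[:, idx].prod(axis=1).mean()
    exact = (mix * p[:, idx].prod(axis=1)).sum()
    print(S, f"empirical={empirical:.4f}", f"exact={exact:.4f}")
```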
Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
Training neural networks is a challenging non-convex optimization problem,
and backpropagation or gradient descent can get stuck in spurious local optima.
We propose a novel algorithm based on tensor decomposition for guaranteed
training of two-layer neural networks. We provide risk bounds for our proposed
method, with a polynomial sample complexity in the relevant parameters, such as
input dimension and number of neurons. While learning arbitrary target
functions is NP-hard, we provide transparent conditions on the function and the
input for learnability. Our training method is based on tensor decomposition,
which provably converges to the global optimum, under a set of mild
non-degeneracy conditions. It consists of simple embarrassingly parallel linear
and multi-linear operations, and is competitive with standard stochastic
gradient descent (SGD), in terms of computational complexity. Thus, we propose
a computationally efficient method with guaranteed risk bounds for training
neural networks with one hidden layer.
Comment: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of the last layer of the neural network.
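A minimal end-to-end sketch of the pipeline this abstract describes: form a score-function cross moment of the labels, decompose it with tensor power iterations to recover the hidden-layer weights, then fit the output layer by ridge regression. It assumes standard Gaussian input, a sigmoid activation, and orthonormal planted weights (the paper handles more general full-rank weights via whitening); all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Planted two-layer network y = sum_j a_j * sigmoid(w_j . x), x ~ N(0, I).
d, k, n = 12, 4, 400_000
W = np.linalg.qr(rng.normal(size=(d, k)))[0].T      # k x d, orthonormal rows
a = rng.uniform(1.0, 2.0, size=k)                   # last-layer weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
X = rng.normal(size=(n, d))
y = sigmoid(X @ W.T) @ a
yc = y - y.mean()                                   # centering reduces variance

# Step 1: empirical cross moment T = E[y * S_3(x)], where
# S_3(x)_abc = x_a x_b x_c - x_a d_bc - x_b d_ac - x_c d_ab (Gaussian score).
# By Stein's identity, T = sum_j a_j E[sigmoid'''(w_j . x)] w_j^(x3).
T = np.einsum('s,sa,sb,sc->abc', yc, X, X, X) / n
m1 = X.T @ yc / n
I = np.eye(d)
T -= (np.einsum('a,bc->abc', m1, I) + np.einsum('b,ac->abc', m1, I)
      + np.einsum('c,ab->abc', m1, I))

# Step 2: tensor power iterations with deflation recover the rows of W
# (possibly up to sign, since the tensor weights can be negative).
def power_iterate(T, tries=5, iters=100):
    best_lam, best_u = 0.0, None
    for _ in range(tries):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        for _ in range(iters):
            u = np.einsum('abc,b,c->a', T, u, u)
            u /= np.linalg.norm(u)
        lam = np.einsum('abc,a,b,c->', T, u, u, u)
        if abs(lam) > abs(best_lam):
            best_lam, best_u = lam, u
    return best_lam, best_u

W_hat = []
for _ in range(k):
    lam, u = power_iterate(T)
    T = T - lam * np.einsum('a,b,c->abc', u, u, u)  # deflate
    W_hat.append(u)
W_hat = np.array(W_hat)

# Step 3: with hidden weights fixed, the network is linear in the hidden
# activations, so fit the last layer by ridge regression. A bias column
# absorbs sign flips from the tensor step (sigmoid(-z) = 1 - sigmoid(z)).
H = np.column_stack([sigmoid(X @ W_hat.T), np.ones(n)])
a_hat = np.linalg.solve(H.T @ H + 1e-6 * np.eye(k + 1), H.T @ y)[:k]
print("alignment of recovered hidden weights:",
      np.abs(W_hat @ W.T).max(axis=1).round(3))
print("recovered last layer:", np.sort(np.abs(a_hat)).round(2),
      "true:", np.sort(a).round(2))
```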