Learning Mixtures of Gaussians in High Dimensions
Efficiently learning mixtures of Gaussians is a fundamental problem in statistics and learning theory. Given samples, each drawn from one of k Gaussian distributions in R^n, the learning problem asks to estimate the means and covariance matrices of these Gaussians. This learning problem arises in
many areas ranging from the natural sciences to the social sciences, and has
also found many machine learning applications. Unfortunately, learning mixtures of Gaussians is an information-theoretically hard problem: in order to learn the parameters up to reasonable accuracy, the number of samples required is
exponential in the number of Gaussian components in the worst case. In this
work, we show that provided we are in high enough dimensions, the class of
Gaussian mixtures is learnable in its most general form under a smoothed
analysis framework, where the parameters are randomly perturbed from an
adversarial starting point. In particular, given samples from a mixture of
Gaussians with randomly perturbed parameters, when n > \Omega(k^2), we give an algorithm that learns the parameters in polynomial time using a polynomial number of samples. The central algorithmic ideas consist of new ways
to decompose the moment tensor of the Gaussian mixture by exploiting its
structural properties. The symmetries of this tensor are derived from the
combinatorial structure of higher order moments of Gaussian distributions
(sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop
new tools for bounding the smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings.
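For reference, the combinatorial structure invoked here is the following: for a zero-mean Gaussian x ~ N(0, \Sigma), Isserlis'/Wick's theorem expresses every higher-order moment as a sum over pairings of the indices; at fourth order,

    E[x_i x_j x_k x_l] = \Sigma_{ij} \Sigma_{kl} + \Sigma_{ik} \Sigma_{jl} + \Sigma_{il} \Sigma_{jk}.

This pairing symmetry is what gives the moment tensor of a Gaussian mixture the structure the decomposition exploits.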
Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
While several papers have investigated computationally and statistically
efficient methods for learning Gaussian mixtures, precise minimax bounds for
their statistical performance as well as fundamental limits in high-dimensional
settings are not well understood. In this paper, we provide precise information-theoretic bounds on the clustering accuracy and sample complexity of learning a
mixture of two isotropic Gaussians in high dimensions under small mean
separation. If there is a sparse subset of relevant dimensions that determine
the mean separation, then the sample complexity only depends on the number of
relevant dimensions and mean separation, and can be achieved by a simple
computationally efficient procedure. Our results provide the first step of a
theoretical basis for recent methods that combine feature selection and
clustering.
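A minimal sketch of the feature-selection-plus-clustering idea (an illustrative two-stage procedure under simple assumptions, not necessarily the authors' method): for a balanced mixture (1/2) N(\mu_1, \sigma^2 I) + (1/2) N(\mu_2, \sigma^2 I), coordinate j has marginal variance \sigma^2 + (\mu_{1j} - \mu_{2j})^2 / 4, so the sparse set of relevant coordinates can be screened by empirical variance before clustering:

    import numpy as np
    from scipy.stats import chi2
    from sklearn.cluster import KMeans

    def screen_and_cluster(X, sigma2=1.0, alpha=0.05):
        """Variance screening + 2-means; names and thresholds are illustrative."""
        n, d = X.shape
        # Per-coordinate empirical variance; relevant coordinates are inflated
        # by (mean separation)^2 / 4 above the noise level sigma2.
        var = X.var(axis=0)
        # Bonferroni-corrected chi-square threshold for "variance exceeds sigma2"
        # (n * var_j / sigma2 is approximately chi^2 with n-1 degrees of freedom).
        thresh = sigma2 * chi2.ppf(1 - alpha / d, df=n - 1) / n
        relevant = np.flatnonzero(var > thresh)
        if relevant.size == 0:           # fall back to all coordinates
            relevant = np.arange(d)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[:, relevant])
        return relevant, labels

The known \sigma^2 and the particular threshold are simplifications; the point of the abstract is that the sample complexity of such a two-stage scheme can scale with the number of relevant dimensions rather than the ambient dimension.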
On Learning Mixtures of Well-Separated Gaussians
We consider the problem of efficiently learning mixtures of a large number of
spherical Gaussians, when the components of the mixture are well separated. In
the most basic form of this problem, we are given samples from a uniform
mixture of k standard spherical Gaussians in R^n, and the goal is to estimate the means up to accuracy \delta using poly(k, n, 1/\delta) samples.
In this work, we study the following question: what is the minimum separation
needed between the means for solving this task? The best known algorithm due to
Vempala and Wang [JCSS 2004] requires a separation of roughly \min\{k, n\}^{1/4}. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation o(1), exponentially many samples are required. We
address the significant gap between these two bounds, by showing the following
results.
1. We show that with separation o(\sqrt{\log k}), super-polynomially many samples are required. In fact, this holds even when the means of the Gaussians are picked at random in n = O(\log k) dimensions.
2. We show that with separation \Omega(\sqrt{\log k}), poly(k, 1/\delta) samples suffice. Note that the bound on the separation is independent of n. This result is based on a new and efficient "accuracy boosting" algorithm that takes as input coarse estimates of the true means and in time poly(k, n, 1/\delta) outputs estimates of the means up to arbitrary accuracy \delta, assuming the separation between the means is \Omega(\min\{\sqrt{\log k}, \sqrt{n}\}) (independently of \delta).
We also present a computationally efficient algorithm in n = O(\log k) dimensions with only \Omega(\sqrt{\log k}) separation. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of spherical Gaussians with polynomial samples.
Comment: Appeared in FOCS 2017. 55 pages, 1 figure
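To make the boosting idea concrete, here is a minimal Lloyd-style sketch (an illustration of the assign-and-re-average principle only; the paper's accuracy-boosting algorithm and its guarantees are more involved): given coarse estimates of well-separated means, assign each sample to its nearest estimate and re-average:

    import numpy as np

    def boost_means(X, mu0, iters=10):
        """Refine coarse mean estimates for a spherical-Gaussian mixture by
        nearest-estimate assignment and re-averaging (illustrative only)."""
        mu = mu0.copy()                          # (k, n) coarse estimates
        for _ in range(iters):
            # Squared distance from each sample to each current estimate.
            d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)           # nearest-estimate assignment
            for j in range(mu.shape[0]):
                pts = X[assign == j]
                if len(pts) > 0:                 # keep old estimate if cluster empties
                    mu[j] = pts.mean(axis=0)
        return mu

Under sufficient separation, most samples are assigned to the component that generated them, so each pass averages out noise and sharpens the estimates.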
Smoothed Analysis of Tensor Decompositions
Low rank tensor decompositions are a powerful tool for learning generative
models, and uniqueness results give them a significant advantage over matrix
decomposition methods. However, tensors pose significant algorithmic challenges
and tensor analogs of much of the matrix algebra toolkit are unlikely to exist
because of hardness results. Efficient decomposition in the overcomplete case
(where rank exceeds dimension) is particularly challenging. We introduce a
smoothed analysis model for studying these questions and develop an efficient
algorithm for tensor decomposition in the highly overcomplete case (rank
polynomial in the dimension). In this setting, we show that our algorithm is
robust to inverse polynomial error -- a crucial property for applications in
learning since we are only allowed a polynomial number of samples. While
algorithms are known for exact tensor decomposition in some overcomplete
settings, our main contribution is in analyzing their stability in the
framework of smoothed analysis.
Our main technical contribution is to show that tensor products of perturbed
vectors are linearly independent in a robust sense (i.e. the associated matrix
has singular values that are at least an inverse polynomial). This key result
paves the way for applying tensor methods to learning problems in the smoothed
setting. In particular, we use it to obtain results for learning multi-view
models and mixtures of axis-aligned Gaussians where there are many more
"components" than dimensions. The assumption here is that the model is not
adversarially chosen, formalized by a perturbation of model parameters. We
believe this is an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.
Comment: 32 pages (including appendix)
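A small numerical check of the key technical claim (a demonstration under toy parameters, not the paper's analysis): columns formed as Kronecker squares a_i \otimes a_i of perturbed vectors remain linearly independent in a robust sense, i.e. the associated matrix has a smallest singular value bounded away from zero, even in the overcomplete regime k > n:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, rho = 20, 100, 0.1      # dimension, number of components (k > n), perturbation size

    A = rng.standard_normal((n, k))              # base vectors (adversarial in the model)
    A = A + rho * rng.standard_normal((n, k))    # smoothed-analysis perturbation

    # Column i of M is a_i (x) a_i, a vector in R^{n^2}; here k <= n^2.
    M = np.einsum('ik,jk->ijk', A, A).reshape(n * n, k)

    sigma_min = np.linalg.svd(M, compute_uv=False).min()
    print(f"smallest singular value: {sigma_min:.4f}")   # bounded away from zero w.h.p.

Without the perturbation an adversary could make two columns coincide (smallest singular value zero); the theorem says the perturbation restores an inverse-polynomial lower bound with high probability.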