Search CORE

89,716 research outputs found

Smoothed Analysis in Unsupervised Learning via Decoupling

Author: Bhaskara Aditya
Chen Aidao
Perreault Aidan
Vijayaraghavan Aravindan
Publication venue
Publication date: 23/04/2019
Field of study

Smoothed analysis is a powerful paradigm in overcoming worst-case intractability in unsupervised learning and high-dimensional data analysis. While polynomial time smoothed analysis guarantees have been obtained for worst-case intractable problems like tensor decompositions and learning mixtures of Gaussians, such guarantees have been hard to obtain for several other important problems in unsupervised learning. A core technical challenge in analyzing algorithms is obtaining lower bounds on the least singular value for random matrix ensembles with dependent entries, that are given by low-degree polynomials of a few base underlying random variables. In this work, we address this challenge by obtaining high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind. We then use these bounds to design algorithms with polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning: 1. Robust subspace recovery, when the fraction

\alpha

of inliers in the d-dimensional subspace

T \subset \mathbb{R}^n

is at least

\alpha > (d/n)^\ell

for any constant integer

\ell>0

. This contrasts with the known worst-case intractability when

\alpha< d/n

, and the previous smoothed analysis result which needed

\alpha > d/n

(Hardt and Moitra, 2013). 2. Learning overcomplete hidden markov models, where the size of the state space is any polynomial in the dimension of the observations. This gives the first polynomial time guarantees for learning overcomplete HMMs in a smoothed analysis model. 3. Higher order tensor decompositions, where we generalize the so-called FOOBI algorithm of Cardoso to find order-

\ell

rank-one tensors in a subspace. This allows us to obtain polynomially robust decomposition algorithms for

2\ell

'th order tensors with rank

O(n^{\ell})

.Comment: 44 page

arXiv.org e-Print Archive

Crossref

Smoothed Analysis of Tensor Decompositions

Author: Anandkumar A.
Anandkumar A.
Anandkumar A.
Arora S.
Belkin M.
Dasgupta S.
Goyal N.
Harshman R.
Horn R.
Hyvärinen A.
Lindsay B.
Stegeman A.
Wedin P.
Publication venue
Publication date: 01/01/2014
Field of study

Low rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. However, tensors pose significant algorithmic challenges and tensors analogs of much of the matrix algebra toolkit are unlikely to exist because of hardness results. Efficient decomposition in the overcomplete case (where rank exceeds dimension) is particularly challenging. We introduce a smoothed analysis model for studying these questions and develop an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension). In this setting, we show that our algorithm is robust to inverse polynomial error -- a crucial property for applications in learning since we are only allowed a polynomial number of samples. While algorithms are known for exact tensor decomposition in some overcomplete settings, our main contribution is in analyzing their stability in the framework of smoothed analysis. Our main technical contribution is to show that tensor products of perturbed vectors are linearly independent in a robust sense (i.e. the associated matrix has singular values that are at least an inverse polynomial). This key result paves the way for applying tensor methods to learning problems in the smoothed setting. In particular, we use it to obtain results for learning multi-view models and mixtures of axis-aligned Gaussians where there are many more "components" than dimensions. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of model parameters. We believe this an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.Comment: 32 pages (including appendix

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

Crossref

Learning DNF Expressions from Fourier Spectrum

Author: Feldman Vitaly
Publication venue
Publication date: 01/01/2012
Field of study

Since its introduction by Valiant in 1984, PAC learning of DNF expressions remains one of the central problems in learning theory. We consider this problem in the setting where the underlying distribution is uniform, or more generally, a product distribution. Kalai, Samorodnitsky and Teng (2009) showed that in this setting a DNF expression can be efficiently approximated from its "heavy" low-degree Fourier coefficients alone. This is in contrast to previous approaches where boosting was used and thus Fourier coefficients of the target function modified by various distributions were needed. This property is crucial for learning of DNF expressions over smoothed product distributions, a learning model introduced by Kalai et al. (2009) and inspired by the seminal smoothed analysis model of Spielman and Teng (2001). We introduce a new approach to learning (or approximating) a polynomial threshold functions which is based on creating a function with range [-1,1] that approximately agrees with the unknown function on low-degree Fourier coefficients. We then describe conditions under which this is sufficient for learning polynomial threshold functions. Our approach yields a new, simple algorithm for approximating any polynomial-size DNF expression from its "heavy" low-degree Fourier coefficients alone. Our algorithm greatly simplifies the proof of learnability of DNF expressions over smoothed product distributions. We also describe an application of our algorithm to learning monotone DNF expressions over product distributions. Building on the work of Servedio (2001), we give an algorithm that runs in time \poly((s \cdot \log{(s/\eps)})^{\log{(s/\eps)}}, n), where

s

is the size of the target DNF expression and \eps is the accuracy. This improves on \poly((s \cdot \log{(ns/\eps)})^{\log{(s/\eps)} \cdot \log{(1/\eps)}}, n) bound of Servedio (2001).Comment: Appears in Conference on Learning Theory (COLT) 201

arXiv.org e-Print Archive

CiteSeerX

Learning Mixtures of Gaussians in High Dimensions

Author: Hardt Moritz
Hardt Moritz
Jain Prateek
McLachlan Geoffrey
Permuter H
Reynolds Douglas A
Titterington D Michael
Valiant Gregory John
Zoran Daniel
Publication venue
Publication date: 09/03/2015
Field of study

Efficiently learning mixture of Gaussians is a fundamental problem in statistics and learning theory. Given samples coming from a random one out of k Gaussian distributions in Rn, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many machine learning applications. Unfortunately, learning mixture of Gaussians is an information theoretically hard problem: in order to learn the parameters up to a reasonable accuracy, the number of samples required is exponential in the number of Gaussian components in the worst case. In this work, we show that provided we are in high enough dimensions, the class of Gaussian mixtures is learnable in its most general form under a smoothed analysis framework, where the parameters are randomly perturbed from an adversarial starting point. In particular, given samples from a mixture of Gaussians with randomly perturbed parameters, when n > {\Omega}(k^2), we give an algorithm that learns the parameters with polynomial running time and using polynomial number of samples. The central algorithmic ideas consist of new ways to decompose the moment tensor of the Gaussian mixture by exploiting its structural properties. The symmetries of this tensor are derived from the combinatorial structure of higher order moments of Gaussian distributions (sometimes referred to as Isserlis' theorem or Wick's theorem). We also develop new tools for bounding smallest singular values of structured random matrices, which could be useful in other smoothed analysis settings

arXiv.org e-Print Archive

Crossref

Polynomial-time Tensor Decompositions with Sum-of-Squares

Author: Ma Tengyu
Shi Jonathan
Steurer David
Publication venue
Publication date: 06/10/2016
Field of study

We give new algorithms based on the sum-of-squares method for tensor decomposition. Our results improve the best known running times from quasi-polynomial to polynomial for several problems, including decomposing random overcomplete 3-tensors and learning overcomplete dictionaries with constant relative sparsity. We also give the first robust analysis for decomposing overcomplete 4-tensors in the smoothed analysis model. A key ingredient of our analysis is to establish small spectral gaps in moment matrices derived from solutions to sum-of-squares relaxations. To enable this analysis we augment sum-of-squares relaxations with spectral analogs of maximum entropy constraints.Comment: to appear in FOCS 201

arXiv.org e-Print Archive

Crossref

Efficiently Learning Neural Networks: What Assumptions May Suffice?

Author: Daniely Amit
Srebro Nathan
Vardi Gal
Publication venue
Publication date: 14/02/2023
Field of study

Understanding when neural networks can be learned efficiently is a fundamental question in learning theory. Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtaining efficient algorithms. Moreover, it was previously shown that depth-

2

networks can be efficiently learned under the assumptions that the input distribution is Gaussian, and the weight matrix is non-degenerate. In this work, we study whether such assumptions may suffice for learning deeper networks and prove negative results. We show that learning depth-

3

ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework, where a random noise is added to the network's parameters. It implies that learning depth-

3

ReLU networks under the Gaussian distribution is hard even if the weight matrices are non-degenerate. Moreover, we consider depth-

2

networks, and show hardness of learning in the smoothed-analysis framework, where both the network parameters and the input distribution are smoothed. Our hardness results are under a well-studied assumption on the existence of local pseudorandom generators.Comment: arXiv admin note: text overlap with arXiv:2101.0830

arXiv.org e-Print Archive