9 research outputs found

    Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression

    Full text link
The Expectation-Maximization algorithm is perhaps the most broadly used algorithm for inference in latent variable problems. A theoretical understanding of its performance, however, largely remains lacking. Recent results established that EM enjoys global convergence for Gaussian Mixture Models. For Mixed Linear Regression, however, only local convergence results have been established, and those only for the high SNR regime. We show here that EM converges for mixed linear regression with two components (it is known that it may fail to converge for three or more), and moreover that this convergence holds for random initialization. Our analysis reveals that EM exhibits very different behavior in Mixed Linear Regression from its behavior in Gaussian Mixture Models, and hence our proofs require the development of several new ideas. Comment: To appear in the proceedings of the Conference on Learning Theory (COLT), 2019. This paper results from a merger of work from two groups who worked on the problem at the same time.
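    For concreteness, the EM iteration studied in this line of work can be written down in a few lines for the symmetric two-component model $y_i = z_i \langle x_i, \beta^* \rangle + \varepsilon_i$ with $z_i \in \{\pm 1\}$ and Gaussian noise of known variance. The sketch below is a minimal illustration of that update under these assumptions (function and variable names are ours), not the authors' implementation.

```python
import numpy as np

def em_two_component_mlr(X, y, sigma=1.0, n_iters=100, seed=0):
    """EM for the symmetric two-component mixed linear regression model
    y_i = z_i * <x_i, beta*> + eps_i,  z_i in {+1, -1},  eps_i ~ N(0, sigma^2).

    Minimal sketch with random initialization; not the paper's code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = rng.standard_normal(d)      # random initialization
    gram = X.T @ X                     # shared across iterations
    for _ in range(n_iters):
        # E-step: posterior sign E[z_i | data, beta] = tanh(y_i <x_i, beta> / sigma^2)
        s = np.tanh(y * (X @ beta) / sigma**2)
        # M-step: the weighted least squares problem reduces to (X^T X) beta = X^T (s * y)
        beta = np.linalg.solve(gram, X.T @ (s * y))
    return beta

# Tiny synthetic check (hypothetical usage)
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 2000, 10
    X = rng.standard_normal((n, d))
    beta_star = rng.standard_normal(d)
    z = rng.choice([-1.0, 1.0], size=n)
    y = z * (X @ beta_star) + 0.1 * rng.standard_normal(n)
    beta_hat = em_two_component_mlr(X, y, sigma=0.1)
    # beta* is identifiable only up to a global sign flip
    err = min(np.linalg.norm(beta_hat - beta_star), np.linalg.norm(beta_hat + beta_star))
    print("estimation error:", err)
```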

    EM Converges for a Mixture of Many Linear Regressions

    Full text link
We study the convergence of the Expectation-Maximization (EM) algorithm for mixtures of linear regressions with an arbitrary number $k$ of components. We show that as long as the signal-to-noise ratio (SNR) is $\tilde{\Omega}(k)$, well-initialized EM converges to the true regression parameters. Previous results for $k \geq 3$ have only established local convergence for the noiseless setting, i.e., where the SNR is infinitely large. Our results extend the scope to the noisy setting, and notably, we establish a statistical error rate that is independent of the norm (or pairwise distance) of the regression parameters. In particular, our results imply exact recovery as $\sigma \rightarrow 0$, in contrast to most previous local convergence results for EM, where the statistical error scaled with the norm of the parameters. Standard moment-method approaches may be applied to guarantee we are in the region where our local convergence guarantees apply. Comment: SNR and initialization conditions improved from the previous version.
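    For the general $k$-component case, a standard form of the EM update alternates soft responsibilities with per-component weighted least squares. The sketch below assumes uniform mixing weights and a known noise level $\sigma$, and takes the initializer W0 as given (e.g., from a moment method, as the abstract suggests); it is an illustration of this standard update, not the paper's exact procedure.

```python
import numpy as np

def em_k_component_mlr(X, y, W0, sigma=1.0, n_iters=100):
    """EM for a k-component mixture of linear regressions (uniform weights assumed).

    X: (n, d) features, y: (n,) responses, W0: (k, d) initial regressors.
    """
    W = W0.copy()
    k = W.shape[0]
    for _ in range(n_iters):
        # E-step: responsibilities r[i, j] proportional to exp(-(y_i - <x_i, w_j>)^2 / (2 sigma^2))
        resid = y[:, None] - X @ W.T                 # (n, k) residual per component
        logits = -resid**2 / (2 * sigma**2)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares for each component
        for j in range(k):
            Xw = r[:, j:j + 1] * X                   # rows of X scaled by responsibilities
            reg = 1e-8 * np.eye(X.shape[1])          # tiny ridge for numerical safety
            W[j] = np.linalg.solve(Xw.T @ X + reg, Xw.T @ y)
    return W
```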

    Iterative Least Trimmed Squares for Mixed Linear Regression

    Full text link
Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with the lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions (on the features etc.) under which the ILTS iterate converges linearly to the closest mixture component. We also provide a global algorithm that uses ILTS as a subroutine, to fully solve mixed linear regressions with corruptions. We then evaluate it for the widely studied setting of isotropic Gaussian features, and establish that we match or improve upon existing results in terms of sample complexity. Finally, we provide an ODE analysis for a gradient-descent variant of ILTS that has optimal time complexity. Our results provide initial theoretical evidence that iteratively fitting to the best subset of samples -- a potentially widely applicable idea -- can provably provide state-of-the-art performance in settings with bad training data. Comment: Accepted by NeurIPS 2019.
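    The two alternating steps (a) and (b) translate almost directly into code. A minimal sketch, assuming the fraction of samples to keep is known and using plain ordinary least squares for the refit (names are ours, not the authors' released code):

```python
import numpy as np

def ilts(X, y, keep_frac=0.5, n_iters=50, seed=0):
    """Iterative Least Trimmed Squares, as described above:
    (a) keep the samples with the smallest current residuals,
    (b) refit least squares on that subset only.

    keep_frac is the fraction of samples retained per iteration (assumed known).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = int(keep_frac * n)
    theta = rng.standard_normal(d)                    # arbitrary starting point
    for _ in range(n_iters):
        resid = np.abs(y - X @ theta)
        keep = np.argsort(resid)[:m]                  # (a) lowest-loss subset
        theta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)  # (b) refit
    return theta
```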

    The nonsmooth landscape of blind deconvolution

    Full text link
The blind deconvolution problem aims to recover a rank-one matrix from a set of rank-one linear measurements. Recently, Charisopoulos et al. introduced a nonconvex nonsmooth formulation that can be used, in combination with an initialization procedure, to provably solve this problem under standard statistical assumptions. In practice, however, initialization is unnecessary. As we demonstrate numerically, a randomly initialized subgradient method consistently solves the problem. In pursuit of a better understanding of this phenomenon, we study the random landscape of this formulation. We characterize in closed form the landscape of the population objective and describe the approximate location of the stationary points of the sample objective. In particular, we show that the set of spurious critical points lies close to a codimension-two subspace. In doing this, we develop tools for studying the landscape of a broader family of singular value functions; these results may be of independent interest. Comment: 25 pages, 2 figures.
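    As an illustration of the randomly initialized subgradient method discussed above, the sketch below runs subgradient descent with a geometrically decaying step size on an $\ell_1$-type objective $f(u,v) = \frac{1}{m}\sum_i |\langle \ell_i, u\rangle \langle r_i, v\rangle - b_i|$ of the kind used in this nonsmooth formulation; the step-size schedule, constants, and names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def subgradient_blind_deconv(L, R, b, n_iters=500, step0=1.0, decay=0.99, seed=0):
    """Randomly initialized subgradient method for
    f(u, v) = (1/m) * sum_i |<l_i, u> <r_i, v> - b_i|   (rows of L and R are l_i, r_i).
    """
    rng = np.random.default_rng(seed)
    m, d1 = L.shape
    d2 = R.shape[1]
    u = rng.standard_normal(d1)        # random initialization, no spectral warm start
    v = rng.standard_normal(d2)
    step = step0
    for _ in range(n_iters):
        a = L @ u                      # <l_i, u>
        c = R @ v                      # <r_i, v>
        s = np.sign(a * c - b)         # subgradient of |.| at each residual
        gu = L.T @ (s * c) / m         # partial subgradient in u
        gv = R.T @ (s * a) / m         # partial subgradient in v
        u -= step * gu
        v -= step * gv
        step *= decay                  # geometrically decaying step size
    return u, v
```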

    Recovery of Sparse Signals from a Mixture of Linear Samples

    Full text link
Mixture of linear regressions is a popular learning-theoretic model that is widely used to represent heterogeneous data. In its simplest form, this model assumes that the labels are generated from either of two different linear models and mixed together. Recent works of Yin et al. and Krishnamurthy et al., 2019, focus on an experimental design setting of model recovery for this problem. It is assumed that the features can be designed and queried to obtain their labels. When queried, an oracle randomly selects one of the two different sparse linear models and generates a label accordingly. How many such oracle queries are needed to recover both of the models simultaneously? This question can also be thought of as a generalization of the well-known compressed sensing problem (Cand\`es and Tao, 2005, Donoho, 2006). In this work, we address this query complexity problem and provide efficient algorithms that improve on the previously best known results. Comment: International Conference on Machine Learning (ICML), 2020. (26 pages, 3 figures)

    Sample Complexity of Learning Mixtures of Sparse Linear Regressions

    Full text link
In the problem of learning mixtures of linear regressions, the goal is to learn a collection of signal vectors from a sequence of (possibly noisy) linear measurements, where each measurement is evaluated on an unknown signal drawn uniformly from this collection. This setting is quite expressive and has been studied both in terms of practical applications and for the sake of establishing theoretical guarantees. In this paper, we consider the case where the signal vectors are sparse; this generalizes the popular compressed sensing paradigm. We improve upon the state-of-the-art results as follows: In the noisy case, we resolve an open question of Yin et al. (IEEE Transactions on Information Theory, 2019) by showing how to handle collections of more than two vectors, and we present the first robust reconstruction algorithm, i.e., if the signals are not perfectly sparse, we still learn a good sparse approximation of the signals. In the noiseless case, as well as in the noisy case, we show how to circumvent the need for a restrictive assumption required in the previous work. Our techniques are quite different from those in the previous work: for the noiseless case, we rely on a property of sparse polynomials, and for the noisy case, we provide new connections to learning Gaussian mixtures and use ideas from the theory of error-correcting codes. Comment: NeurIPS 2019.

    Global Convergence of Least Squares EM for Demixing Two Log-Concave Densities

    Full text link
This work studies the location estimation problem for a mixture of two rotation invariant log-concave densities. We demonstrate that Least Squares EM, a variant of the EM algorithm, converges to the true location parameter from a randomly initialized point. We establish the explicit convergence rates and sample complexity bounds, revealing their dependence on the signal-to-noise ratio and the tail property of the log-concave distribution. Moreover, we show that this global convergence property is robust under model mis-specification. Our analysis generalizes previous techniques for proving the convergence results for Gaussian mixtures. In particular, we make use of an angle-decreasing property for establishing global convergence of Least Squares EM beyond Gaussian settings, as $\ell_2$ distance contraction no longer holds globally for general log-concave mixtures.

Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in $O(\sqrt{n})$ iterations

    Full text link
We analyze the classical EM algorithm for parameter estimation in the symmetric two-component Gaussian mixture in $d$ dimensions. We show that, even in the absence of any separation between components, provided that the sample size satisfies $n=\Omega(d \log^3 d)$, the randomly initialized EM algorithm converges to an estimate in at most $O(\sqrt{n})$ iterations with high probability, which is at most $O((\frac{d \log^3 n}{n})^{1/4})$ in Euclidean distance from the true parameter and within logarithmic factors of the minimax rate of $(\frac{d}{n})^{1/4}$. Both the nonparametric statistical rate and the sublinear convergence rate are direct consequences of the zero Fisher information in the worst case. Refined pointwise guarantees beyond worst-case analysis and convergence to the MLE are also shown under mild conditions. This improves the previous result of Balakrishnan et al. \cite{BWY17}, which requires strong conditions on both the separation of the components and the quality of the initialization, and that of Daskalakis et al. \cite{DTZ17}, which requires sample splitting and restarting the EM iteration.
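    For the symmetric two-component mixture $\frac{1}{2}N(\theta, I_d) + \frac{1}{2}N(-\theta, I_d)$ with unit variance, the classical EM update has the closed form $\theta \leftarrow \frac{1}{n}\sum_i \tanh(\langle x_i, \theta\rangle)\, x_i$. A minimal randomly initialized sketch of that update (a sketch under these standard assumptions, not the paper's experiments):

```python
import numpy as np

def em_symmetric_gmm(X, n_iters=200, seed=0):
    """Classical EM for the symmetric mixture (1/2) N(theta, I_d) + (1/2) N(-theta, I_d).

    The result above suggests that on the order of sqrt(n) iterations suffice.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = rng.standard_normal(d) / np.sqrt(d)   # random initialization
    for _ in range(n_iters):
        # E-step and M-step combined: theta <- mean_i tanh(<x_i, theta>) x_i
        theta = (np.tanh(X @ theta)[:, None] * X).mean(axis=0)
    return theta
```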

    Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments

    Full text link
We consider the problem of learning a mixture of linear regressions (MLRs). An MLR is specified by $k$ nonnegative mixing weights $p_1, \ldots, p_k$ summing to $1$, and $k$ unknown regressors $w_1,\ldots,w_k\in\mathbb{R}^d$. A sample from the MLR is drawn by sampling $i$ with probability $p_i$, then outputting $(x, y)$ where $y = \langle x, w_i \rangle + \eta$, where $\eta\sim\mathcal{N}(0,\varsigma^2)$ for noise rate $\varsigma$. Mixtures of linear regressions are a popular generative model and have been studied extensively in machine learning and theoretical computer science. However, all previous algorithms for learning the parameters of an MLR require running time and sample complexity scaling exponentially with $k$. In this paper, we give the first algorithm for learning an MLR that runs in time which is sub-exponential in $k$. Specifically, we give an algorithm which runs in time $\widetilde{O}(d)\cdot\exp(\widetilde{O}(\sqrt{k}))$ and outputs the parameters of the MLR to high accuracy, even in the presence of nontrivial regression noise. We demonstrate a new method that we call "Fourier moment descent" which uses univariate density estimation and low-degree moments of the Fourier transform of suitable univariate projections of the MLR to iteratively refine our estimate of the parameters. To the best of our knowledge, these techniques have never been used in the context of high dimensional distribution learning, and may be of independent interest. We also show that our techniques can be used to give a sub-exponential time algorithm for learning mixtures of hyperplanes, a natural hard instance of the subspace clustering problem. Comment: 83 pages, 1 figure
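    The generative model in this abstract is concrete enough to write down directly; the sketch below samples from it, assuming a standard Gaussian design for $x$ (an assumption, since the abstract does not fix the covariate distribution). Such a sampler is handy for testing any of the estimators sketched above.

```python
import numpy as np

def sample_mlr(n, weights, W, noise, seed=0):
    """Draw n samples from the MLR model described above: choose component i
    with probability p_i, then output (x, y) with y = <x, w_i> + eta,
    eta ~ N(0, noise^2); here x ~ N(0, I_d) is an assumed design.
    """
    rng = np.random.default_rng(seed)
    k, d = W.shape
    comps = rng.choice(k, size=n, p=weights)            # latent component labels
    X = rng.standard_normal((n, d))
    y = np.einsum("nd,nd->n", X, W[comps]) + noise * rng.standard_normal(n)
    return X, y, comps
```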