Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression
The Expectation-Maximization algorithm is perhaps the most broadly used
algorithm for inference in latent variable problems. A theoretical
understanding of its performance, however, largely remains lacking. Recent
results established that EM enjoys global convergence for Gaussian Mixture
Models. For Mixed Linear Regression, however, only local convergence results
have been established, and those only for the high SNR regime. We show here
that EM converges for mixed linear regression with two components (it is known
that it may fail to converge for three or more), and moreover that this
convergence holds for random initialization. Our analysis reveals that EM
exhibits very different behavior in Mixed Linear Regression from its behavior
in Gaussian Mixture Models, and hence our proofs require the development of
several new ideas.
Comment: To appear in the proceedings of the Conference on Learning Theory (COLT), 2019. This paper results from a merger of work from two groups who worked on the problem at the same time.
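As a concrete reference point, here is a minimal NumPy sketch of the EM iteration for the symmetric two-component model y = s * <x, beta> + noise with hidden sign s = +1 or -1; the function name, the unit-Gaussian synthetic data, and the noise level are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def em_two_component_mlr(X, y, sigma=1.0, iters=50, seed=None):
    """EM for symmetric two-component mixed linear regression (sketch):
    y_i = s_i * <x_i, beta> + noise, with hidden signs s_i in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = rng.normal(size=d)                        # random initialization
    XtX = X.T @ X
    for _ in range(iters):
        # E-step: posterior probability that sample i uses the +beta component
        z = np.clip(2.0 * y * (X @ beta) / sigma**2, -60.0, 60.0)
        w = 1.0 / (1.0 + np.exp(-z))
        # M-step: the weighted least-squares update collapses to one linear solve
        beta = np.linalg.solve(XtX, X.T @ ((2.0 * w - 1.0) * y))
    return beta

# Illustrative use on synthetic data: beta is recovered up to a global sign
rng = np.random.default_rng(0)
n, d = 2000, 5
beta_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
s = rng.choice([-1.0, 1.0], size=n)
y = s * (X @ beta_star) + 0.1 * rng.normal(size=n)
beta_hat = em_two_component_mlr(X, y, sigma=0.1, seed=1)
print(min(np.linalg.norm(beta_hat - beta_star), np.linalg.norm(beta_hat + beta_star)))
```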
EM Converges for a Mixture of Many Linear Regressions
We study the convergence of the Expectation-Maximization (EM) algorithm for
mixtures of linear regressions with an arbitrary number of components. We
show that as long as the signal-to-noise ratio (SNR) is sufficiently large,
well-initialized EM converges to the true regression parameters. Previous
results for more than two components have only established local convergence in the
noiseless setting, i.e., where the SNR is infinitely large. Our results enlarge the
scope to the noisy setting, and notably, we establish a statistical
error rate that is independent of the norm (or pairwise distance) of the
regression parameters. In particular, our results imply exact recovery as
$\mathrm{SNR} \to \infty$, in contrast to most previous local convergence results
for EM, where the statistical error scaled with the norm of the parameters.
Standard moment-method approaches may be applied to guarantee we are in the
region where our local convergence guarantees apply.
Comment: SNR and initialization conditions improved from the previous version.
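For the many-component case, the same iteration generalizes to soft assignments over k components followed by one weighted least-squares solve per component. The sketch below assumes a known noise level and uses random rather than moment-method initialization, purely for illustration.

```python
import numpy as np

def em_k_component_mlr(X, y, k, sigma=1.0, iters=100, seed=None):
    """EM for a k-component mixture of linear regressions (sketch).
    Assumes a known noise level sigma; mixing weights are re-estimated."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    betas = rng.normal(size=(k, d))        # in practice: a moment-method initialization
    log_pi = np.full(k, -np.log(k))
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i, y_i)
        resid = y[:, None] - X @ betas.T                     # shape (n, k)
        logits = log_pi[None, :] - resid**2 / (2.0 * sigma**2)
        logits -= logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: one weighted least-squares solve per component
        for j in range(k):
            Xw = X * r[:, j][:, None]
            A = Xw.T @ X + 1e-8 * np.eye(d)                  # small ridge for stability
            betas[j] = np.linalg.solve(A, Xw.T @ y)
        log_pi = np.log(r.mean(axis=0) + 1e-12)
    return betas, np.exp(log_pi)
```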
Iterative Least Trimmed Squares for Mixed Linear Regression
Given a linear regression setting, Iterative Least Trimmed Squares (ILTS)
involves alternating between (a) selecting the subset of samples with lowest
current loss, and (b) re-fitting the linear model only on that subset. Both
steps are very fast and simple. In this paper we analyze ILTS in the setting of
mixed linear regression with corruptions (MLR-C). We first establish
deterministic conditions (on the features etc.) under which the ILTS iterate
converges linearly to the closest mixture component. We also provide a global
algorithm that uses ILTS as a subroutine, to fully solve mixed linear
regressions with corruptions. We then evaluate it for the widely studied
setting of isotropic Gaussian features, and establish that we match or improve on
existing results in terms of sample complexity. Finally, we provide an ODE
analysis for a gradient-descent variant of ILTS that has optimal time
complexity.
Our results provide initial theoretical evidence that iteratively fitting to
the best subset of samples -- a potentially widely applicable idea -- can
provably provide state-of-the-art performance in bad training data settings.
Comment: Accepted by NeurIPS 2019.
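The two alternating steps described above are short enough to sketch directly; the trimming fraction, the full-data warm start, and the plain least-squares refit below are illustrative choices, not the paper's exact tuning.

```python
import numpy as np

def ilts(X, y, keep_frac=0.5, iters=50):
    """Iterative Least Trimmed Squares (sketch): alternate between
    (a) selecting the samples with the smallest current squared residuals and
    (b) refitting ordinary least squares on that subset only."""
    n, d = X.shape
    m = int(keep_frac * n)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]           # start from a full-data fit
    for _ in range(iters):
        resid = (y - X @ beta) ** 2
        keep = np.argsort(resid)[:m]                      # step (a): lowest-loss subset
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]   # step (b): refit
    return beta
```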
The nonsmooth landscape of blind deconvolution
The blind deconvolution problem aims to recover a rank-one matrix from a set
of rank-one linear measurements. Recently, Charisopoulos et al. introduced a
nonconvex nonsmooth formulation that can be used, in combination with an
initialization procedure, to provably solve this problem under standard
statistical assumptions. In practice, however, initialization is unnecessary.
As we demonstrate numerically, a randomly initialized subgradient method
consistently solves the problem. In pursuit of a better understanding of this
phenomenon, we study the random landscape of this formulation. We characterize
in closed form the landscape of the population objective and describe the
approximate location of the stationary points of the sample objective. In
particular, we show that the set of spurious critical points lies close to a
codimension two subspace. In doing this, we develop tools for studying the
landscape of a broader family of singular value functions; these results may be
of independent interest.
Comment: 25 pages, 2 figures.
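For intuition, here is a sketch of a randomly initialized subgradient method on one standard nonsmooth formulation, the robust l1 loss over rank-one bilinear measurements; the geometrically decaying step size, the normalization, and the data model are assumptions for illustration and not necessarily the exact scheme analyzed in the paper.

```python
import numpy as np

def blind_deconv_subgradient(A, B, y, iters=500, step0=1.0, decay=0.99, seed=None):
    """Randomly initialized subgradient method (sketch) on the nonsmooth loss
    f(u, v) = (1/m) * sum_i |<a_i, u> * <b_i, v> - y_i|,
    a robust formulation for recovering the rank-one matrix u v^T."""
    rng = np.random.default_rng(seed)
    m, d1 = A.shape
    d2 = B.shape[1]
    u, v = rng.normal(size=d1), rng.normal(size=d2)       # random initialization
    step = step0
    for _ in range(iters):
        au, bv = A @ u, B @ v
        s = np.sign(au * bv - y)                           # subgradient of the absolute value
        gu = (A * (s * bv)[:, None]).mean(axis=0)          # partial subgradient in u
        gv = (B * (s * au)[:, None]).mean(axis=0)          # partial subgradient in v
        g_norm = np.sqrt(np.linalg.norm(gu)**2 + np.linalg.norm(gv)**2) + 1e-12
        u -= step * gu / g_norm                            # normalized subgradient step
        v -= step * gv / g_norm
        step *= decay                                      # geometrically decaying step size
    return u, v
```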
Recovery of Sparse Signals from a Mixture of Linear Samples
Mixture of linear regressions is a popular learning theoretic model that is
used widely to represent heterogeneous data. In the simplest form, this model
assumes that the labels are generated from either of two different linear
models and mixed together. Recent works of Yin et al. and Krishnamurthy et al.,
2019, focus on an experimental design setting of model recovery for this
problem. It is assumed that the features can be designed and used as queries to
obtain labels. When queried, an oracle randomly selects one of the two
different sparse linear models and generates a label accordingly. How many such
oracle queries are needed to recover both of the models simultaneously? This
question can also be thought of as a generalization of the well-known
compressed sensing problem (Cand\`es and Tao, 2005, Donoho, 2006). In this
work, we address this query complexity problem and provide efficient algorithms
that improve on the previously best known results.
Comment: International Conference on Machine Learning (ICML), 2020. (26 pages, 3 figures)
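The query model described above can be simulated in a few lines; the dimensions, sparsity level, and noise below are illustrative assumptions, not values from the paper.

```python
import numpy as np

class TwoSparseModelOracle:
    """Query oracle for the setting above (sketch): each designed query x is
    answered by one of two hidden sparse linear models, chosen uniformly at random."""
    def __init__(self, d=100, k=5, noise=0.0, seed=None):
        self.rng = np.random.default_rng(seed)
        self.noise = noise
        self.betas = np.zeros((2, d))
        for b in self.betas:                               # two k-sparse signal vectors
            support = self.rng.choice(d, size=k, replace=False)
            b[support] = self.rng.normal(size=k)

    def query(self, x):
        j = self.rng.integers(2)                           # oracle picks a model at random
        return float(self.betas[j] @ x + self.noise * self.rng.normal())

# Illustrative use: issue a batch of Gaussian queries and collect labels
oracle = TwoSparseModelOracle(d=100, k=5, noise=0.1, seed=0)
queries = np.random.default_rng(1).normal(size=(10, 100))
labels = [oracle.query(x) for x in queries]
```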
Sample Complexity of Learning Mixtures of Sparse Linear Regressions
In the problem of learning mixtures of linear regressions, the goal is to
learn a collection of signal vectors from a sequence of (possibly noisy) linear
measurements, where each measurement is evaluated on an unknown signal drawn
uniformly from this collection. This setting is quite expressive and has been
studied both in terms of practical applications and for the sake of
establishing theoretical guarantees. In this paper, we consider the case where
the signal vectors are sparse; this generalizes the popular compressed sensing
paradigm. We improve upon the state-of-the-art results as follows: In the noisy
case, we resolve an open question of Yin et al. (IEEE Transactions on
Information Theory, 2019) by showing how to handle collections of more than two
vectors and present the first robust reconstruction algorithm, i.e., if the
signals are not perfectly sparse, we still learn a good sparse approximation of
the signals. In the noiseless case, as well as in the noisy case, we show how
to circumvent the need for a restrictive assumption required in the previous
work. Our techniques are quite different from those in the previous work: for
the noiseless case, we rely on a property of sparse polynomials and for the
noisy case, we provide new connections to learning Gaussian mixtures and use
ideas from the theory of error-correcting codes.
Comment: NeurIPS 2019.
Global Convergence of Least Squares EM for Demixing Two Log-Concave Densities
This work studies the location estimation problem for a mixture of two
rotation invariant log-concave densities. We demonstrate that Least Squares EM,
a variant of the EM algorithm, converges to the true location parameter from a
randomly initialized point. We establish the explicit convergence rates and
sample complexity bounds, revealing their dependence on the signal-to-noise
ratio and the tail property of the log-concave distribution. Moreover, we show
that this global convergence property is robust under model mis-specification.
Our analysis generalizes previous techniques for proving the convergence
results for Gaussian mixtures. In particular, we make use of an
angle-decreasing property for establishing global convergence of Least Squares
EM beyond Gaussian settings, as distance contraction no longer holds
globally for general log-concave mixtures.
Randomly initialized EM algorithm for two-component Gaussian mixture achieves near optimality in $O(\sqrt{n})$ iterations
We analyze the classical EM algorithm for parameter estimation in the
symmetric two-component Gaussian mixture in $d$ dimensions. We show that, even
in the absence of any separation between components, provided that the sample
size $n$ is sufficiently large relative to the dimension $d$, the randomly initialized EM algorithm
converges to an estimate in at most $\tilde{O}(\sqrt{n})$ iterations with high
probability, which is at most $\tilde{O}((d/n)^{1/4})$ in Euclidean
distance from the true parameter and within logarithmic factors of the minimax
rate of $(d/n)^{1/4}$. Both the nonparametric statistical rate and the
sublinear convergence rate are direct consequences of the zero Fisher
information in the worst case. Refined pointwise guarantees beyond worst-case
analysis and convergence to the MLE are also shown under mild conditions.
This improves the previous result of Balakrishnan et al. \cite{BWY17}, which
requires strong conditions on both the separation of the components and the
quality of the initialization, and that of Daskalakis et al. \cite{DTZ17}, which
requires sample splitting and restarting the EM iteration.
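For reference, the randomly initialized EM iteration for this symmetric two-component mixture has a particularly compact form; the sketch below assumes a known unit variance, the standard simplification in this setting, and the initialization scale is an illustrative choice.

```python
import numpy as np

def em_symmetric_gmm(X, iters=200, seed=None):
    """Randomly initialized EM (sketch) for the symmetric mixture
    0.5 * N(theta, I_d) + 0.5 * N(-theta, I_d) with known unit variance.
    Each iteration is the classical update theta <- mean_i tanh(<theta, x_i>) x_i."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = rng.normal(size=d) / np.sqrt(d)               # random initialization
    for _ in range(iters):
        theta = (np.tanh(X @ theta)[:, None] * X).mean(axis=0)
    return theta
```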
Learning Mixtures of Linear Regressions in Subexponential Time via Fourier Moments
We consider the problem of learning a mixture of linear regressions (MLRs).
An MLR is specified by $k$ nonnegative mixing weights $p_1, \dots, p_k$
summing to $1$, and $k$ unknown regressors $w_1, \dots, w_k \in \mathbb{R}^d$. A
sample from the MLR is drawn by sampling $i$ with probability $p_i$, then
outputting $\langle w_i, x \rangle + \eta$, where $x \sim \mathcal{N}(0, I_d)$ and
$\eta \sim \mathcal{N}(0, \sigma^2)$ for noise rate $\sigma$. Mixtures of
linear regressions are a popular generative model and have been studied
extensively in machine learning and theoretical computer science. However, all
previous algorithms for learning the parameters of an MLR require running time
and sample complexity scaling exponentially with $k$.
In this paper, we give the first algorithm for learning an MLR that runs in
time sub-exponential in $k$ and outputs the parameters of the MLR to high
accuracy, even in the presence of nontrivial
regression noise. We demonstrate a new method that we call "Fourier moment
descent" which uses univariate density estimation and low-degree moments of the
Fourier transform of suitable univariate projections of the MLR to iteratively
refine our estimate of the parameters. To the best of our knowledge, these
techniques have never been used in the context of high dimensional distribution
learning, and may be of independent interest. We also show that our techniques
can be used to give a sub-exponential time algorithm for learning mixtures of
hyperplanes, a natural hard instance of the subspace clustering problem.
Comment: 83 pages, 1 figure.
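The generative model described in this abstract is easy to simulate; the sketch below assumes Gaussian covariates and Gaussian label noise as stated, with illustrative parameter values.

```python
import numpy as np

def sample_mlr(n, weights, regressors, noise_rate, seed=None):
    """Draw n samples from a mixture of linear regressions (sketch):
    choose component i with probability p_i, then output (x, <w_i, x> + eta)
    with x ~ N(0, I_d) and eta ~ N(0, noise_rate^2)."""
    rng = np.random.default_rng(seed)
    k, d = regressors.shape
    idx = rng.choice(k, size=n, p=weights)
    X = rng.normal(size=(n, d))
    y = np.einsum("nd,nd->n", X, regressors[idx]) + noise_rate * rng.normal(size=n)
    return X, y

# Illustrative use: a 3-component MLR in 10 dimensions
W = np.random.default_rng(0).normal(size=(3, 10))
X, y = sample_mlr(5000, weights=[0.5, 0.3, 0.2], regressors=W, noise_rate=0.1, seed=1)
```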