Modelling Background Noise in Finite Mixtures of Generalized Linear Regression Models
In this paper we show how only a few outliers can completely break down EM estimation of mixtures of regression models. A simple yet very effective way of dealing with this problem is to use a component in which all regression parameters are fixed to zero to model the background noise. This noise component can be easily defined for different types of generalized linear models, has a familiar interpretation as the empty regression model, and is not very sensitive with respect to its own parameters.
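The noise-component idea can be illustrated with a short EM sketch for the Gaussian case: one ordinary linear regression component, plus a component whose regression coefficients are pinned at zero, so that only its variance and mixing weight adapt to absorb outliers. This is a hypothetical minimal illustration (the function names and the robust MAD-based initialization are our own choices), not the paper's implementation:

```python
import numpy as np

def _normal_pdf(r, s):
    # Gaussian density of residual r with standard deviation s
    return np.exp(-0.5 * (r / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def em_with_noise_component(x, y, n_iter=50):
    """EM for a two-component model: a linear regression component and a
    background-noise component with all regression parameters fixed at
    zero (the 'empty' model y ~ N(0, sigma_noise^2))."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    # Robust (MAD-based) scale so initial outliers do not inflate sigma
    sigma = max(1.4826 * np.median(np.abs(resid)), 1e-6)
    sigma_noise = max(np.std(y), 1e-6)
    pi = 0.9  # prior weight of the regression component
    for _ in range(n_iter):
        # E-step: posterior probability of the regression component
        f_reg = _normal_pdf(y - X @ beta, sigma)
        f_noise = _normal_pdf(y, sigma_noise)  # mean fixed at zero
        w = pi * f_reg / (pi * f_reg + (1.0 - pi) * f_noise + 1e-300)
        # M-step: weighted least squares for the regression component;
        # the noise component only updates its variance and weight
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        sigma = max(np.sqrt(np.sum(w * (y - X @ beta) ** 2) / np.sum(w)), 1e-6)
        sigma_noise = max(
            np.sqrt(np.sum((1 - w) * y ** 2) / max(np.sum(1 - w), 1e-12)), 1e-6
        )
        pi = float(np.mean(w))
    return beta, pi
```

Because the zero-coefficient component has no regression parameters to estimate, even gross outliers cannot drag the regression fit once they are soft-assigned to it.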
Statistical guarantees for the EM algorithm: From population to sample-based analysis
We develop a general framework for proving rigorous guarantees on the
performance of the EM algorithm and a variant known as gradient EM. Our
analysis is divided into two parts: a treatment of these algorithms at the
population level (in the limit of infinite data), followed by results that
apply to updates based on a finite set of samples. First, we characterize the
domain of attraction of any global maximizer of the population likelihood. This
characterization is based on a novel view of the EM updates as a perturbed form
of likelihood ascent, or in parallel, of the gradient EM updates as a perturbed
form of standard gradient ascent. Leveraging this characterization, we then
provide non-asymptotic guarantees on the EM and gradient EM algorithms when
applied to a finite set of samples. We develop consequences of our general
theory for three canonical examples of incomplete-data problems: mixture of
Gaussians, mixture of regressions, and linear regression with covariates
missing completely at random. In each case, our theory guarantees that with a
suitable initialization, a relatively small number of EM (or gradient EM) steps
will yield (with high probability) an estimate that is within statistical error
of the MLE. We provide simulations to confirm this theoretically predicted
behavior.
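For the symmetric two-component Gaussian mixture, one of the canonical examples above, the sample-based EM update has a closed form, and the basin-of-attraction behavior is easy to observe numerically. A minimal sketch, assuming the model 0.5·N(θ, 1) + 0.5·N(−θ, 1) with known unit variance (the function name and setup are ours, not the paper's):

```python
import numpy as np

def em_two_gaussians(y, theta0, n_iter=20):
    """Sample-based EM for the symmetric mixture
    0.5*N(theta, 1) + 0.5*N(-theta, 1)."""
    theta = theta0
    for _ in range(n_iter):
        # E-step: posterior weight that each point came from N(theta, 1)
        w = 1.0 / (1.0 + np.exp(-2.0 * theta * y))
        # M-step: closed-form update of the shared mean parameter
        theta = np.mean((2.0 * w - 1.0) * y)
    return theta
```

Started from a theta0 in the domain of attraction of the relevant maximizer (here, any positive value for a positive true θ), a handful of iterations suffices on well-separated data.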
Mixtures of Regression Models for Time-Course Gene Expression Data: Evaluation of Initialization and Random Effects
Finite mixture models are routinely applied to time-course microarray data.
Due to the complexity and size of this type of data, the choice of good starting values plays
an important role. So far, initialization strategies have been investigated only for data
from a mixture of multivariate normal distributions. In this work, several initialization
procedures are evaluated for mixtures of regression models with and without random
effects in an extensive simulation study on different artificial datasets. Finally, these
procedures are also applied to a real dataset from E. coli.
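One common family of initialization procedures evaluated in such comparisons is "best of several short EM runs from random starts": draw starting values from the data, run a few EM steps, and keep the run with the highest log-likelihood. A generic sketch for a simple two-mean Gaussian mixture, assuming equal weights and unit variances; this illustrates the kind of strategy compared, not the authors' procedure:

```python
import numpy as np

def loglik(y, mu1, mu2):
    # Log-likelihood of an equal-weight, unit-variance two-mean mixture
    f = 0.5 * np.exp(-0.5 * (y - mu1) ** 2) + 0.5 * np.exp(-0.5 * (y - mu2) ** 2)
    return float(np.sum(np.log(f / np.sqrt(2.0 * np.pi))))

def best_of_random_starts(y, n_starts=10, n_em_steps=5, seed=0):
    """Run several short EM chains from data-drawn starting means and
    keep the solution with the highest log-likelihood."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        mu1, mu2 = rng.choice(y, size=2, replace=False)
        for _ in range(n_em_steps):
            # E-step: posterior probability of component 1
            w = 1.0 / (1.0 + np.exp(-(mu1 - mu2) * y + 0.5 * (mu1**2 - mu2**2)))
            # M-step: weighted means
            mu1 = np.sum(w * y) / np.sum(w)
            mu2 = np.sum((1.0 - w) * y) / np.sum(1.0 - w)
        ll = loglik(y, mu1, mu2)
        if best is None or ll > best[0]:
            best = (ll, mu1, mu2)
    return best
```

Short runs keep the cost of multiple restarts low while still discriminating between basins of attraction; the winning start is then typically refined with a full EM run.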
Mixture of Regression Models with Single-Index
In this article, we propose a class of semiparametric mixture regression
models with a single index. We argue that many recently proposed
semiparametric/nonparametric mixture regression models can be considered
special cases of the proposed model. However, unlike existing semiparametric
mixture regression models, the newly proposed model can easily incorporate
multivariate predictors into the nonparametric components. Backfitting
estimates and the corresponding algorithms are proposed to achieve
the optimal convergence rate for both the parameters and the nonparametric
functions. We show that the nonparametric functions can be estimated with the
same asymptotic accuracy as if the parameters were known, and that the index
parameters can be estimated at the traditional parametric root-n convergence
rate. Simulation studies and an application to NBA data demonstrate the
finite-sample performance of the proposed models.
Comment: 28 pages, 2 figures
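The computational appeal of the single-index structure is that, once the index parameter is fixed, the multivariate predictor collapses to a scalar, so the nonparametric component can be estimated with an ordinary one-dimensional smoother. A hypothetical one-component sketch using a Nadaraya-Watson smoother (the paper's backfitting alternates such smoothing steps with mixture and index-parameter updates; names and bandwidth here are our own):

```python
import numpy as np

def nadaraya_watson(u_train, y_train, u_eval, h=0.3):
    # One-dimensional Gaussian-kernel smoother
    d = (u_eval[:, None] - u_train[None, :]) / h
    k = np.exp(-0.5 * d ** 2)
    return (k @ y_train) / np.sum(k, axis=1)

def single_index_smooth(X, y, alpha, h=0.3):
    """With the index parameter alpha held fixed, the p-dimensional
    predictor enters the nonparametric component only through the
    scalar index u = X @ alpha, so a 1-D smoother suffices."""
    a = alpha / np.linalg.norm(alpha)  # normalize alpha for identifiability
    u = X @ a
    return u, nadaraya_watson(u, y, u, h)
```

Because the smoothing is one-dimensional regardless of the number of predictors, the approach sidesteps the curse of dimensionality that a fully multivariate nonparametric component would face.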