Robust mixtures of regression models
Doctor of Philosophy, Department of Statistics, Kun Chen and Weixin Yao
This proposal contains two projects related to robust mixture models. In the first project,
we propose a new robust mixture of regression models (Bai et al., 2012). Existing methods for fitting
mixture regression models assume a normal distribution for the error and then estimate the regression
parameters by the maximum likelihood estimate (MLE). In this project, we demonstrate that the MLE, like the
least squares estimate, is sensitive to outliers and heavy-tailed error distributions. We propose a robust
estimation procedure and an EM-type algorithm to estimate the mixture regression models. Using a Monte
Carlo simulation study, we demonstrate that the proposed new estimation method is robust and works
much better than the MLE when there are outliers or the error distribution has heavy tails. In addition, the
proposed robust method works comparably to the MLE when there are no outliers and the error is normal.
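The EM-type robustification described above can be illustrated with a short sketch. This is not the authors' exact estimator: it fits a two-component mixture of linear regressions with t-distributed errors, one common robust alternative to normal errors, on simulated data. All names, starting values, and settings below are illustrative.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)

# Simulated two-component mixture of regressions, contaminated with outliers.
n = 400
x = rng.uniform(-2, 2, n)
comp = rng.random(n) < 0.5
y = np.where(comp, 1 + 2 * x, -1 - 2 * x) + rng.standard_normal(n)
y[:10] += 15  # gross outliers
X = np.column_stack([np.ones(n), x])

def em_t_mixreg(X, y, nu=3.0, n_iter=300):
    """EM-type algorithm for a two-component mixture of regressions with
    t(nu)-distributed errors; large residuals are downweighted."""
    beta = np.array([[0.5, 1.0], [-0.5, -1.0]])  # crude starting values
    sigma = np.ones(2)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior component probabilities under scaled t densities.
        dens = np.stack([
            pi[k] / sigma[k] * t_dist.pdf((y - X @ beta[k]) / sigma[k], df=nu)
            for k in range(2)
        ])
        gamma = dens / dens.sum(axis=0)
        # M-step: weighted least squares; the t weights u shrink the
        # influence of observations with large standardized residuals.
        for k in range(2):
            r = (y - X @ beta[k]) / sigma[k]
            u = (nu + 1.0) / (nu + r ** 2)
            w = gamma[k] * u
            beta[k] = np.linalg.solve((X * w[:, None]).T @ X, X.T @ (w * y))
            sigma[k] = np.sqrt(np.sum(w * (y - X @ beta[k]) ** 2)
                               / np.sum(gamma[k]))
        pi = gamma.mean(axis=1)
    return beta, sigma, pi

beta_hat, sigma_hat, pi_hat = em_t_mixreg(X, y)
```

Because the t weights vanish for extreme residuals, the fitted slopes stay close to the true values (2 and -2) despite the gross outliers, whereas ordinary normal-MLE fitting would be pulled toward them.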
In the second project, we propose a new robust mixture of linear mixed-effects models. The traditional
mixture of linear mixed-effects models, which assumes Gaussian distributions for the random and error
parts, is sensitive to outliers. We will propose a mixture of linear mixed-effects models with t-distributions to robustify
the estimation procedure. An EM algorithm is provided to find the MLE under the assumption of t-
distributions for the error terms and random effects. Furthermore, we propose to adaptively choose the
degrees of freedom for the t-distribution using profile likelihood. In the simulation study, we demonstrate
that our proposed model works comparably to the traditional estimation method when there are no outliers
and the errors and random mixed effects are normally distributed, but works much better if there are outliers
or the distributions of the errors and random mixed effects have heavy tails.
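The profile-likelihood choice of the degrees of freedom can be sketched on a toy location-scale t model (a stand-in for the full mixed-effects mixture; the grid and simulated data below are illustrative): for each candidate value of the degrees of freedom, maximize the likelihood over the remaining parameters, then pick the value with the largest profiled log-likelihood.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(2)
data = t_dist.rvs(df=4, size=500, random_state=rng)  # heavy-tailed sample

# Profile likelihood over a grid of candidate degrees of freedom:
# fit location and scale with the df held fixed, record the log-likelihood.
nus = [2, 3, 4, 6, 8, 12, 20]
profile = {}
for nu in nus:
    df_fixed, loc, scale = t_dist.fit(data, fdf=nu)  # fdf= fixes the shape
    profile[nu] = t_dist.logpdf(data, df=nu, loc=loc, scale=scale).sum()

best_nu = max(profile, key=profile.get)
```

In the mixed-effects setting the inner maximization would be the EM algorithm itself, run once per grid point, but the outer profiling step is the same.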
Supervised Classification Using Finite Mixture Copula
The use of copulas for statistical classification is recent and gaining popularity. For example, statistical classification using copulas has been proposed for automatic character recognition, medical diagnostics, and most recently in data mining. Classical discrimination rules assume normality, but in this era of data that assumption is often questionable. In fact, the features of the data may be a mixture of discrete and continuous random variables. In this paper, mixture copula densities are used to model class-conditional distributions. Such densities are useful when the marginal densities of the vector of features are not normally distributed and the variables are of mixed type. The authors have shown that such mixture models are very useful for uncovering hidden structures in data, and have used them for clustering in data mining. Under such mixture models, standard maximum likelihood estimation methods are not suitable, and the regular expectation-maximization algorithm is inefficient and may not converge. A new estimation method is proposed to estimate such densities and build the classifier based on finite mixture copula densities. Simulations are used to compare the performance of the copula-based classifier with classical normal-distribution-based models, a logistic-regression-based model, and the independence model. The method is also applied to real data.
Supervised Classification Using Copula and Mixture Copula
Statistical classification is a field of study that has developed significantly since the 1960s, with a vast range of applications: pattern recognition has been applied to automatic character recognition, medical diagnostics, and most recently data mining. The classical discrimination rule assumes normality. However, in many situations this assumption is questionable; in fact, for some data the pattern vector is a mixture of discrete and continuous random variables. In this dissertation, we use copula densities to model class-conditional distributions. Such densities are useful when the marginal densities of a pattern vector are not normally distributed, and such models also accommodate mixed discrete and continuous feature types. Finite mixture density models are very flexible for building classifiers, for clustering, and for uncovering hidden structures in the data. We use finite mixtures of Gaussian copulas and of copulas from the Archimedean family to build classifiers, and we present the complexities of the estimation. Under such mixture models, standard maximum likelihood estimation methods are not suitable, and the regular expectation-maximization algorithm may not converge, or converges inefficiently. We propose a new estimation method to evaluate such densities and build the classifier based on finite mixtures of copula densities. We develop simulation scenarios to compare the performance of the copula-based classifier with classical normal-distribution-based models, the logistic-regression-based model, and the independence model. We also apply the techniques to real data and present the misclassification errors.
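A minimal sketch of the copula building block may help: by Sklar's theorem, a class-conditional density factors into a copula density evaluated at the marginal CDFs times the marginal densities, and the Bayes rule then compares these (plus log priors) across classes. The helper names below are hypothetical, and the Gaussian copula is used for concreteness; the work also considers Archimedean families and mixtures of copulas.

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_logpdf(u, R):
    """Log-density of the Gaussian copula with correlation matrix R,
    evaluated at a point u in the open unit hypercube (0, 1)^d."""
    q = norm.ppf(np.asarray(u))            # map to normal scores
    R = np.asarray(R, dtype=float)
    _, logdet = np.linalg.slogdet(R)
    A = np.linalg.inv(R) - np.eye(len(R))  # correction to independence
    return -0.5 * logdet - 0.5 * q @ A @ q

def class_logdensity(x, marginals, R):
    """Class-conditional log-density via Sklar's theorem: copula log-density
    at the marginal CDFs plus the sum of marginal log-densities.
    `marginals` is a list of frozen scipy distributions, one per feature."""
    u = np.array([m.cdf(xi) for m, xi in zip(marginals, x)])
    marg = sum(m.logpdf(xi) for m, xi in zip(marginals, x))
    return gaussian_copula_logpdf(u, R) + marg
```

With `R` equal to the identity the copula log-density is zero and the joint density collapses to the independence model, which is exactly the baseline the classifier is compared against.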
Mixture model and subgroup analysis in nationwide kidney transplant center evaluation
The five-year post-transplant survival rate is an important indicator of the quality of care delivered by kidney transplant centers in the United States.
To provide a fair assessment of each transplant center, an effect representing center-specific care quality, along with patient-level risk factors, is often included in the risk adjustment model.
In the past, the center effects have been modeled as either fixed effects or Gaussian random effects, each with its own merits and demerits.
We propose two new methods that allow flexible random effects distributions.
The first one is a Generalized Linear Mixed Model (GLMM) with normal mixture random effects.
By allowing the random effects to be non-homogeneous, the shrinkage effect is reduced and the predicted random effects are much closer to the truth.
In addition, modeling the random effects as a normal mixture essentially clusters them into different groups, which provides a natural way of evaluating performance in the transplant center setting.
To decide the number of components, we perform a sequence of hypothesis tests.
In the second method, we propose a subgroup analysis on the random effects under the framework of GLMM.
Each level of the random effect is allowed to be a cluster by itself, but clusters that are close to each other are merged into larger ones.
This method provides more precise and stable estimation than the fixed-effects model, while allowing a much more flexible random-effects distribution than a GLMM with a Gaussian assumption.
In addition, the other effects in the model are selected via a lasso-type penalty.
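As a toy illustration of the normal-mixture idea, one can fit Gaussian mixtures with varying numbers of components to (hypothetical) estimated center effects and cluster the centers by the selected mixture. BIC is used here purely as a simple stand-in for the sequential hypothesis tests described above, and all data are simulated.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Hypothetical estimated center effects: most centers near zero,
# a smaller group of under-performing centers around -1.
effects = np.concatenate([rng.normal(0.0, 0.2, 80),
                          rng.normal(-1.0, 0.2, 20)]).reshape(-1, 1)

# Fit mixtures with 1-3 components and pick the number by BIC
# (a stand-in for the sequential tests in the proposed method).
models = {k: GaussianMixture(k, n_init=5, random_state=0).fit(effects)
          for k in (1, 2, 3)}
best_k = min(models, key=lambda k: models[k].bic(effects))

# Cluster assignments group the centers into performance tiers.
labels = models[best_k].predict(effects)
```

The cluster memberships give the natural performance grouping mentioned above: centers assigned to the low-mean component are flagged, rather than ranking every center against a single Gaussian prior.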
Inference for mixtures of symmetric distributions
This article discusses the problem of estimation of parameters in finite
mixtures when the mixture components are assumed to be symmetric and to come
from the same location family. We refer to these mixtures as semi-parametric
because no additional assumptions other than symmetry are made regarding the
parametric form of the component distributions. Because the class of symmetric
distributions is so broad, identifiability of parameters is a major issue in
these mixtures. We develop a notion of identifiability of finite mixture
models, which we call k-identifiability, where k denotes the number of
components in the mixture. We give sufficient conditions for k-identifiability
of location mixtures of symmetric components when k=2 or 3. We propose a novel
distance-based method for estimating the (location and mixing) parameters from
a k-identifiable model and establish the strong consistency and asymptotic
normality of the estimator. In the specific case of L_2-distance, we show that
our estimator generalizes the Hodges--Lehmann estimator. We discuss the
numerical implementation of these procedures, along with an empirical estimate
of the component distribution, in the two-component case. In comparisons with
maximum likelihood estimation assuming normal components, our method produces
somewhat higher standard error estimates in the case where the components are
truly normal, but dramatically outperforms the normal method when the
components are heavy-tailed.Comment: Published at http://dx.doi.org/10.1214/009053606000001118 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
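Since the L_2-distance estimator above generalizes the Hodges–Lehmann estimator, a quick sketch of the classical one-sample Hodges–Lehmann estimator (the median of the Walsh averages) may help fix ideas; the function name is illustrative.

```python
import numpy as np
from itertools import combinations

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann location estimator: the median of all
    Walsh averages (x_i + x_j)/2 over pairs i <= j."""
    x = np.asarray(x, dtype=float)
    walsh = [(a + b) / 2 for a, b in combinations(x, 2)] + list(x)
    return float(np.median(walsh))

print(hodges_lehmann([1, 2, 3, 100]))  # → 2.75
```

Like the distance-based estimator in the article, it stays near the bulk of the data (2.75 here) where the sample mean (26.5) is dragged away by the heavy-tailed observation.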