160,878 research outputs found

    Robust mixtures of regression models

    Doctor of Philosophy, Department of Statistics. Kun Chen and Weixin Yao. This proposal contains two projects related to robust mixture models. In the first project, we propose a new robust mixture of regression models (Bai et al., 2012). Existing methods for fitting mixture regression models assume a normal distribution for the error and then estimate the regression parameters by the maximum likelihood estimate (MLE). In this project, we demonstrate that the MLE, like the least squares estimate, is sensitive to outliers and heavy-tailed error distributions. We propose a robust estimation procedure and an EM-type algorithm to estimate the mixture regression models. Using a Monte Carlo simulation study, we demonstrate that the proposed estimation method is robust and works much better than the MLE when there are outliers or the error distribution has heavy tails, and works comparably to the MLE when there are no outliers and the error is normal. In the second project, we propose a new robust mixture of linear mixed-effects models. The traditional mixture of linear mixed-effects models, which assumes Gaussian distributions for both the random effects and the errors, is sensitive to outliers. We propose a mixture of linear mixed-effects models with t-distributions to robustify the estimation procedure. An EM algorithm is provided to find the MLE under the assumption of t-distributions for the error terms and the random effects. Furthermore, we propose to adaptively choose the degrees of freedom of the t-distribution using the profile likelihood. In the simulation study, we demonstrate that our proposed model works comparably to the traditional estimation method when there are no outliers and the errors and random effects are normally distributed, but works much better if there are outliers or the distributions of the errors and random effects have heavy tails.
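    A minimal sketch of one common robustification, replacing the normal error with a t-distribution with fixed degrees of freedom, is given below. It conveys the flavor of an EM-type algorithm for robust mixture regression; it is not the exact procedure of Bai et al. (2012), and the function and parameter names are illustrative.

    import numpy as np
    from scipy import stats

    def robust_mixreg_em(X, y, k=2, df=3.0, n_iter=200, seed=0):
        # EM for a k-component mixture of linear regressions with
        # t-distributed errors (fixed df); a sketch, not Bai et al. (2012).
        rng = np.random.default_rng(seed)
        n = len(y)
        Xb = np.column_stack([np.ones(n), X])        # design with intercept
        beta = rng.normal(size=(k, Xb.shape[1]))     # component coefficients
        sigma = np.full(k, y.std())                  # component scales
        pi = np.full(k, 1.0 / k)                     # mixing proportions
        for _ in range(n_iter):
            # E-step: component responsibilities under t errors
            resid = y[None, :] - beta @ Xb.T
            logjoint = (np.log(pi)[:, None]
                        + stats.t.logpdf(resid / sigma[:, None], df)
                        - np.log(sigma)[:, None])
            tau = np.exp(logjoint - np.logaddexp.reduce(logjoint, axis=0))
            # Scale-mixture weights of the t: large residuals get downweighted
            u = (df + 1.0) / (df + (resid / sigma[:, None]) ** 2)
            w = tau * u
            # M-step: weighted least squares within each component
            for j in range(k):
                WX = w[j][:, None] * Xb
                beta[j] = np.linalg.solve(Xb.T @ WX, WX.T @ y)
                r = y - Xb @ beta[j]
                sigma[j] = np.sqrt((w[j] * r ** 2).sum() / tau[j].sum())
            pi = tau.sum(axis=1) / n
        return beta, sigma, pi

    The degrees of freedom are held fixed in this sketch; the proposal instead chooses them adaptively via the profile likelihood.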

    Supervised Classification Using Finite Mixture Copula

    The use of copulas for statistical classification is recent and gaining popularity. For example, copula-based classification has been proposed for automatic character recognition, medical diagnostics, and, most recently, data mining. Classical discrimination rules assume normality, but in the current data age this assumption is often questionable. In fact, the features of a data set may be a mixture of discrete and continuous random variables. In this paper, mixture copula densities are used to model class-conditional distributions. Such densities are useful when the marginal densities of the feature vector are not normally distributed and the variables are of mixed kind. Authors have shown that such mixture models are very useful for uncovering hidden structures in data, and have used them for clustering in data mining. Under such mixture models, standard maximum likelihood estimation methods are not suitable, and the regular expectation-maximization algorithm is inefficient and may not converge. A new estimation method is proposed to estimate such densities and build the classifier based on finite mixture Gaussian densities. Simulations are used to compare the performance of the copula-based classifier with classical normal-distribution-based models, a logistic-regression-based model, and the independent model. The method is also applied to a real data set.
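    The core modeling device here is the copula decomposition of a class-conditional density: the joint density factors into the product of the marginal densities times a copula density evaluated at the marginal CDF values. A hedged Python sketch of the Gaussian-copula case follows; it assumes continuous margins, the function names are illustrative, and a mixture version would average several such densities with estimated weights.

    import numpy as np
    from scipy import stats

    def gaussian_copula_logpdf(u, R):
        # Log-density of the Gaussian copula with correlation matrix R,
        # evaluated at a point u in (0, 1)^d of uniform margins.
        z = stats.norm.ppf(u)                      # map to normal scores
        sign, logdet = np.linalg.slogdet(R)
        quad = z @ (np.linalg.inv(R) - np.eye(len(u))) @ z
        return -0.5 * (logdet + quad)

    def class_conditional_logpdf(x, marginals, R):
        # log f(x | class) = sum of marginal log-densities
        #                  + copula log-density at the marginal CDF values.
        # `marginals` is a list of frozen scipy distributions, one per
        # (continuous) feature.
        u = np.array([m.cdf(xi) for m, xi in zip(marginals, x)])
        logmarg = sum(m.logpdf(xi) for m, xi in zip(marginals, x))
        return logmarg + gaussian_copula_logpdf(u, R)

    # A Bayes classifier assigns x to the class maximizing
    # log(prior) + class_conditional_logpdf(x, marginals, R).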

    Supervised Classification Using Copula and Mixture Copula

    Statistical classification is a field of study that has developed significantly since the 1960s and has a vast range of applications. For example, pattern recognition has been applied to automatic character recognition, medical diagnostics, and, most recently, data mining. The classical discrimination rule assumes normality; however, in many situations this assumption is questionable. In fact, for some data the pattern vector is a mixture of discrete and continuous random variables. In this dissertation, we use copula densities to model class-conditional distributions. Such densities are useful when the marginal densities of a pattern vector are not normally distributed, and also for mixed discrete and continuous feature types. Finite mixture density models are very flexible for building classifiers and clustering, and for uncovering hidden structures in data. We use finite mixtures of Gaussian copulas, and mixture densities based on copulas of the Archimedean family, to build classifiers, and we present the complexities of the estimation. Under such mixture models, standard maximum likelihood estimation methods are not suitable, and the regular expectation-maximization algorithm may not converge, or may converge inefficiently. We propose a new estimation method to evaluate such densities and build the classifier based on a finite mixture of copula densities. We develop simulation scenarios to compare the performance of the copula-based classifier with classical normal-distribution-based models, the logistic-regression-based model, and the independent model. We also apply the techniques to real data and present the misclassification errors.
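    For the Archimedean family, the copula densities have closed forms. Below is a hedged sketch of a bivariate Clayton copula density and of a finite mixture of such densities; the parameterization (theta > 0) is the standard one, and the mixture weights would be estimated by the proposed method rather than fixed as here.

    import numpy as np

    def clayton_logpdf(u, v, theta):
        # Log-density of the bivariate Clayton copula (Archimedean), theta > 0:
        # c(u,v) = (1+theta) (uv)^(-theta-1) (u^-theta + v^-theta - 1)^(-2-1/theta)
        s = u ** (-theta) + v ** (-theta) - 1.0
        return (np.log1p(theta)
                - (theta + 1.0) * (np.log(u) + np.log(v))
                - (2.0 + 1.0 / theta) * np.log(s))

    def mixture_copula_pdf(u, v, weights, thetas):
        # Finite mixture of Clayton copula densities: sum_j w_j * c(u, v; theta_j).
        dens = [np.exp(clayton_logpdf(u, v, t)) for t in thetas]
        return float(np.dot(weights, dens))

    As in the previous sketch, the class-conditional density is then the product of the marginal densities and this mixture copula density.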

    Mixture model and subgroup analysis in nationwide kidney transplant center evaluation

    The five-year post-transplant survival rate is an important indicator of the quality of care delivered by kidney transplant centers in the United States. To provide a fair assessment of each transplant center, an effect that represents center-specific care quality, along with patient-level risk factors, is often included in the risk-adjustment model. In the past, center effects have been modeled as either fixed effects or Gaussian random effects, with various merits and demerits. We propose two new methods that allow flexible random-effects distributions. The first is a generalized linear mixed model (GLMM) with normal mixture random effects. By allowing the random effects to be non-homogeneous, the shrinkage effect is reduced and the predicted random effects are much closer to the truth. In addition, modeling the random effects as a normal mixture essentially clusters them into different groups, which provides a natural way of evaluating performance in the transplant-center setting. To decide the number of components, we perform sequential hypothesis tests. In the second method, we propose a subgroup analysis on the random effects under the GLMM framework. Each level of the random effect is allowed to be a cluster by itself, but clusters that are close to each other are merged into larger ones. This method provides more precise and stable estimation than a fixed-effects model, while allowing a much more flexible random-effects distribution than a GLMM with the Gaussian assumption. In addition, the other effects in the model are selected via a lasso-type penalty.
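    The clustering interpretation of normal mixture random effects can be illustrated with a small two-stage sketch: fit a normal mixture to estimated center effects and read groups off the posterior memberships. This is only an illustration under assumed inputs; the proposed method fits the mixture jointly inside the GLMM, and the names below are invented for the example.

    import numpy as np
    from scipy import stats

    def normal_mixture_em(theta_hat, k, n_iter=500, seed=0):
        # Fit a k-component normal mixture to estimated center effects
        # theta_hat (1-D array); a two-stage sketch, not the joint GLMM fit.
        rng = np.random.default_rng(seed)
        n = len(theta_hat)
        mu = rng.choice(theta_hat, size=k, replace=False)   # initial means
        sd = np.full(k, theta_hat.std())
        pi = np.full(k, 1.0 / k)
        for _ in range(n_iter):
            # E-step: posterior probability each center belongs to each group
            logjoint = np.log(pi)[:, None] + stats.norm.logpdf(
                theta_hat[None, :], mu[:, None], sd[:, None])
            tau = np.exp(logjoint - np.logaddexp.reduce(logjoint, axis=0))
            # M-step: weighted means, spreads, and proportions
            nk = tau.sum(axis=1)
            mu = (tau @ theta_hat) / nk
            sd = np.sqrt((tau * (theta_hat[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk)
            pi = nk / n
        loglik = np.logaddexp.reduce(logjoint, axis=0).sum()
        return mu, sd, pi, tau, loglik

    # Increase k until a fit criterion computed from loglik (e.g., BIC) stops
    # improving: a simple stand-in for the sequential tests described above.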

    Inference for mixtures of symmetric distributions

    This article discusses the problem of estimation of parameters in finite mixtures when the mixture components are assumed to be symmetric and to come from the same location family. We refer to these mixtures as semi-parametric because no additional assumptions other than symmetry are made regarding the parametric form of the component distributions. Because the class of symmetric distributions is so broad, identifiability of parameters is a major issue in these mixtures. We develop a notion of identifiability of finite mixture models, which we call k-identifiability, where k denotes the number of components in the mixture. We give sufficient conditions for k-identifiability of location mixtures of symmetric components when k = 2 or 3. We propose a novel distance-based method for estimating the (location and mixing) parameters of a k-identifiable model and establish the strong consistency and asymptotic normality of the estimator. In the specific case of the L_2 distance, we show that our estimator generalizes the Hodges-Lehmann estimator. We discuss the numerical implementation of these procedures, along with an empirical estimate of the component distribution, in the two-component case. In comparisons with maximum likelihood estimation assuming normal components, our method produces somewhat higher standard error estimates in the case where the components are truly normal, but dramatically outperforms the normal method when the components are heavy-tailed. Published at http://dx.doi.org/10.1214/009053606000001118 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org/).
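    The Hodges-Lehmann connection can be stated concretely: the classical Hodges-Lehmann location estimate, which the article's minimum L_2-distance estimator generalizes to location mixtures, is the median of all Walsh (pairwise) averages. A small Python sketch:

    import numpy as np

    def hodges_lehmann(x):
        # Hodges-Lehmann location estimate: the median of all Walsh
        # averages (x[i] + x[j]) / 2 over pairs with i <= j.
        x = np.asarray(x, dtype=float)
        i, j = np.triu_indices(len(x))
        return float(np.median((x[i] + x[j]) / 2.0))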