
    Mixtures of Common Skew-t Factor Analyzers

    A mixture of common skew-t factor analyzers model is introduced for model-based clustering of high-dimensional data. By assuming common component factor loadings, this model allows clustering to be performed in the presence of a large number of mixture components or when the number of dimensions is too large to be well-modelled by the mixtures of factor analyzers model or a variant thereof. Furthermore, assuming that the component densities follow a skew-t distribution allows robust clustering of skewed data. The alternating expectation-conditional maximization algorithm is employed for parameter estimation. We demonstrate excellent clustering performance when our model is applied to real and simulated data. This paper marks the first time that skewed common factors have been used.
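    For orientation, the common-loadings structure described above can be sketched as follows (notation chosen here for illustration; it need not match the paper's). Each component's scale matrix is decomposed using a single loading matrix Λ shared across all G components:

        \[
        f(\mathbf{x}) \;=\; \sum_{g=1}^{G} \pi_g \, f_{\mathrm{ST}}\!\left(\mathbf{x};\; \boldsymbol{\mu}_g,\; \boldsymbol{\Lambda}\boldsymbol{\Omega}_g\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi},\; \nu_g\right),
        \]

    where f_ST is a skew-t density, Λ is p × q with q ≪ p, Ω_g is a component-specific factor covariance matrix, Ψ is a diagonal error covariance, and ν_g collects the component's skewness and degrees-of-freedom parameters. Because Λ is estimated once rather than per component, the number of covariance parameters grows far more slowly with G than under component-specific loadings, which is what makes large numbers of components or dimensions tractable.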

    Using Finite Mixtures to Robustify Statistical Models

    This thesis is concerned with robust estimation of the parameters of statistical models. Although robust estimation is a very good idea, it has some shortcomings when seen from the statistical modelling point of view. For example, there are no easily applicable principles for creating robust estimates in new situations. In this thesis, we introduce a unified method for obtaining robustified statistics in various situations, such as the linear model and the generalized linear model. We wish to modify the maximum likelihood estimation procedure, which is very sensitive to outliers. In order to reduce the effect of outliers on these estimates, we add an additional component, which would be of no interest in itself but would contain all outliers, to the regular component, forming a finite mixture. In fact, we use the finite mixture model to obtain a "robustified" estimate of a model parameter θ, where the finite mixture form is used as a mathematical tool to obtain a tractable analysis rather than as a serious model for the data. We employ the EM algorithm to obtain our proposed robustified estimates of the parameters. Our estimates are compared with other estimates defined in the robust statistics literature. This thesis examines the robustness of the proposed estimates using the concept of the influence function. The estimates are defined iteratively, so the implicit differentiation method of Jorgensen is used to obtain the influence functions of the estimates; we give example plots of these influence functions, which are bounded. We give mathematical results for all cases and use well-known real data sets to investigate our method. The statistical software R is used throughout. Finally, we hope that this method may give a unified approach for making parameter estimation in statistical models more robust.
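    As a concrete illustration of this robustification idea (a minimal sketch, not the thesis's code), the R function below robustifies the sample mean: a second component with inflated variance infl·σ² absorbs outliers, the EM weights w measure how "regular" each point is, and the M-step is a weighted mean in which suspected outliers count with weight roughly 1/infl. The inflation factor and starting values are arbitrary choices made here.

        # Two-component normal mixture with a shared location theta:
        # component 1 ~ N(theta, sigma^2)         ("regular" data)
        # component 2 ~ N(theta, infl * sigma^2)  (absorbs outliers; infl large)
        robust_mean_em <- function(y, infl = 100, max_iter = 200, tol = 1e-8) {
          theta <- median(y); sigma2 <- mad(y)^2; p1 <- 0.9
          for (iter in seq_len(max_iter)) {
            # E-step: posterior probability of the regular component
            d1 <- p1       * dnorm(y, theta, sqrt(sigma2))
            d2 <- (1 - p1) * dnorm(y, theta, sqrt(infl * sigma2))
            w  <- d1 / (d1 + d2)
            # M-step: precision-weighted updates; outliers count ~ 1/infl
            u         <- w + (1 - w) / infl
            theta_new <- sum(u * y) / sum(u)
            sigma2    <- sum(u * (y - theta_new)^2) / length(y)
            p1        <- mean(w)
            if (abs(theta_new - theta) < tol) { theta <- theta_new; break }
            theta <- theta_new
          }
          list(theta = theta, weights = w)
        }

    On data such as c(rnorm(50), 20, 25) the estimate stays near 0 while the returned weights flag the two gross outliers; a point's influence is bounded because its weight shrinks as it moves away from the bulk of the data, which is the behaviour the influence-function analysis above formalizes.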

    A tutorial for estimating mixture models for visual working memory tasks in brms: Introducing the Bayesian Measurement Modeling (bmm) package for R

    Mixture models for visual working memory tasks using continuous-report recall are highly popular measurement models in visual working memory research. Yet efficient and easy-to-implement estimation procedures that flexibly enable group or condition comparisons are scarce. Specifically, most software packages implementing mixture models have used maximum likelihood estimation on single-subject data. Such estimation procedures require large trial numbers per participant to obtain robust and reliable estimates. This problem can be solved with hierarchical Bayesian estimation procedures, which provide robust and reliable estimates with lower trial numbers. In this tutorial, we illustrate how mixture models for visual working memory tasks can be specified and fitted in the R package brms. The benefit of this implementation over existing hierarchical Bayesian implementations is that brms integrates hierarchical Bayesian estimation of the mixture models with an efficient linear-model syntax that enables us to adapt the mixture model to practically any experimental design. Specifically, this implementation allows model parameters to vary over arbitrary groups or experimental conditions. Additionally, the hierarchical structure and the specification of informed priors can improve subject-level parameter estimation and resolve frequently occurring estimation problems. We illustrate these benefits with different examples and provide R code for easy adaptation to other use cases. We also introduce a new R package called bmm, which simplifies the process of estimating these models with brms.
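    For readers who want a feel for the brms interface, here is a minimal sketch of a two-component model (a von Mises memory component centred on the target plus a uniform guessing component) written in plain brms. It assumes a data frame d with columns error (response deviation from the target, in radians), setsize, and id; those names, the priors, and the trick of fixing constants via constant() priors are illustrative assumptions here, and the tutorial's own code and the bmm package should be preferred in practice.

        library(brms)

        # Component 1: memory for the target (kappa1 on the default log link).
        # Component 2: made uniform by fixing kappa2 = 0 on an identity link.
        vm_mix <- mixture(von_mises(),
                          von_mises(link_kappa = "identity"))

        priors <- c(
          prior(constant(0), class = Intercept, dpar = mu1),    # no response bias
          prior(constant(0), class = Intercept, dpar = mu2),
          prior(constant(0), class = Intercept, dpar = kappa2), # kappa = 0 -> uniform
          prior(normal(2, 1), class = Intercept, dpar = kappa1) # log-scale precision
        )

        fit <- brm(
          bf(error ~ 1,
             kappa1 ~ 1 + (1 | id),             # memory precision varies by subject
             theta1 ~ 0 + setsize + (1 | id)),  # mixing weight per set-size condition
          data = d, family = vm_mix, prior = priors
        )

    The linear-model syntax in the kappa1 and theta1 formulas is what the abstract refers to: any predictor or grouping structure that brms understands can be placed there, so the same mixture model adapts to arbitrary designs.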

    Model-based clustering via linear cluster-weighted models

    A novel family of twelve mixture models with random covariates, nested within the linear t cluster-weighted model (CWM), is introduced for model-based clustering. The linear t CWM was recently presented as a robust alternative to the better-known linear Gaussian CWM. The proposed family of models provides a unified framework that also includes the linear Gaussian CWM as a special case. Maximum likelihood parameter estimation is carried out within the EM framework, and both the BIC and the ICL are used for model selection. A simple and effective hierarchical random initialization is also proposed for the EM algorithm. The novel model-based clustering technique is illustrated in applications to real data. Finally, a simulation study evaluating the performance of the BIC and the ICL is presented.
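    Schematically (with notation chosen here for illustration), a linear t CWM models the joint density of a response Y and random covariates X as

        \[
        p(\mathbf{x}, y) \;=\; \sum_{g=1}^{G} \pi_g \; t\!\left(y;\, \beta_{0g} + \boldsymbol{\beta}_g^{\top}\mathbf{x},\, \sigma_g^{2},\, \zeta_g\right) \, t_p\!\left(\mathbf{x};\, \boldsymbol{\mu}_g,\, \boldsymbol{\Sigma}_g,\, \nu_g\right),
        \]

    where t and t_p denote univariate and p-variate Student-t densities. The nested models in the family arise from constraining the degrees-of-freedom parameters ζ_g and ν_g, e.g. forcing them to be equal across components or letting them tend to infinity, which turns the corresponding t terms Gaussian and, when applied to both, recovers the linear Gaussian CWM as the special case mentioned above.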

    Modal-based estimation via heterogeneity-penalized weighting: model averaging for consistent and efficient estimation in Mendelian randomization when a plurality of candidate instruments are valid.

    BACKGROUND: A robust method for Mendelian randomization does not require all genetic variants to be valid instruments in order to give consistent estimates of a causal parameter. Several such methods have been developed, including a mode-based estimation method that gives consistent estimates if a plurality of genetic variants are valid instruments, i.e. if no subset of invalid instruments estimating the same causal parameter is larger than the subset of valid instruments. METHODS: We here develop a model-averaging method that gives consistent estimates under the same 'plurality of valid instruments' assumption. The method considers a mixture distribution of estimates derived from each subset of genetic variants. The estimates are weighted such that subsets with more genetic variants receive more weight, unless variants in the subset have heterogeneous causal estimates, in which case that subset is severely down-weighted. The mode of this mixture distribution is the causal estimate. This heterogeneity-penalized model-averaging method has several technical advantages over the previously proposed mode-based estimation method. RESULTS: The heterogeneity-penalized model-averaging method outperformed the mode-based estimation method in terms of efficiency and outperformed other robust methods in terms of Type 1 error rate in an extensive simulation analysis. In an applied example, the proposed method suggests two distinct mechanisms by which inflammation affects coronary heart disease risk, with subsets of variants suggesting both positive and negative causal effects. CONCLUSIONS: The heterogeneity-penalized model-averaging method is an additional robust method for Mendelian randomization with excellent theoretical and practical properties, and it can reveal features in the data such as the presence of multiple causal mechanisms.
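    To make the weighting concrete, below is an illustrative R sketch of the scheme the abstract describes, not the paper's exact formula: each subset of variants gets an inverse-variance-weighted (IVW) estimate and a Cochran's Q statistic, subset weights grow with the subset's precision but are penalized by exp(-Q), and the reported estimate is the mode of the resulting weighted mixture of normals, located on a grid. The function name and the specific penalty are assumptions made for illustration.

        # ratio: per-variant Wald ratio estimates (beta_outcome / beta_exposure)
        # se:    their standard errors
        hp_mode_estimate <- function(ratio, se, grid_n = 2000) {
          J <- length(ratio)
          stopifnot(J <= 15)                 # subsets are enumerated exhaustively
          w <- 1 / se^2                      # IVW weights
          est <- sds <- wts <- numeric(0)
          for (code in 1:(2^J - 1)) {
            S <- which(bitwAnd(code, 2^(0:(J - 1))) > 0)
            if (length(S) < 2) next          # need >= 2 variants to assess Q
            theta_S <- sum(w[S] * ratio[S]) / sum(w[S])    # subset IVW estimate
            Q_S     <- sum(w[S] * (ratio[S] - theta_S)^2)  # heterogeneity
            est <- c(est, theta_S)
            sds <- c(sds, sqrt(1 / sum(w[S])))
            wts <- c(wts, sum(w[S]) * exp(-Q_S))  # more precision: up; more Q: down
          }
          wts  <- wts / sum(wts)
          grid <- seq(min(est - 3 * sds), max(est + 3 * sds), length.out = grid_n)
          dens <- vapply(grid, function(th) sum(wts * dnorm(th, est, sds)), numeric(1))
          grid[which.max(dens)]              # mode of the mixture = causal estimate
        }

    Because the estimate is a mode rather than a mean, two well-supported clusters of subset estimates show up as two peaks in the mixture density, which is how the method can surface multiple causal mechanisms such as the positive and negative inflammation effects mentioned above.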