
    FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R

    FlexMix implements a general framework for fitting discrete mixtures of regression models in the R statistical computing environment. Three variants of the EM algorithm can be used for parameter estimation; regressors and responses may be multivariate with arbitrary dimension; data may be grouped, e.g., to account for multiple observations per individual; the usual formula interface of the S language is used for convenient model specification; and a modular concept of driver functions allows many different types of regression models to be interfaced. Existing drivers implement mixtures of standard linear models, generalized linear models, and model-based clustering. FlexMix provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models.

    mixtools: An R Package for Analyzing Mixture Models

    The mixtools package for R provides a set of functions for analyzing a variety of finite mixture models. These functions include both traditional methods, such as EM algorithms for univariate and multivariate normal mixtures, and newer methods that reflect some recent research in finite mixture models. In the latter category, mixtools provides algorithms for estimating parameters in a wide range of different mixture-of-regression contexts, in multinomial mixtures such as those arising from discretizing continuous multivariate data, in nonparametric situations where the multivariate component densities are completely unspecified, and in semiparametric situations such as a univariate location mixture of symmetric but otherwise unspecified densities. Many of the algorithms of the mixtools package are EM algorithms or are based on EM-like ideas, so this article includes an overview of EM algorithms for finite mixture models.
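
    As a rough illustration of the EM idea mentioned in this abstract, the following is a minimal sketch of EM for a two-component univariate normal mixture. It is not the mixtools implementation: it is written in plain Python rather than R, and the function names (normal_pdf, em_two_normals), the initialisation, and the fixed iteration count are all assumptions made for this example. It also assumes the data are not all identical and that no point is so extreme that both component densities underflow to zero.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normals(data, n_iter=200):
    """Illustrative EM for a two-component univariate normal mixture.

    Returns (weights, means, sds, loglik). Hypothetical helper, not a
    function from any package.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)

    # Crude initialisation (an assumption of this sketch): equal weights,
    # means at the data extremes, both sds equal to the overall sd.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    sds = [overall_sd, overall_sd]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: weighted maximum-likelihood updates of all parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = math.sqrt(max(var, 1e-12))  # guard against collapse to zero

    # Observed-data log-likelihood at the final parameter values.
    loglik = sum(
        math.log(sum(weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)))
        for x in data
    )
    return weights, means, sds, loglik
```

    A production routine would normally stop on a convergence criterion (for example, a small change in the log-likelihood) and try several starting values, since EM only finds a local maximum.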

    Assessing the Number of Components in Mixture Models: a Review.

    Despite the widespread application of finite mixture models, the decision of how many classes are required to adequately represent the data is, according to many authors, an important but unsolved issue. This work reviews, describes, and organizes the available approaches for selecting an adequate number of mixture components (including Monte Carlo test procedures, information criteria, and classification-based criteria). We also report published simulation results on their relative performance, with the purpose of identifying the scenarios where each criterion is most effective. Keywords: finite mixture; number of mixture components; information criteria; simulation studies.
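
    To make the information-criterion approach concrete, here is a small sketch of how AIC and BIC might be used to choose the number of components. The formulas AIC = -2 log L + 2p and BIC = -2 log L + p log n are the standard definitions; the helper names (aic, bic, select_k) and the log-likelihood values in the usage example are hypothetical and do not come from the paper.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 p."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_k(fits, n_obs, criterion="bic"):
    """Return the number of components with the smallest criterion value.

    fits maps k -> (maximised log-likelihood, number of free parameters).
    Hypothetical helper for illustration only.
    """
    def score(item):
        _, (loglik, n_params) = item
        if criterion == "aic":
            return aic(loglik, n_params)
        return bic(loglik, n_params, n_obs)

    best_k, _ = min(fits.items(), key=score)
    return best_k


# Made-up log-likelihoods for univariate normal mixtures with k = 1, 2, 3
# components; a k-component model of this kind has 3k - 1 free parameters.
example_fits = {1: (-1500.0, 2), 2: (-1300.0, 5), 3: (-1298.0, 8)}
chosen = select_k(example_fits, n_obs=600)
```

    In this made-up example the small gain in log-likelihood from two to three components does not offset the penalty for three extra parameters, so both criteria favour two components.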

    FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters

    flexmix provides infrastructure for flexible fitting of finite mixture models in R using the expectation-maximization (EM) algorithm or one of its variants. The functionality of the package has been enhanced: concomitant variable models, as well as varying and constant parameters for the component-specific generalized linear regression models, can now be fitted. The application of the package is demonstrated on several examples, the implementation is described, and examples illustrate how new drivers for the component-specific models and the concomitant variable models can be defined.

    Sample- and segment-size specific Model Selection in Mixture Regression Analysis

    As mixture regression models increasingly receive attention from both theory and practice, the question of selecting the correct number of segments gains urgency. A misspecification can lead to under- or oversegmentation, resulting in flawed management decisions on customer targeting or product positioning. This paper presents the results of an extensive simulation study that examines the performance of commonly used information criteria in a mixture regression context with normal data. Unlike previous studies, it evaluates performance over a broad range of sample/segment size combinations, which are the most critical factors for the effectiveness of the criteria from both a theoretical and a practical point of view. To assess the absolute performance of each criterion with respect to chance, the performance is reviewed against so-called chance criteria, derived from discriminant analysis. The results yield recommendations on criterion selection for a given sample size and help to judge what sample size is needed to guarantee an accurate decision based on a given criterion. Keywords: Mixture Regression; Model Selection; Information Criteria.

    Testing for Homogeneity in Mixture Models

    Statistical models of unobserved heterogeneity are typically formalized as mixtures of simple parametric models, and interest naturally focuses on testing for homogeneity versus general mixture alternatives. Many tests of this type can be interpreted as $C(\alpha)$ tests, as in Neyman (1959), and shown to be locally, asymptotically optimal. These $C(\alpha)$ tests will be contrasted with a new approach to likelihood ratio testing for general mixture models. The latter tests are based on estimation of a general nonparametric mixing distribution with the Kiefer and Wolfowitz (1956) maximum likelihood estimator. Recent developments in convex optimization have dramatically improved upon earlier EM methods for computation of these estimators, and recent results on the large sample behavior of likelihood ratios involving such estimators yield a tractable form of asymptotic inference. Improvement in computational efficiency also facilitates the use of bootstrap methods to determine critical values that are shown to work better than the asymptotic critical values in finite samples. Consistency of the bootstrap procedure is also formally established. We compare the performance of the two approaches, identifying circumstances in which each is preferred.

    Construction of Dependent Dirichlet Processes Based on Poisson Processes

    We present a method for constructing dependent Dirichlet processes. The new approach exploits the intrinsic relationship between Dirichlet and Poisson processes in order to create a Markov chain of Dirichlet processes suitable for use as a prior over evolving mixture models. The method allows for the creation, removal, and location variation of component models over time while maintaining the property that the random measures are marginally DP distributed. Additionally, we derive a Gibbs sampling algorithm for model inference and test it on both synthetic and real data. Empirical results demonstrate that the approach is effective in estimating dynamically varying mixture models.
