FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R
FlexMix implements a general framework for fitting discrete mixtures of regression models in the R statistical computing environment. Three variants of the EM algorithm can be used for parameter estimation; regressors and responses may be multivariate with arbitrary dimension; data may be grouped, e.g., to account for multiple observations per individual; the usual formula interface of the S language is used for convenient model specification; and a modular concept of driver functions allows many different types of regression models to be interfaced. Existing drivers implement mixtures of standard linear models, generalized linear models, and model-based clustering. FlexMix provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models.
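The division of labour described in this abstract (a fixed E-step, with the M-step supplied per model) can be illustrated with a minimal Python sketch. This is not the FlexMix API: the class `LinearComponent`, its methods `logdens` and `mstep`, and the function `em_mixture` are hypothetical names chosen for illustration, and the example assumes a mixture of two simple linear regressions with normal errors and no safeguards against degenerate components.

```python
import math
import random


class LinearComponent:
    """One mixture component: y = intercept + slope * x + N(0, sd^2) noise."""

    def __init__(self, intercept, slope, sd):
        self.intercept, self.slope, self.sd = intercept, slope, sd

    def logdens(self, obs):
        """Log-density of one (x, y) pair under this component."""
        x, y = obs
        r = (y - self.intercept - self.slope * x) / self.sd
        return -0.5 * r * r - math.log(self.sd) - 0.5 * math.log(2.0 * math.pi)

    def mstep(self, data, w):
        """Model-specific M-step: weighted least squares with weights w."""
        sw = sum(w)
        mx = sum(wi * x for wi, (x, _) in zip(w, data)) / sw
        my = sum(wi * y for wi, (_, y) in zip(w, data)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, data))
        sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, data))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        rss = sum(wi * (y - self.intercept - self.slope * x) ** 2
                  for wi, (x, y) in zip(w, data))
        self.sd = math.sqrt(max(rss / sw, 1e-12))


def em_mixture(data, components, n_iter=50):
    """Generic EM loop: the E-step is shared, the M-step is delegated."""
    k = len(components)
    priors = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior component probabilities for every observation,
        # computed on the log scale for numerical stability.
        post = []
        for obs in data:
            logs = [math.log(priors[j]) + components[j].logdens(obs)
                    for j in range(k)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            post.append([u / total for u in unnorm])
        # M-step: update the prior weights, then let each component refit itself.
        for j in range(k):
            wj = [p[j] for p in post]
            priors[j] = sum(wj) / len(data)
            components[j].mstep(data, wj)
    return priors, components


# Simulated data from two regression lines (values chosen for illustration).
random.seed(0)
data = []
for _ in range(150):
    x = random.uniform(0.0, 5.0)
    data.append((x, 1.0 + 2.0 * x + random.gauss(0.0, 0.5)))
for _ in range(150):
    x = random.uniform(0.0, 5.0)
    data.append((x, 20.0 - 1.0 * x + random.gauss(0.0, 0.5)))

start = [LinearComponent(0.0, 1.5, 2.0), LinearComponent(15.0, 0.0, 2.0)]
priors, comps = em_mixture(data, start)
```

Under this design, a different regression model would only need another class with the same two methods; the loop in `em_mixture` would stay unchanged.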
mixtools: An R Package for Analyzing Mixture Models
The mixtools package for R provides a set of functions for analyzing a variety of finite mixture models. These functions include both traditional methods, such as EM algorithms for univariate and multivariate normal mixtures, and newer methods that reflect some recent research in finite mixture models. In the latter category, mixtools provides algorithms for estimating parameters in a wide range of different mixture-of-regression contexts, in multinomial mixtures such as those arising from discretizing continuous multivariate data, in nonparametric situations where the multivariate component densities are completely unspecified, and in semiparametric situations such as a univariate location mixture of symmetric but otherwise unspecified densities. Many of the algorithms of the mixtools package are EM algorithms or are based on EM-like ideas, so this article includes an overview of EM algorithms for finite mixture models.
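As a concrete instance of the "traditional" case mentioned in this abstract, the following is a minimal Python sketch of EM for a two-component univariate normal mixture. It is not the mixtools implementation; the function names `normal_pdf` and `em_two_normals`, the crude starting values, and the stopping rule are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normals(data, n_iter=200, tol=1e-8):
    """Fit a two-component univariate normal mixture by EM.

    Returns (weights, means, sds, log_lik), where log_lik is the
    log-likelihood at the parameters used in the last E-step.
    """
    # Crude starting values: means of the lower and upper half of the
    # sorted data, and the overall standard deviation for both components.
    xs = sorted(data)
    half = len(xs) // 2
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    centre = sum(xs) / len(xs)
    sd0 = math.sqrt(sum((x - centre) ** 2 for x in xs) / len(xs))
    sds = [sd0, sd0]
    weights = [0.5, 0.5]

    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        resp = []
        ll = 0.0
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k])
                    for k in range(2)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of mixing weights, means and sds.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, data)) / nk
            sds[k] = math.sqrt(max(var, 1e-12))
        # Stop once the log-likelihood no longer improves noticeably.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, sds, ll


# Simulated sample from two well-separated normals (illustrative values).
random.seed(0)
sample = ([random.gauss(0.0, 1.0) for _ in range(200)]
          + [random.gauss(5.0, 1.0) for _ in range(200)])
weights, means, sds, log_lik = em_two_normals(sample)
```

The sketch has no protection against poorly separated or degenerate components; in practice several random restarts are commonly used.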
Assessing the Number of Components in Mixture Models: a Review.
Despite the widespread application of finite mixture models, the decision of how many classes are required to adequately represent the data is, according to many authors, an important but unsolved issue. This work aims to review, describe and organize the available approaches designed to help select an adequate number of mixture components (including Monte Carlo test procedures, information criteria and classification-based criteria); we also provide some published simulation results about their relative performance, with the purpose of identifying the scenarios where each criterion is more effective (adequate).
Keywords: finite mixture; number of mixture components; information criteria; simulation studies.
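To make the information-criteria approach concrete, here is a small Python sketch that scores candidate models by AIC, BIC and CAIC and picks the number of components with the lowest score. The function names are hypothetical, and the log-likelihoods and parameter counts in the usage lines are made-up values for illustration only, not results from the review.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return AIC, BIC and CAIC for a fitted model (smaller is better)."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    caic = -2.0 * log_lik + n_params * (math.log(n_obs) + 1.0)
    return {"AIC": aic, "BIC": bic, "CAIC": caic}


def pick_number_of_components(fits, n_obs, criterion="BIC"):
    """Choose the component count that minimises the given criterion.

    `fits` maps a number of components to (log_lik, n_params).
    Returns the chosen count and the score of every candidate.
    """
    scores = {k: information_criteria(ll, p, n_obs)[criterion]
              for k, (ll, p) in fits.items()}
    return min(scores, key=scores.get), scores


# Made-up fits for 1, 2 and 3 components on 500 observations.
fits = {1: (-1050.0, 2), 2: (-980.0, 5), 3: (-978.5, 8)}
best_k, bic_scores = pick_number_of_components(fits, n_obs=500)
```

In this toy input the third component improves the log-likelihood only slightly, so the penalty term outweighs the gain and the two-component model is selected.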
FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters
flexmix provides infrastructure for flexible fitting of finite mixture models in R using the expectation-maximization (EM) algorithm or one of its variants. The functionality of the package has been enhanced: concomitant variable models, as well as varying and constant parameters for the component-specific generalized linear regression models, can now be fitted. The application of the package is demonstrated on several examples, the implementation is described, and examples are given to illustrate how new drivers for the component-specific models and the concomitant variable models can be defined.
Sample- and segment-size specific Model Selection in Mixture Regression Analysis
As mixture regression models increasingly receive attention from both theory and practice, the question of selecting the correct number of segments gains urgency. A misspecification can lead to under- or oversegmentation, resulting in flawed management decisions on customer targeting or product positioning.
This paper presents the results of an extensive simulation study that examines the performance of commonly used information criteria in a mixture regression context with normal data. Unlike previous studies, the performance is evaluated over a broad range of sample/segment size combinations, these being the most critical factors for the effectiveness of the criteria from both a theoretical and a practical point of view. To assess the absolute performance of each criterion with respect to chance, the performance is compared against so-called chance criteria derived from discriminant analysis.
The results yield recommendations on criterion selection for a given sample size and help to judge what sample size is needed to guarantee an accurate decision based on a given criterion.
Keywords: Mixture Regression; Model Selection; Information Criteria
Testing for Homogeneity in Mixture Models
Statistical models of unobserved heterogeneity are typically formalized as
mixtures of simple parametric models and interest naturally focuses on testing
for homogeneity versus general mixture alternatives. Many tests of this type
can be interpreted as C(α) tests, as in Neyman (1959), and shown to be
locally, asymptotically optimal. These tests will be contrasted
with a new approach to likelihood ratio testing for general mixture models. The
latter tests are based on estimation of general nonparametric mixing
distribution with the Kiefer and Wolfowitz (1956) maximum likelihood estimator.
Recent developments in convex optimization have dramatically improved upon
earlier EM methods for computation of these estimators, and recent results on
the large sample behavior of likelihood ratios involving such estimators yield
a tractable form of asymptotic inference. Improvement in computation efficiency
also facilitates the use of bootstrap methods to determine critical values
that are shown to work better than the asymptotic critical values in finite
samples. Consistency of the bootstrap procedure is also formally established.
We compare performance of the two approaches identifying circumstances in which
each is preferred.
Construction of Dependent Dirichlet Processes Based on Poisson Processes
We present a method for constructing dependent Dirichlet processes. The new approach
exploits the intrinsic relationship between Dirichlet and Poisson processes
in order to create a Markov chain of Dirichlet processes suitable for use as a prior
over evolving mixture models. The method allows for the creation, removal, and
location variation of component models over time while maintaining the property
that the random measures are marginally DP distributed. Additionally, we derive
a Gibbs sampling algorithm for model inference and test it on both synthetic and
real data. Empirical results demonstrate that the approach is effective in estimating
dynamically varying mixture models.