
    Determining the number of components in mixture regression models: an experimental design

    Despite the popularity of mixture regression models, the decision of how many components to retain remains an open issue. This study therefore compared the performance of 26 information and classification criteria, each evaluated in terms of its success rate in identifying the true number of components. The full experimental design manipulated 9 factors with 22 levels. The best results were obtained for 5 criteria: the Akaike information criterion 3 (AIC3), AIC4, the Hannan-Quinn information criterion, the integrated completed likelihood with Bayesian information criterion (ICL-BIC), and the ICL with BIC approximation. Each criterion's performance varied according to the experimental conditions.
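The criteria named above are all simple penalized transforms of a fitted model's maximized log-likelihood. As a minimal illustration (not the study's implementation), the sketch below computes the standard textbook forms from a log-likelihood, parameter count, sample size, and classification entropy; the toy log-likelihood values in the usage example are hypothetical.

```python
import math

def information_criteria(loglik, k, n, entropy=0.0):
    """Common model-selection criteria from a fitted mixture's log-likelihood
    (loglik), number of free parameters (k), sample size (n), and the entropy
    of the posterior classification (entropy). Lower values are preferred."""
    bic = -2.0 * loglik + k * math.log(n)
    return {
        "AIC":     -2.0 * loglik + 2.0 * k,
        "AIC3":    -2.0 * loglik + 3.0 * k,
        "AIC4":    -2.0 * loglik + 4.0 * k,
        "HQ":      -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
        "BIC":     bic,
        "ICL-BIC": bic + 2.0 * entropy,  # penalizes fuzzy classifications
    }

# Hypothetical fits: number of components -> (loglik, parameter count)
fits = {1: (-520.4, 3), 2: (-480.1, 7), 3: (-478.9, 11)}
best_g = min(fits, key=lambda g: information_criteria(*fits[g], n=200)["BIC"])
```

Here BIC picks 2 components: the third component barely improves the log-likelihood but costs four extra parameters.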

    Latent Markov model for longitudinal binary data: An application to the performance evaluation of nursing homes

    Performance evaluation of nursing homes is usually accomplished by the repeated administration of questionnaires aimed at measuring the health status of the patients during their period of residence in the nursing home. We illustrate how a latent Markov model with covariates may effectively be used for the analysis of data collected in this way. This model relies on a not directly observable Markov process, whose states represent different levels of the health status. For the maximum likelihood estimation of the model we apply an EM algorithm implemented by means of certain recursions taken from the literature on hidden Markov chains. Of particular interest is the estimation of the effect of each nursing home on the probability of transition between the latent states. We show how the estimates of these effects may be used to construct a set of scores which allows us to rank these facilities in terms of their efficacy in taking care of the health conditions of their patients. The method is applied to data on a set of nursing homes located in the Region of Umbria, Italy, followed over the period 2003-2005. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/), DOI: http://dx.doi.org/10.1214/08-AOAS230, by the Institute of Mathematical Statistics (http://www.imstat.org).
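The recursions mentioned for the EM algorithm are the standard forward recursions for hidden Markov chains. Below is a minimal sketch of the scaled forward pass that evaluates the likelihood of one binary response sequence; the two-state parameter values are hypothetical, not estimates from the nursing-home data.

```python
import numpy as np

def hmm_loglik(y, pi, P, p_obs):
    """Log-likelihood of a binary sequence y under a hidden Markov chain,
    computed with the scaled forward recursion.
    pi: initial state distribution (m,); P: transition matrix (m, m);
    p_obs[j] = P(y_t = 1 | latent state j)."""
    loglik = 0.0
    # alpha[j] tracks the (rescaled) joint probability of y_1..y_t and state j
    alpha = pi * np.where(y[0] == 1, p_obs, 1.0 - p_obs)
    for t in range(1, len(y)):
        c = alpha.sum()            # scaling constant, avoids underflow
        loglik += np.log(c)
        alpha = (alpha / c) @ P * np.where(y[t] == 1, p_obs, 1.0 - p_obs)
    return loglik + np.log(alpha.sum())

# Two hypothetical latent health states; state 1 makes a "1" answer more likely
pi = np.array([0.6, 0.4])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
p_obs = np.array([0.2, 0.7])
ll = hmm_loglik(np.array([0, 1, 1, 0]), pi, P, p_obs)
```

The covariate effects on the transition probabilities estimated in the paper would enter by making `P` depend on the nursing home; that layer is omitted here.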

    Two Monte Carlo studies for latent class segmentation models

    Model assessment and comparison are essential aspects of statistical inference. The likelihood ratio test is one of the main instruments for model selection; however, it is not appropriate when the model under consideration contains random effects. In this paper, we present two simulation studies for latent class segmentation models. The first Monte Carlo study compares the performance of seven information criteria in predicting the correct number of segments. The second study investigates factors that affect segment-membership recovery, parameter recovery, and computational effort.
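A Monte Carlo study of this kind boils down to simulating data with a known number of segments, fitting candidate models, and tallying how often a criterion selects the truth. The sketch below makes simplifying assumptions: a univariate Gaussian mixture fitted by a bare-bones EM stands in for the latent class segmentation model, and only BIC is scored (the abstract does not list the seven criteria compared).

```python
import numpy as np

rng = np.random.default_rng(0)

def em_gmm_1d(x, g, iters=100):
    """Fit a g-component univariate Gaussian mixture by EM (a minimal sketch,
    not the paper's estimation procedure); returns the final log-likelihood."""
    n = len(x)
    mu = np.quantile(x, np.linspace(0.1, 0.9, g))   # spread-out initial means
    sigma = np.full(g, x.std())
    w = np.full(g, 1.0 / g)
    def dens():  # n x g matrix of weighted component densities
        return (w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2 * np.pi)))
    for _ in range(iters):
        d = dens()
        resp = d / d.sum(axis=1, keepdims=True)     # E-step: memberships
        nk = resp.sum(axis=0)
        w = nk / n                                  # M-step: weights, means, sds
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.maximum(
            np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk), 0.1)
    return np.log(dens().sum(axis=1)).sum()

def bic(loglik, k, n):
    return -2.0 * loglik + k * np.log(n)

# Success rate of BIC at recovering the true two segments over 20 replications
hits = 0
for _ in range(20):
    x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
    # k = 3g - 1 free parameters: g means, g sds, g - 1 weights
    scores = {g: bic(em_gmm_1d(x, g), 3 * g - 1, len(x)) for g in (1, 2, 3)}
    hits += min(scores, key=scores.get) == 2
success_rate = hits / 20
```

The second study's factors (segment separation, sample size, number of candidate models) would be varied across such replication loops.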

    Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks

    The stochastic block model (SBM) is a flexible probabilistic tool that can be used to model interactions between clusters of nodes in a network. However, it does not account for interactions of time-varying intensity between clusters. The extension of the SBM developed in this paper addresses this shortcoming through a temporal partition: assuming interactions between nodes are recorded on fixed-length time intervals, the inference procedure associated with the proposed model clusters the nodes of the network and the time intervals simultaneously. The number of clusters of nodes and of time intervals, as well as the memberships to clusters, are obtained by maximizing an exact integrated complete-data likelihood, relying on a greedy search approach. Experiments on simulated and real data are carried out in order to assess the proposed methodology.
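For reference, the exact integrated complete-data likelihood being maximized has a closed form for the static binary SBM under conjugate priors. The sketch below scores a candidate node partition under Beta(1,1) edge priors and a Dirichlet(1) prior on cluster proportions; it is a static-network illustration only and omits the temporal partition that is the paper's contribution.

```python
import numpy as np
from math import lgamma

def betaln(a, b):
    """Log of the Beta function via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def exact_icl(A, z, K):
    """Exact integrated complete-data log-likelihood of a directed Bernoulli
    SBM with Beta(1,1) edge priors and a Dirichlet(1) prior on proportions.
    A: (n, n) binary adjacency with zero diagonal; z: labels in {0..K-1}."""
    n = len(z)
    counts = np.bincount(z, minlength=K)
    # Integrated multinomial term for the cluster memberships
    icl = lgamma(K) - lgamma(n + K) + sum(lgamma(c + 1) for c in counts)
    for q in range(K):
        for l in range(K):
            # Ordered node pairs from block q to block l, excluding self-loops
            m = counts[q] * counts[l] - (counts[q] if q == l else 0)
            e = int(A[np.ix_(z == q, z == l)].sum())
            icl += betaln(e + 1, m - e + 1)   # integrated Bernoulli term
    return icl

# Two planted blocks of 10 nodes: dense within (0.8), sparse between (0.1)
rng = np.random.default_rng(1)
z_true = np.repeat([0, 1], 10)
p = np.where(z_true[:, None] == z_true[None, :], 0.8, 0.1)
A = (rng.random((20, 20)) < p).astype(int)
np.fill_diagonal(A, 0)
icl_true = exact_icl(A, z_true, 2)
icl_mixed = exact_icl(A, np.roll(z_true, 5), 2)
```

A greedy search of the kind the paper relies on would repeatedly move single nodes (and, in the temporal extension, time intervals) to whichever cluster increases this score.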

    Sample- and segment-size specific Model Selection in Mixture Regression Analysis

    As mixture regression models increasingly receive attention from both theory and practice, the question of selecting the correct number of segments gains urgency. A misspecification can lead to under- or oversegmentation, resulting in flawed management decisions on customer targeting or product positioning. This paper presents the results of an extensive simulation study that examines the performance of commonly used information criteria in a mixture regression context with normal data. Unlike previous studies, the performance is evaluated across a broad range of sample/segment size combinations, the most critical factors for the effectiveness of the criteria from both a theoretical and a practical point of view. To assess each criterion's absolute performance with respect to chance, performance is also reviewed against so-called chance criteria, derived from discriminant analysis. The results yield recommendations on criterion selection for a given sample size and help to judge what sample size is needed to guarantee an accurate decision based on a particular criterion.
    Keywords: mixture regression; model selection; information criteria
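The abstract does not define its chance criteria; in the discriminant analysis literature the usual benchmarks are the proportional chance criterion (the sum of squared segment shares) and the maximum chance criterion (the share of the largest segment). A minimal sketch, assuming these are the intended benchmarks:

```python
def chance_criteria(segment_shares):
    """Chance benchmarks from discriminant analysis: a selection criterion is
    only useful if its success rate beats these levels.
    Returns (proportional chance, maximum chance)."""
    cpro = sum(p * p for p in segment_shares)  # expected hit rate of random assignment
    cmax = max(segment_shares)                 # hit rate of always picking the largest
    return cpro, cmax

# Three segments holding 50%, 30% and 20% of the sample
cpro, cmax = chance_criteria([0.5, 0.3, 0.2])  # cpro = 0.38, cmax = 0.5
```

A criterion whose success rate falls below `cmax` performs no better than always guessing the modal outcome.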

    Fuzzy cluster validation using the partition negentropy criterion

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-04277-5_24. Proceedings of the 19th International Conference, Limassol, Cyprus, September 14-17, 2009.
    We introduce the Partition Negentropy Criterion (PNC) for cluster validation. It is a cluster validity index that rewards the average normality of the clusters, measured by means of the negentropy, and penalizes their overlap, measured by the partition entropy. The PNC is aimed at finding well-separated clusters whose shape is approximately Gaussian. We use the new index to validate fuzzy partitions in a set of synthetic clustering problems and compare the results to those obtained with the AIC, BIC and ICL criteria. The partitions are obtained by fitting a Gaussian mixture model to the data using the EM algorithm. We show that, when the real clusters are normally distributed, all the criteria are able to correctly assess the number of components, with AIC and BIC allowing a higher cluster overlap. However, when the real cluster distributions are not Gaussian (i.e., not the distribution assumed by the mixture model), the PNC outperforms the other indices, being able to correctly evaluate the number of clusters while the other criteria (especially AIC and BIC) tend to overestimate it.
    This work has been partially supported with funds from MEC BFU2006-07902/BFI, CAM S-SEM-0255-2006 and CAM/UAM project CCG08-UAM/TIC-442.
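The overlap penalty mentioned here, the partition entropy, is Bezdek's classical index for fuzzy memberships; the abstract does not give the full PNC formula, so the sketch below covers only this entropy term, which is zero for a crisp partition and maximal when every point is shared equally among clusters.

```python
import numpy as np

def partition_entropy(U):
    """Bezdek's partition entropy of a fuzzy membership matrix U (n x c),
    with rows summing to 1: 0 for a crisp partition, log(c) when every point
    belongs equally to all c clusters. The PNC penalizes this overlap term."""
    U = np.clip(U, 1e-12, 1.0)          # guard the log against exact zeros
    return -np.mean(np.sum(U * np.log(U), axis=1))

crisp = np.array([[1.0, 0.0], [0.0, 1.0]])  # each point fully in one cluster
fuzzy = np.full((2, 2), 0.5)                # maximal overlap
pe_crisp = partition_entropy(crisp)          # 0
pe_fuzzy = partition_entropy(fuzzy)          # log(2)
```

The negentropy reward, by contrast, measures each cluster's departure from Gaussianity; combining the two terms into the PNC follows the construction in the paper itself.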