Determining the number of components in mixture regression models: an experimental design
Despite the popularity of mixture regression models, the decision of how many components to retain remains an open issue. This study thus sought to compare the performance of 26 information and classification criteria. Each criterion was evaluated in terms of its success rate in selecting the number of components. The research's full experimental design included manipulating 9 factors and 22 levels. The best results were obtained for 5 criteria: Akaike information criterion 3 (AIC3), AIC4, the Hannan-Quinn information criterion, integrated completed likelihood (ICL) Bayesian information criterion (BIC), and ICL with BIC approximation. Each criterion's performance varied according to the experimental conditions.
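As a rough illustration of how such criteria are compared, the sketch below scores candidate models with the textbook penalized-likelihood forms of AIC, AIC3, AIC4, BIC and Hannan-Quinn and picks the number of components with the lowest score. The function names, the `fits` dictionary and its log-likelihood values are hypothetical and not taken from the study, whose exact formulas may differ.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Textbook penalized-likelihood criteria; smaller values are better."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * n_params,
        "AIC3": deviance + 3 * n_params,
        "AIC4": deviance + 4 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "HQ": deviance + 2 * n_params * math.log(math.log(n_obs)),
    }


def select_components(fits, n_obs, criterion="AIC3"):
    """fits maps a number of components to (log-likelihood, free parameters)."""
    scores = {
        k: information_criteria(log_lik, n_params, n_obs)[criterion]
        for k, (log_lik, n_params) in fits.items()
    }
    return min(scores, key=scores.get)


# Hypothetical fitted models (illustrative numbers only).
fits = {1: (-1520.4, 3), 2: (-1432.9, 7), 3: (-1429.8, 11)}
best_aic3 = select_components(fits, n_obs=500, criterion="AIC3")
best_bic = select_components(fits, n_obs=500, criterion="BIC")
```

In this made-up example the third component improves the log-likelihood only slightly, so a criterion with a heavier penalty per parameter has little reason to prefer it.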
Latent Markov model for longitudinal binary data: An application to the performance evaluation of nursing homes
Performance evaluation of nursing homes is usually accomplished by the
repeated administration of questionnaires aimed at measuring the health status
of the patients during their period of residence in the nursing home. We
illustrate how a latent Markov model with covariates may effectively be used
for the analysis of data collected in this way. This model relies on a not
directly observable Markov process, whose states represent different levels of
the health status. For the maximum likelihood estimation of the model we apply
an EM algorithm implemented by means of certain recursions taken from the
literature on hidden Markov chains. Of particular interest is the estimation of
the effect of each nursing home on the probability of transition between the
latent states. We show how the estimates of these effects may be used to
construct a set of scores which allows us to rank these facilities in terms of
their efficacy in taking care of the health conditions of their patients. The
method is used within an application based on data concerning a set of nursing
homes located in the Region of Umbria, Italy, which were followed for the
period 2003--2005. Comment: Published at http://dx.doi.org/10.1214/08-AOAS230 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
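The recursions from the hidden Markov chain literature mentioned in the abstract above can be illustrated with the standard scaled forward recursion, which evaluates the log-likelihood of one observed sequence. This is only a minimal sketch: it omits the covariates and the nursing-home effects of the paper's model, and the function name and the parameter values in the usage lines are hypothetical.

```python
import math


def forward_log_likelihood(init, trans, emit, obs):
    """Scaled forward recursion for a discrete hidden Markov chain.

    init[s]     : probability of starting in latent state s
    trans[r][s] : probability of moving from latent state r to s
    emit[s][y]  : probability of observing y in latent state s
    obs         : sequence of observed category indices
    """
    n_states = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        # Rescale to avoid numerical underflow on long sequences.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state model with a binary response.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
log_lik = forward_log_likelihood(init, trans, emit, [0, 1])
```

In an EM algorithm, a forward pass of this kind is typically paired with a backward pass to obtain the posterior state probabilities needed in the E-step.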
Two Monte Carlo studies for latent class segmentation models
Model assessment and comparison are essential aspects of statistical inference. The likelihood ratio test is one of the main instruments for model selection; however, this is not appropriate when the model under consideration contains random effects. In this paper, we present two simulation studies for latent class segmentation models. The first Monte Carlo study compares the performance of seven Information Criteria in predicting the correct number of segments. The second study investigates factors that have an effect on segment membership and parameter recovery and affect computational effort.
Exact ICL maximization in a non-stationary temporal extension of the stochastic block model for dynamic networks
The stochastic block model (SBM) is a flexible probabilistic tool that can be
used to model interactions between clusters of nodes in a network. However, it
does not account for interactions of time varying intensity between clusters.
The extension of the SBM developed in this paper addresses this shortcoming
through a temporal partition: assuming interactions between nodes are recorded
on fixed-length time intervals, the inference procedure associated with the
model we propose clusters the nodes of the network and the time intervals
simultaneously. The number of clusters of nodes and of time intervals, as
well as the memberships to clusters, are obtained by maximizing an exact
integrated complete-data likelihood, relying on a greedy search approach.
Experiments on simulated and real data are carried out in order to assess the
proposed methodology.
Sample- and segment-size specific Model Selection in Mixture Regression Analysis
As mixture regression models increasingly receive attention from both theory and practice, the question of selecting the correct number of segments gains urgency. A misspecification can lead to under- or oversegmentation, thus resulting in flawed management decisions on customer targeting or product positioning. This paper presents the results of an extensive simulation study that examines the performance of commonly used information criteria in a mixture regression context with normal data. Unlike in previous studies, the performance is evaluated over a broad range of sample/segment size combinations, these being the most critical factors for the effectiveness of the criteria from both a theoretical and a practical point of view. In order to assess the absolute performance of each criterion with respect to chance, the performance is reviewed against so-called chance criteria, derived from discriminant analysis. The results yield recommendations on criterion selection when a certain sample size is given, and help to judge what sample size is needed to guarantee an accurate decision based on a certain criterion.
Keywords: Mixture Regression; Model Selection; Information Criteria
Fuzzy cluster validation using the partition negentropy criterion
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-04277-5_24
Proceedings of the 19th International Conference, Limassol, Cyprus, September 14-17, 2009
We introduce the Partition Negentropy Criterion (PNC) for cluster validation. It is a cluster validity index that rewards the average normality of the clusters, measured by means of the negentropy, and penalizes the overlap, measured by the partition entropy. The PNC is aimed at finding well separated clusters whose shape is approximately Gaussian. We use the new index to validate fuzzy partitions in a set of synthetic clustering problems, and compare the results to those obtained by the AIC, BIC and ICL criteria. The partitions are obtained by fitting a Gaussian Mixture Model to the data using the EM algorithm. We show that, when the real clusters are normally distributed, all the criteria are able to correctly assess the number of components, with AIC and BIC allowing a higher cluster overlap. However, when the real cluster distributions are not Gaussian (i.e. not the distribution assumed by the mixture model), the PNC outperforms the other indices, being able to correctly evaluate the number of clusters while the other criteria (especially AIC and BIC) tend to overestimate it.
This work has been partially supported with funds from MEC BFU2006-07902/BFI, CAM S-SEM-0255-2006 and CAM/UAM project CCG08-UAM/TIC-442.
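The overlap term mentioned in the abstract above, the partition entropy, can be sketched as the average Shannon entropy of the fuzzy memberships. This covers only that one ingredient, under the usual textbook definition; the negentropy term and the way the PNC combines the two are not reproduced here, and the function name is hypothetical.

```python
import math


def partition_entropy(memberships):
    """Average Shannon entropy of soft cluster memberships.

    memberships[i][k] is the degree to which point i belongs to cluster k;
    each row is assumed to sum to 1. Zero for a crisp partition, larger
    when clusters overlap.
    """
    total = 0.0
    for row in memberships:
        total -= sum(u * math.log(u) for u in row if u > 0.0)
    return total / len(memberships)


crisp = [[1.0, 0.0], [0.0, 1.0]]
overlapping = [[0.5, 0.5], [0.5, 0.5]]
pe_crisp = partition_entropy(crisp)
pe_overlap = partition_entropy(overlapping)
```

With memberships taken from the posterior probabilities of a fitted Gaussian mixture, a lower value indicates better separated clusters.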