
    Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation

    Using a collection of simulated and real benchmarks, we compare Bayesian and frequentist regularization approaches in a poorly informative setting where the number of variables is almost equal to the number of observations. This comparison includes new global noninformative approaches to Bayesian variable selection built on Zellner's g-priors, similar to those of Liang et al. (2008). The appeal of these calibration-free proposals is discussed. The numerical experiments we present highlight the advantages of Bayesian regularization methods over non-Bayesian alternatives: they dominate frequentist methods in the sense that they provide smaller prediction errors while selecting the most relevant variables in a parsimonious way.
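
    As a rough illustration of the kind of experiment described above (not of the paper's g-prior procedure itself), the sketch below simulates data with almost as many variables as observations and compares a frequentist lasso with scikit-learn's BayesianRidge as a simple Bayesian-flavoured stand-in; the data, models, and settings are all illustrative assumptions.

```python
# Minimal sketch of a p ~ n regularization comparison (illustrative only;
# the paper's Zellner g-prior approach is not reproduced here).
import numpy as np
from sklearn.linear_model import Lasso, BayesianRidge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p, n_relevant = 60, 50, 5                       # nearly as many variables as observations

beta = np.zeros(p)
beta[:n_relevant] = rng.normal(0, 2, n_relevant)   # only a few relevant variables

X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=1.0, size=n)
X_test = rng.normal(size=(200, p))
y_test = X_test @ beta + rng.normal(scale=1.0, size=200)

for name, model in [("lasso", Lasso(alpha=0.1)),
                    ("Bayesian ridge", BayesianRidge())]:
    model.fit(X, y)
    err = mean_squared_error(y_test, model.predict(X_test))
    n_selected = np.sum(np.abs(model.coef_) > 1e-6)
    print(f"{name}: test MSE = {err:.2f}, nonzero coefficients = {n_selected}")
```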

    Enhancing the selection of a model-based clustering with external qualitative variables

    In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables that were not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a model and a number of clusters that both fit the data well and take advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.
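
    The abstract does not spell out the integrated joint likelihood criterion, so the sketch below only illustrates the general idea under assumed stand-ins: candidate Gaussian mixtures are compared both on an internal fit criterion (BIC) and on their agreement with an external categorical variable (adjusted Rand index). The data and the agreement measure are hypothetical choices, not the authors' criterion.

```python
# Illustrative sketch only: compare candidate numbers of clusters both on an
# internal fit criterion (BIC) and on agreement with an external categorical
# variable. This is NOT the integrated joint likelihood criterion of the paper.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
# Hypothetical data: three Gaussian groups, plus an external label that only
# partially reflects the same structure.
means = np.array([[0, 0], [4, 0], [0, 4]])
groups = rng.integers(0, 3, size=300)
X = means[groups] + rng.normal(size=(300, 2))
external = np.where(rng.random(300) < 0.8, groups, rng.integers(0, 3, 300))

for k in range(2, 6):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    labels = gm.predict(X)
    print(f"k={k}: BIC={gm.bic(X):.1f}, "
          f"agreement with external variable (ARI)={adjusted_rand_score(external, labels):.2f}")
```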

    Mixtures of Regression Models for Time-Course Gene Expression Data: Evaluation of Initialization and Random Effects

    Finite mixture models are routinely applied to time-course microarray data. Due to the complexity and size of this type of data, the choice of good starting values plays an important role. So far, initialization strategies have only been investigated for data from a mixture of multivariate normal distributions. In this work, several initialization procedures are evaluated for mixtures of regression models, with and without random effects, in an extensive simulation study on different artificial datasets. Finally, these procedures are also applied to a real dataset from E. coli.
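
    As a simplified illustration of why starting values matter for EM (using a plain Gaussian mixture rather than the mixtures of regression models studied in the paper), the following sketch compares a single random start, multiple random restarts, and a k-means initialization; all data and settings are assumptions for illustration.

```python
# Sketch of how EM initialization affects the fitted mixture (simplified to a
# plain Gaussian mixture; the paper studies mixtures of regression models).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(-3, 1, (150, 1)),
                    rng.normal(0, 1, (150, 1)),
                    rng.normal(3, 1, (150, 1))])

settings = {
    "single random start": dict(init_params="random", n_init=1),
    "10 random restarts":  dict(init_params="random", n_init=10),
    "k-means init":        dict(init_params="kmeans", n_init=1),
}
for name, kw in settings.items():
    gm = GaussianMixture(n_components=3, random_state=0, **kw).fit(X)
    print(f"{name}: final log-likelihood per sample = {gm.score(X):.3f}")
```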

    Some discussions on the Read Paper "Beyond subjective and objective in statistics" by A. Gelman and C. Hennig

    This note is a collection of several discussions of the paper "Beyond subjective and objective in statistics", read by A. Gelman and C. Hennig to the Royal Statistical Society on April 12, 2017, and to appear in the Journal of the Royal Statistical Society, Series A.

    Identifiability of a Switching Markov State-Space Model

    While switching Markov state-space models arise in many applied-science settings such as signal processing and bioinformatics, it is often difficult to establish their identifiability, a property that is essential for parameter estimation. This paper discusses the simple case in which the unknown continuous state and the observations are scalars. We demonstrate that if prior information relating the observations to the unknown continuous state at a time t0 is available, and if the Markov chain is irreducible and aperiodic, then the set of model parameters is "globally structurally identifiable". In addition, we show that under these constraints, the model parameters can be efficiently estimated by an EM algorithm.
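
    The abstract does not write the model out, so the following is one standard scalar form of a switching Markov state-space model, given purely as an assumed illustration of the model class; the paper's exact parameterization may differ.

```latex
% Scalar switching Markov state-space model (assumed illustrative form):
% s_t is a hidden Markov regime, x_t the unknown continuous state, y_t the observation.
\begin{aligned}
  \Pr(s_t = j \mid s_{t-1} = i) &= \pi_{ij}, \qquad s_t \in \{1,\dots,K\},\\
  x_t &= a_{s_t}\, x_{t-1} + b_{s_t} + w_t, \qquad w_t \sim \mathcal{N}(0,\sigma^2_{s_t}),\\
  y_t &= c\, x_t + v_t, \qquad v_t \sim \mathcal{N}(0,\tau^2).
\end{aligned}
```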

    Latent class analysis was accurate but sensitive in data simulations

    Objectives: Latent class methods are increasingly being used in analysis of developmental trajectories. A recent simulation study by Twisk and Hoekstra (2012) suggested caution in use of these methods because they failed to accurately identify developmental patterns that had been artificially imposed on a real data set. This article tests whether existing developmental patterns within the data set used might have obscured the imposed patterns.
    Study Design and Setting: Data were simulated to match the latent class pattern in the previous article, but with varying levels of randomly generated variance, rather than variance carried over from a real data set. Latent class analysis (LCA) was then used to see if the latent class structure could be accurately identified.
    Results: LCA performed very well at identifying the simulated latent class structure, even when the level of variance was similar to that reported in the previous study, although misclassification began to be more problematic with considerably higher levels of variance.
    Conclusion: The failure of LCA to replicate the imposed patterns in the previous study may have been because it was sensitive enough to detect residual patterns of population heterogeneity within the altered data. LCA performs well at classifying developmental trajectories.
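
    A rough sketch of the simulation logic described above, with assumed class trajectories and noise levels, and with a Gaussian mixture standing in for the latent class software used in the study:

```python
# Rough sketch of the simulation design described above: generate trajectories
# from a few latent classes, add increasing amounts of random noise, and check
# how well a mixture model recovers the class structure. A Gaussian mixture
# stands in for the latent class software used in the paper; all values are
# illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
n_per_class, n_timepoints = 200, 5
class_means = np.array([
    np.linspace(0, 0, n_timepoints),    # stable low trajectory
    np.linspace(0, 4, n_timepoints),    # increasing trajectory
    np.linspace(4, 4, n_timepoints),    # stable high trajectory
])
true_class = np.repeat(np.arange(3), n_per_class)

for noise_sd in [0.5, 1.0, 2.0, 4.0]:
    X = class_means[true_class] + rng.normal(scale=noise_sd,
                                             size=(len(true_class), n_timepoints))
    labels = GaussianMixture(n_components=3, n_init=5,
                             random_state=0).fit_predict(X)
    print(f"noise SD={noise_sd}: class recovery (ARI) = "
          f"{adjusted_rand_score(true_class, labels):.2f}")
```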