Regularization in regression: comparing Bayesian and frequentist methods in a poorly informative situation
Using a collection of simulated and real benchmarks, we compare Bayesian and
frequentist regularization approaches under a poorly informative constraint,
when the number of variables is almost equal to the number of observations.
This comparison includes new global noninformative
approaches for Bayesian variable selection built on Zellner's g-priors that are
similar to Liang et al. (2008). The interest of those calibration-free
proposals is discussed. The numerical experiments we present highlight the
appeal of Bayesian regularization methods, when compared with non-Bayesian
alternatives. They dominate frequentist methods in the sense that they provide
smaller prediction errors while selecting the most relevant variables in a
parsimonious way.
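For orientation, the textbook form of Zellner's g-prior can be written as below. This is a sketch of the standard construction only, assuming a zero prior mean, no separate intercept, and a fixed g; it is not the specific noninformative variant the abstract proposes. The symbols X, y, β, σ² and g are notation chosen here, not taken from the abstract. Approaches in the style of Liang et al. (2008) place a prior on g rather than fixing it.

```latex
y \mid \beta, \sigma^2 \sim \mathcal{N}\!\left(X\beta,\; \sigma^2 I_n\right), \qquad
\beta \mid \sigma^2, g \sim \mathcal{N}\!\left(0,\; g\,\sigma^2 (X^\top X)^{-1}\right)
\quad\Longrightarrow\quad
\mathbb{E}\!\left[\beta \mid y, \sigma^2, g\right] = \frac{g}{1+g}\,\hat{\beta}_{\mathrm{OLS}},
\qquad
\hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y .
```

Under these assumptions the posterior mean is the least-squares estimate shrunk toward zero by the factor g/(1+g), which is where the regularization comes from.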
Enhancing the selection of a model-based clustering with external qualitative variables
In cluster analysis, it can be useful to interpret the partition built from
the data in the light of external categorical variables which were not directly
involved in clustering the data. An approach is proposed in the model-based
clustering context to select a model and a number of clusters which both fit
the data well and take advantage of the potential illustrative ability of the
external variables. This approach makes use of the integrated joint likelihood
of the data and the partitions at hand, namely the model-based partition and
the partitions associated with the external variables. It is noteworthy that each
mixture model is fitted to the data by maximum likelihood, excluding the
external variables, which are used only to select a relevant mixture model.
Numerical experiments illustrate the promising behaviour of the
derived criterion.
Mixtures of Regression Models for Time-Course Gene Expression Data: Evaluation of Initialization and Random Effects
Finite mixture models are routinely applied to time-course microarray data.
Due to the complexity and size of this type of data, the choice of good starting values plays
an important role. So far, initialization strategies have only been investigated for data
from a mixture of multivariate normal distributions. In this work, several initialization
procedures are evaluated for mixtures of regression models with and without random
effects in an extensive simulation study on different artificial datasets. Finally, these
procedures are also applied to a real dataset from E. coli.
Some discussions on the Read Paper "Beyond subjective and objective in statistics" by A. Gelman and C. Hennig
This note is a collection of several discussions of the paper "Beyond
subjective and objective in statistics", read by A. Gelman and C. Hennig to the
Royal Statistical Society on April 12, 2017, and to appear in the Journal of
the Royal Statistical Society, Series A.
Identifiability of a Switching Markov State-Space Model
While switching Markov state-space models arise in many applied science applications such as signal processing and bioinformatics, it is often difficult to establish their identifiability, which is essential for parameter estimation. This paper discusses the simple case in which the unknown continuous state and the observations are scalars. We demonstrate that if prior information relating the observations to the unknown continuous state at a time t0 is available, and if the Markov chain is irreducible and aperiodic, the set of model parameters is "globally structurally identifiable". In addition, we show that under these constraints, the model parameters can be efficiently estimated by an EM algorithm.
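As a rough illustration of the kind of model discussed, the following sketch simulates a scalar switching Markov state-space model. The abstract does not give the model equations, so the linear-Gaussian form used here, the function name `simulate_switching_ssm`, and all parameter values are assumptions made for illustration only. It covers simulation, not the identifiability argument or the EM estimation.

```python
import random


def simulate_switching_ssm(n_steps, trans, a, c, q_sd, r_sd, x0=0.0, seed=0):
    """Simulate an assumed scalar switching Markov state-space model.

    s_t : discrete regime following a Markov chain with transition matrix `trans`
    x_t = a[s_t] * x_{t-1} + w_t,  w_t ~ N(0, q_sd^2)   (hidden continuous state)
    y_t = c[s_t] * x_t     + v_t,  v_t ~ N(0, r_sd^2)   (scalar observation)
    """
    rng = random.Random(seed)
    n_regimes = len(trans)
    s = rng.randrange(n_regimes)  # initial regime drawn uniformly (an assumption)
    x = x0
    regimes, states, obs = [], [], []
    for _ in range(n_steps):
        # Draw the next regime from row `s` of the transition matrix.
        s = rng.choices(range(n_regimes), weights=trans[s])[0]
        x = a[s] * x + rng.gauss(0.0, q_sd)
        y = c[s] * x + rng.gauss(0.0, r_sd)
        regimes.append(s)
        states.append(x)
        obs.append(y)
    return regimes, states, obs


# Two regimes; every transition probability is positive, so this
# chain is irreducible and aperiodic (hypothetical values).
TRANS = [[0.95, 0.05], [0.10, 0.90]]
regimes, states, obs = simulate_switching_ssm(
    200, TRANS, a=[0.9, 0.5], c=[1.0, 2.0], q_sd=0.1, r_sd=0.2
)
```

In practice only `obs` would be observed; `regimes` and `states` are returned here so that an estimation procedure could be checked against the simulated truth.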
Latent class analysis was accurate but sensitive in data simulations
Objectives:
Latent class methods are increasingly being used in analysis of developmental trajectories. A recent simulation study by Twisk and Hoekstra (2012) suggested caution in use of these methods because they failed to accurately identify developmental patterns that had been artificially imposed on a real data set. This article tests whether existing developmental patterns within the data set used might have obscured the imposed patterns.
Study Design and Setting:
Data were simulated to match the latent class pattern in the previous article, but with varying levels of randomly generated variance, rather than variance carried over from a real data set. Latent class analysis (LCA) was then used to see if the latent class structure could be accurately identified.
Results:
LCA performed very well at identifying the simulated latent class structure, even when the level of variance was similar to that reported in the previous study, although misclassification began to be more problematic with considerably higher levels of variance.
Conclusion:
The failure of LCA to replicate the imposed patterns in the previous study may have been because it was sensitive enough to detect residual patterns of population heterogeneity within the altered data. LCA performs well at classifying developmental trajectories.
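The simulation design described above can be sketched roughly as follows: trajectories are generated from known classes with a chosen noise level, then a clustering step tries to recover the classes. This is not the study's method. A simple k-means-style nearest-centroid clustering stands in for LCA, and the class-mean trajectories, sample sizes, and noise levels are hypothetical values chosen for illustration.

```python
import itertools
import random

# Hypothetical class-mean trajectories over five time points.
CLASS_MEANS = [
    [1.0, 1.0, 1.0, 1.0, 1.0],  # stable
    [1.0, 2.0, 3.0, 4.0, 5.0],  # increasing
    [5.0, 4.0, 3.0, 2.0, 1.0],  # decreasing
]


def simulate(n_per_class, noise_sd, rng):
    """Draw noisy trajectories around each class mean; return data and true labels."""
    data, labels = [], []
    for k, mean in enumerate(CLASS_MEANS):
        for _ in range(n_per_class):
            data.append([m + rng.gauss(0.0, noise_sd) for m in mean])
            labels.append(k)
    return data, labels


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cluster(data, k, n_iter=20):
    """k-means-style clustering (a stand-in for LCA) with farthest-point initialization."""
    centers = [data[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(data, key=lambda x: min(sq_dist(x, c) for c in centers)))
    assign = []
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda j: sq_dist(x, centers[j])) for x in data]
        for j in range(k):
            members = [x for x, a in zip(data, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def accuracy(true, pred, k):
    """Share of correctly classified trajectories under the best label matching."""
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(1 for t, p in zip(true, pred) if perm[p] == t)
        best = max(best, hits)
    return best / len(true)


def run(noise_sd, n_per_class=100, seed=1):
    rng = random.Random(seed)
    data, labels = simulate(n_per_class, noise_sd, rng)
    k = len(CLASS_MEANS)
    return accuracy(labels, cluster(data, k), k)
```

Calling `run` with a small and a large `noise_sd` (for example `run(0.2)` and `run(3.0)`) gives a quick way to compare classification accuracy across noise levels, in the spirit of the varying-variance design in the abstract.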