51 research outputs found
Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions
We give conditions under which a Markov chain constructed via parallel or
simulated tempering is guaranteed to be rapidly mixing, which are applicable to
a wide range of multimodal distributions arising in Bayesian statistical
inference and statistical mechanics. We provide lower bounds on the spectral
gaps of parallel and simulated tempering. These bounds imply a single set of
sufficient conditions for rapid mixing of both techniques. A direct consequence
of our results is rapid mixing of parallel and simulated tempering for several
normal mixture models, and for the mean-field Ising model.Comment: Published in at http://dx.doi.org/10.1214/08-AAP555 the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Bayesian model search and multilevel inference for SNP association studies
Technological advances in genotyping have given rise to hypothesis-based
association studies of increasing scope. As a result, the scientific hypotheses
addressed by these studies have become more complex and more difficult to
address using existing analytic methodologies. Obstacles to analysis include
inference in the face of multiple comparisons, complications arising from
correlations among the SNPs (single nucleotide polymorphisms), choice of their
genetic parametrization and missing data. In this paper we present an efficient
Bayesian model search strategy that searches over the space of genetic markers
and their genetic parametrization. The resulting method for Multilevel
Inference of SNP Associations, MISA, allows computation of multilevel posterior
probabilities and Bayes factors at the global, gene and SNP level, with the
prior distribution on SNP inclusion in the model providing an intrinsic
multiplicity correction. We use simulated data sets to characterize MISA's
statistical power, and show that MISA has higher power to detect association
than standard procedures. Using data from the North Carolina Ovarian Cancer
Study (NCOCS), MISA identifies variants that were not identified by standard
methods and have been externally ``validated'' in independent studies. We
examine sensitivity of the NCOCS results to prior choice and method for
imputing missing data. MISA is available in an R package on CRAN.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS322 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
MCMC implementation for Bayesian hidden semi-Markov models with illustrative applications
Copyright © Springer 2013. The final publication is available at Springer via http://dx.doi.org/10.1007/s11222-013-9399-zHidden Markov models (HMMs) are flexible, well established models useful in a diverse range of applications.
However, one potential limitation of such models lies in their inability to explicitly structure the holding times of each hidden state. Hidden semi-Markov models (HSMMs) are more useful in the latter respect as they incorporate additional temporal structure by explicit modelling of the holding times. However, HSMMs have generally received less attention in the literature, mainly due to their intensive computational requirements. Here a Bayesian implementation of HSMMs is presented. Recursive algorithms are proposed in conjunction with Metropolis-Hastings in such a way as to avoid sampling from the distribution of the hidden state sequence in the MCMC sampler. This provides a computationally tractable estimation framework for HSMMs avoiding the limitations associated with the conventional EM algorithm regarding model flexibility. Performance of the proposed implementation is demonstrated through simulation experiments as well as an illustrative application relating to recurrent failures in a network of underground water pipes where random effects are also included into the HSMM to allow for pipe heterogeneity
Intergenic and Genic Sequence Lengths Have Opposite Relationships with Respect to Gene Expression
Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression
- …