28 research outputs found

    Bayesian model search and multilevel inference for SNP association studies

    Full text link
    Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally ``validated'' in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS322 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration

    Full text link
    While statisticians are well-accustomed to performing exploratory analysis in the modeling stage of an analysis, the notion of conducting preliminary general-purpose exploratory analysis in the Monte Carlo stage (or more generally, the model-fitting stage) of an analysis is an area which we feel deserves much further attention. Towards this aim, this paper proposes a general-purpose algorithm for automatic density exploration. The proposed exploration algorithm combines and expands upon components from various adaptive Markov chain Monte Carlo methods, with the Wang-Landau algorithm at its heart. Additionally, the algorithm is run on interacting parallel chains -- a feature which both decreases computational cost as well as stabilizes the algorithm, improving its ability to explore the density. Performance is studied in several applications. Through a Bayesian variable selection example, the authors demonstrate the convergence gains obtained with interacting chains. The ability of the algorithm's adaptive proposal to induce mode-jumping is illustrated through a trimodal density and a Bayesian mixture modeling application. Lastly, through a 2D Ising model, the authors demonstrate the ability of the algorithm to overcome the high correlations encountered in spatial models.Comment: 33 pages, 20 figures (the supplementary materials are included as appendices

    AHRQ series on complex intervention systematic reviews-paper 5: advanced analytic methods.

    Get PDF
    BACKGROUND AND OBJECTIVE: Advanced analytic methods for synthesizing evidence about complex interventions continue to be developed. In this paper, we emphasize that the specific research question posed in the review should be used as a guide for choosing the appropriate analytic method. METHODS: We present advanced analytic approaches that address four common questions that guide reviews of complex interventions: (1) How effective is the intervention? (2) For whom does the intervention work and in what contexts? (3) What happens when the intervention is implemented? and (4) What decisions are possible given the results of the synthesis? CONCLUSION: The analytic approaches presented in this paper are particularly useful when each primary study differs in components, mechanisms of action, context, implementation, timing, and many other domains

    MCMC implementation for Bayesian hidden semi-Markov models with illustrative applications

    Get PDF
    Copyright © Springer 2013. The final publication is available at Springer via http://dx.doi.org/10.1007/s11222-013-9399-zHidden Markov models (HMMs) are flexible, well established models useful in a diverse range of applications. However, one potential limitation of such models lies in their inability to explicitly structure the holding times of each hidden state. Hidden semi-Markov models (HSMMs) are more useful in the latter respect as they incorporate additional temporal structure by explicit modelling of the holding times. However, HSMMs have generally received less attention in the literature, mainly due to their intensive computational requirements. Here a Bayesian implementation of HSMMs is presented. Recursive algorithms are proposed in conjunction with Metropolis-Hastings in such a way as to avoid sampling from the distribution of the hidden state sequence in the MCMC sampler. This provides a computationally tractable estimation framework for HSMMs avoiding the limitations associated with the conventional EM algorithm regarding model flexibility. Performance of the proposed implementation is demonstrated through simulation experiments as well as an illustrative application relating to recurrent failures in a network of underground water pipes where random effects are also included into the HSMM to allow for pipe heterogeneity

    Intergenic and Genic Sequence Lengths Have Opposite Relationships with Respect to Gene Expression

    Get PDF
    Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression
    corecore