14 research outputs found

    Bayesian survival analysis in genetic association studies

    Motivation: Large-scale genetic association studies are carried out with the hope of discovering single nucleotide polymorphisms involved in the etiology of complex diseases. There are several existing methods in the literature for performing this kind of analysis for case-control studies, but less work has been done for prospective cohort studies. We present a Bayesian method for linking markers to censored survival outcome by clustering haplotypes using gene trees. Coalescent-based approaches are promising for LD mapping, as the coalescent offers a good approximation to the evolutionary history of mutations

    Defining the True Sensitivity of Culture for the Diagnosis of Melioidosis Using Bayesian Latent Class Models

    BACKGROUND: Culture remains the diagnostic gold standard for many bacterial infections, and the method against which other tests are often evaluated. Specificity of culture is 100% if the pathogenic organism is not found in healthy subjects, but the sensitivity of culture is more difficult to determine and may be low. Here, we apply Bayesian latent class models (LCMs) to data from patients with a single Gram-negative bacterial infection and define the true sensitivity of culture together with the impact of misclassification by culture on the reported accuracy of alternative diagnostic tests. METHODS/PRINCIPAL FINDINGS: Data from published studies describing the application of five diagnostic tests (culture and four serological tests) to a patient cohort with suspected melioidosis were re-analysed using several Bayesian LCMs. Sensitivities, specificities, and positive and negative predictive values (PPVs and NPVs) were calculated. Of 320 patients with suspected melioidosis, 119 (37%) had culture confirmed melioidosis. Using the final model (Bayesian LCM with conditional dependence between serological tests), the sensitivity of culture was estimated to be 60.2%. Prediction accuracy of the final model was assessed using a classification tool to grade patients according to the likelihood of melioidosis, which indicated that an estimated disease prevalence of 61.6% was credible. Estimates of sensitivities, specificities, PPVs and NPVs of four serological tests were significantly different from previously published values in which culture was used as the gold standard. CONCLUSIONS/SIGNIFICANCE: Culture has low sensitivity and low NPV for the diagnosis of melioidosis and is an imperfect gold standard against which to evaluate alternative tests. Models should be used to support the evaluation of diagnostic tests with an imperfect gold standard. 
    It is likely that the poor sensitivity/specificity of culture is not specific for melioidosis, but rather a generic problem for many bacterial and fungal infections
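The link between sensitivity, specificity, prevalence, and predictive values in the abstract above can be illustrated with plain Bayes' rule arithmetic. This is a minimal sketch and not the latent class model used in the study: it takes the reported culture sensitivity (60.2%) and estimated prevalence (61.6%), and assumes culture specificity of 100% as the abstract describes. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test, computed from expected cell proportions."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures taken from the abstract; specificity = 1.0 is the stated assumption.
culture_ppv, culture_npv = predictive_values(
    sensitivity=0.602, specificity=1.0, prevalence=0.616
)

# Consistency check: prevalence x sensitivity is the expected fraction of
# culture-positive patients, to compare with the observed 119 of 320.
expected_culture_positive = 0.616 * 0.602
observed_culture_positive = 119 / 320

print(f"PPV of culture: {culture_ppv:.3f}")
print(f"NPV of culture: {culture_npv:.3f}")
print(f"Expected culture-positive fraction: {expected_culture_positive:.3f}")
print(f"Observed culture-positive fraction: {observed_culture_positive:.3f}")
```

Under these assumptions, a perfectly specific test still has a modest NPV when its sensitivity is low and the disease is common, which is the pattern the abstract reports for culture.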

    Comparative analysis of genome-wide association studies signals for lipids, diabetes, and coronary heart disease: Cardiovascular Biomarker Genetics Collaboration

    To evaluate the associations of emergent genome-wide-association study-derived coronary heart disease (CHD)-associated single nucleotide polymorphisms (SNPs) with established and emerging risk factors, and the association of genome-wide-association study-derived lipid-associated SNPs with other risk factors and CHD events

    Genetic association mapping via evolution-based clustering of haplotypes

    Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model chromosomal segments in high linkage disequilibrium therein assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype-haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian and we develop a Markov Chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. 
    We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error

    Assessing uncertainty about parameter estimates with incomplete repeated ordinal data

    Data collected in clinical trials involving follow-up of patients over a period of time will almost inevitably be incomplete. Patients will fail to turn up at some of the intended measurement times or will not complete the study, giving rise to various patterns of missingness. In these circumstances, the validity of the conclusions drawn from an analysis of available cases depends crucially on the mechanism driving the missing data process; this in turn cannot be known for certain. For incomplete categorical data, various authors have recently proposed taking into account in a systematic way the ignorance caused by incomplete data. In particular, the idea of intervals of ignorance has been introduced, whereby point estimates for parameters of interest are replaced by intervals or regions of ignorance (Vansteelandt and Goetghebeur, 2001; Kenward et al., 2001; Molenberghs et al., 2001). These are identified by the set of estimates corresponding to possible outcomes for the missing data under few or no assumptions about the missing data mechanism. Here we extend this idea to incomplete repeated ordinal data. We describe a modified version of standard algorithms used for fitting marginal models to longitudinal categorical data, which enables calculation of intervals of ignorance for the parameters of interest. The ideas are illustrated using dental pain measurements from a longitudinal clinical trial.
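The idea of an interval of ignorance can be illustrated in its simplest form, for a single binary outcome rather than the repeated ordinal data treated in the paper. This is a minimal sketch under that simplification: the bounds come from assigning every missing outcome to failure (lower bound) or to success (upper bound), with no assumption about the missing data mechanism. The function name `ignorance_interval` and the counts are hypothetical and not taken from the study.

```python
def ignorance_interval(successes, failures, missing):
    """Bounds on a success proportion over all ways of filling in missing outcomes."""
    total = successes + failures + missing
    if total == 0:
        raise ValueError("at least one subject is required")
    lower = successes / total              # every missing outcome is a failure
    upper = (successes + missing) / total  # every missing outcome is a success
    return lower, upper


# Hypothetical counts for illustration only.
low, high = ignorance_interval(successes=30, failures=50, missing=20)

# The available-case estimate ignores the missing subjects altogether.
available_case = 30 / (30 + 50)

print(f"Interval of ignorance: [{low:.3f}, {high:.3f}]")
print(f"Available-case estimate: {available_case:.3f}")
```

The width of the interval equals the fraction of missing outcomes, so it reflects ignorance due to incompleteness and does not shrink as the sample grows, unlike a confidence interval.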

    Multilocus Bayesian Meta-Analysis of Gene-Disease Associations

    Meta-analysis is a vital tool in genetic epidemiology. However, meta-analyses to identify gene-disease associations are compromised when contributing studies have typed partially overlapping sets of markers. Currently, only marginal analyses are possible, and these are restricted to the subset of studies typing that marker. This does not allow full use of available data and leads to the confounding of marker effects by closely associated markers. We present a Bayesian approach that exploits prior information on underlying haplotypes to allow multi-marker analysis incorporating data from all relevant studies of a gene or region, irrespective of the markers typed. We present results from application of our approach to data on a possible association between PDE4D and ischemic stroke

    Bayesian Graphical Models for Genomewide Association Studies

    As the extent of human genetic variation becomes more fully characterized, the research community is faced with the challenging task of using this information to dissect the heritable components of complex traits. Genomewide association studies offer great promise in this respect, but their analysis poses formidable difficulties. In this article, we describe a computationally efficient approach to mining genotype-phenotype associations that scales to the size of the data sets currently being collected in such studies. We use discrete graphical models as a data-mining tool, searching for single- or multilocus patterns of association around a causative site. The approach is fully Bayesian, allowing us to incorporate prior knowledge on the spatial dependencies around each marker due to linkage disequilibrium, which reduces considerably the number of possible graphical structures. A Markov chain–Monte Carlo scheme is developed that yields samples from the posterior distribution of graphs conditional on the data from which probabilistic statements about the strength of any genotype-phenotype association can be made. Using data simulated under scenarios that vary in marker density, genotype relative risk of a causative allele, and mode of inheritance, we show that the proposed approach has better localization properties and leads to lower false-positive rates than do single-locus analyses. Finally, we present an application of our method to a quasi-synthetic data set in which data from the CYP2D6 region are embedded within simulated data on 100K single-nucleotide polymorphisms. Analysis is quick (<5 min), and we are able to localize the causative site to a very short interval