121 research outputs found

    A phylogenetic method to perform genome-wide association studies in microbes

    Genome-Wide Association Studies (GWAS) are designed to perform an unbiased search of genetic sequence data with the intent of identifying statistically significant associations with a phenotype or trait of interest. The application of GWAS methods to microbial organisms promises to improve the way we understand, manage, and treat infectious diseases. Yet, while microbial pathogens continue to undermine human health, wealth, and longevity, microbial GWAS methods remain unable to fully capitalise on the growing wealth of bacterial and viral genetic sequence data. Clonal population structure and homologous recombination in microbial organisms make it difficult for existing GWAS methods to achieve both the precision needed to reject false positive findings and the statistical power required to detect genuine associations between microbial genotypic and phenotypic variants. In this thesis, we investigate potential solutions to the most substantial methodological challenges in microbial GWAS, and we introduce a new phylogenetic GWAS approach that has been specifically designed for use in bacterial samples. In presenting our approach, we describe the features that render it robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Our approach is applicable to organisms ranging from purely clonal to frequently recombining, to sequence data from both the core and accessory genome, and to binary, categorical, and continuous phenotypes. We also describe the efforts taken to make our method efficient, scalable, and accessible in its implementation within the open-source R package we have created, called treeWAS. Next, we apply our GWAS method to simulated datasets. We develop multiple frameworks for simulating genotypic and phenotypic data with control over relevant parameters. 
We then present the results of our simulation study, and we use thorough performance testing to demonstrate the power and specificity of our approach, as compared to the performance of alternative cluster-based and dimension-reduction methods. Our approach is then applied to three empirical datasets, from Neisseria gonorrhoeae and Neisseria meningitidis, where we identify core SNPs associated with binary drug resistance and continuous antibiotic minimum inhibitory concentration phenotypes, as well as both core SNP and accessory genome associations with invasive and commensal phenotypes. These applications illustrate the versatility and potential of our method, demonstrating in each case that our approach is capable of confirming known resistance- or virulence-associated loci and discovering novel associations. Our thesis concludes with a review of the previous chapters and an evaluation of the strengths and limitations displayed by the current implementation of our phylogenetic approach to association testing. We discuss key areas for further development, and we propose potential solutions to advance the development of microbial GWAS in future work.

    Nonparametric inference for classification and association with high dimensional genetic data


    Mini-Workshop: Recent Developments in Statistical Methods with Applications to Genetics and Genomics

    Recent progress in high-throughput genomic technologies has revolutionized the field of human genetics and promises to lead to important scientific advances. With new improvements in massively parallel biotechnologies, it is becoming increasingly efficient to generate vast amounts of information at the genomics, transcriptomics, proteomics, and metabolomics levels, among others, opening up as yet unexplored opportunities in the search for the genetic causes of complex traits. Despite this tremendous progress in data generation, it remains very challenging to analyze, integrate and interpret these data. The resulting data are high-dimensional and very sparse, and efficient statistical methods are critical for extracting the rich information they contain. The major focus of the mini-workshop, entitled “Recent Developments in Statistical Methods with Applications to Genetics and Genomics”, has been on integrative methods. Relevant research questions included the optimal study design for integrative genomic analyses; appropriate handling and pre-processing of different types of omics data; statistical methods for integration of multiple types of omics data; adjustment for confounding due to latent factors such as cell or tissue heterogeneity; the optimal use of omics data to enhance or make sense of results identified through genetic studies; and statistical and computational strategies for analysis of multiple types of high-dimensional data.

    Cell Culture Models of Genetic Variation

    Studying genetic variation presents a dilemma. While the genetic variation of greatest interest is that causing variation in traits and disease risk in natural populations, natural populations have characteristics that make them challenging to study. In this work, I have assessed the use of cell culture methods as a solution to some of these challenges. In particular, I studied genetic variation in the budding yeast Saccharomyces cerevisiae that was generated by selection in the lab as a model for natural genetic variation. I have found that even simplistic selection programs in the laboratory, including the use of chemical mutagenesis to introduce genetic variation, can be used to rapidly generate genetic variation with the same characteristics as that observed in natural populations of budding yeast. I also explored the use of human-derived lymphoblastoid cell lines as a source of genetic variation that eliminates some of the most challenging problems that arise from the use of humans as research subjects. In addition to the ethical limitations, there are also severe technical limitations to the study of human subjects, not least of which is the difficulty of direct experimentation to confirm hypotheses. I found that lymphoblastoid cell lines are a reliable experimental system in which phenotypic variation, at the cellular level, primarily represents differences between lines, a significant portion of which is due to additive genetic variation. Due to the growth of publicly available genotype data, these lines can be used to locate genetic variants with phenotypic effects by linkage-association mapping. In addition to the shared database resources, cell lines are amenable to distribution from central repositories, suggesting that cell culture could form the basis of a community resource for the study of human genetic variation.
While cell culture methods share weaknesses with traditional genetic model systems, the use of a variety of cell culture approaches, including microorganisms and human-derived cell lines, represents an important, complementary approach to the investigation of genetic variation both for basic, mechanistic questions and for understanding the genetic causes of diversity in human phenotypes.

    Case-Only Studies of Gene-Environment Interaction: Role of Linkage Disequilibrium and Population Stratification

    Studies of gene-environment interactions (G×E) have been considered important owing to their scientific and public health implications. Indeed, many common complex diseases including inflammatory bowel disease (IBD) are presumed to rely on both genetic (G) and environmental (E) risk factors. One major challenge to G×E studies is the insufficient power of traditional epidemiological study designs. A nontraditional approach, the case-only (CO) design, has been proposed as a potentially efficient strategy to assess G×E. Previously, the CO approach was shown to provide better per-sample power than other epidemiological study designs, including case-control and cohort designs. This approach relies upon two key assumptions, namely that (i) the disease is sufficiently rare in the general population and that (ii) G and E are uncorrelated in the general population. When these assumptions are valid, departures from a multiplicative relative risk model, colloquially known as a ‘multiplicative interaction’, can be evaluated by testing the association between G and E in cases only. Therefore, in contrast to case-control studies, CO studies require genotype and exposure information from a set of affected individuals alone (‘no controls’) to track down the underlying G×E. In the past, CO studies of G×E usually followed a candidate (or single-) gene approach, but their utility at the genome-wide level remained unexplored.
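    The case-only test described in this abstract can be illustrated with a minimal sketch. It assumes a binary genotype (carrier or non-carrier) and a binary exposure, and it estimates the G-E odds ratio among cases from a 2×2 table with a Wald confidence interval. The function name and argument names are hypothetical and chosen for illustration; they are not taken from the work listed here. Under the two assumptions stated in the abstract (rare disease, G and E uncorrelated in the general population), a case-only odds ratio different from 1 suggests departure from a multiplicative relative risk model.

```python
import math


def case_only_interaction(n_ge, n_g_only, n_e_only, n_neither, z=1.96):
    """Estimate the G-E odds ratio among cases only (hypothetical helper).

    All four arguments are counts of CASES:
      n_ge      -- carriers who were exposed
      n_g_only  -- carriers who were not exposed
      n_e_only  -- non-carriers who were exposed
      n_neither -- non-carriers who were not exposed
    Returns (odds_ratio, (ci_low, ci_high)) using a Wald interval on the
    log odds ratio; z=1.96 gives an approximate 95% interval.
    """
    counts = (n_ge, n_g_only, n_e_only, n_neither)
    if any(c <= 0 for c in counts):
        # The Wald interval is undefined when a cell is empty.
        raise ValueError("all four cell counts must be positive")

    # Cross-product ratio of the 2x2 table of cases.
    odds_ratio = (n_ge * n_neither) / (n_g_only * n_e_only)

    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or))
    return odds_ratio, ci


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from any study.
    estimate, interval = case_only_interaction(60, 40, 40, 60)
    print(f"case-only OR = {estimate:.2f}, approx. 95% CI = {interval}")
```

    The estimate is only interpretable as an interaction when G and E really are independent in the source population; linkage disequilibrium or population stratification, the topics named in the title, can violate that assumption.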

    Regional brain volumes and antidepressant treatment resistance in major depressive disorder

    Major depressive disorder (MDD) is a heritable and highly debilitating condition, with antidepressants, the first-line treatment, demonstrating low to modest response rates. No current biological mechanism substantially explains MDD, but both neurostructural and neurochemical pathways have been suggested. Further explication of these may aid in identifying subgroups of MDD that are better defined by their aetiology. Specifically, genetic stratification provides an array of tools to do this, including the intermediate phenotype approach which was applied in this thesis. This thesis explores genetic overlap between regional brain volume and MDD, and the genetic and non-genetic components of antidepressant response. The first study utilised the most recent published data from the ENIGMA (Enhancing Neuroimaging Genetics through Meta-analysis) Consortium’s genome-wide association study (GWAS) of regional brain volume to examine shared genetic architecture between seven subcortical brain volumes and intracranial volume (ICV) and MDD. This was explored using linkage disequilibrium score regression (LDSC), polygenic risk scoring (PRS) techniques, Mendelian randomisation (MR) analysis and BUHMBOX (Breaking Up Heterogeneous Mixture Based On Cross-locus correlations). Results indicated that hippocampal volume was positively genetically correlated with MDD (rg = 0.46, P = 0.02), although this did not survive multiple comparison testing. Additionally, there was evidence for genetic subgrouping in Generation Scotland: Scottish Family Health Study (GS:SFHS) MDD cases (P = 0.00281); however, this was not replicated in two other independent samples. This study does not support a shared architecture for regional brain volumes and MDD, but it provided some evidence that hippocampal volume and MDD may share genetic architecture in a subgroup of individuals, although the genetic correlation did not survive multiple testing correction and the genetic subgroup heterogeneity was not replicated.
To explore antidepressant treatment resistance, the second study utilised prescription data in GS:SFHS to define a measure of (a) treatment resistance (TR) and (b) stages of resistance (SR) by inferring antidepressant switching as non-response. GWAS were conducted separately for TR in GS:SFHS and the GENDEP (Genome-based Therapeutic Drugs for Depression) study and then meta-analysed (meta-analysis n = 4,213, cases = 358). For SR, a GWAS on GS:SFHS only was performed (n = 3,452). Additionally, gene-set enrichment, polygenic risk scoring (PRS) and genetic correlation analysis were conducted. No significant locus, gene or gene-set was associated with TR or SR; however, power analysis indicated that this analysis was underpowered. Pedigree-based correlations identified genetic overlap with psychological distress, schizotypy and mood disorder traits. Finally, the role of neuroticism, psychological resilience and coping styles in antidepressant resistance was investigated. Univariate, moderation and mediation models were applied using logistic regression and structural equation modelling techniques. In univariate models, neuroticism and emotion-orientated coping demonstrated significant negative association with antidepressant resistance, whereas resilience, task-orientated and avoidance-orientated coping demonstrated significant positive association. No moderation of the association between neuroticism and TR was detected and no mediating effect of coping styles was found. However, resilience was found to partially mediate the association between neuroticism and TR. Whilst the first study does not indicate a genetic overlap between regional brain volumes and MDD, it demonstrates the utility of the intermediate phenotype approach in complex disease. Antidepressant resistance was associated with neuroticism both genetically and phenotypically, indicating its role as an intermediate phenotype.
Nonetheless, larger sample sizes are needed to adequately address the components of antidepressant resistance. Further work in antidepressant non-response may help to identify biological mechanisms responsible in MDD pathology and help stratify individuals into more tractable groups.