    Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects

    Expression quantitative trait loci (eQTL) studies are an integral tool for investigating the genetic component of gene expression variation. A major challenge in the analysis of such studies is hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals.

Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation.

We applied our model and compared it to existing methods on a variety of datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies.

    Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies.

    Expression quantitative trait loci (eQTL) studies are an integral tool for investigating the genetic component of gene expression variation. A major challenge in the analysis of such studies is hidden confounding factors, such as unobserved covariates or unknown subtle environmental perturbations. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals. Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, this new model can more accurately distinguish true genetic association signals from confounding variation. We applied our model and compared it to existing methods on different datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, our approach not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies. A software implementation of PANAMA is freely available online at http://ml.sheffield.ac.uk/qtl/

    A Bayesian Framework to Account for Complex Non-Genetic Factors in Gene Expression Levels Greatly Increases Power in eQTL Studies

    Gene expression measurements are influenced by a wide range of factors, such as the state of the cell, experimental conditions and variants in the sequence of regulatory regions. To understand the effect of a variable of interest, such as the genotype of a locus, it is important to account for variation that is due to confounding causes. Here, we present VBQTL, a probabilistic approach for mapping expression quantitative trait loci (eQTLs) that jointly models contributions from genotype as well as known and hidden confounding factors. VBQTL is implemented within an efficient and flexible inference framework, making it fast and tractable on large-scale problems. We compare the performance of VBQTL with alternative methods for dealing with confounding variability on eQTL mapping datasets from simulations, yeast, mouse, and human. Bayesian complexity control and joint modelling are shown to yield more precise estimates of the contribution of different confounding factors, which results in additional associations to measured transcript levels compared to alternative approaches. We present a threefold larger collection of cis eQTLs than previously found in a whole-genome eQTL scan of an outbred human population. Altogether, 27% of the tested probes show a significant genetic association in cis, and we validate that the additional eQTLs are likely to be real by replicating them in different sets of individuals. Our method is the next step in the analysis of high-dimensional phenotype data, and its application has revealed insights into genetic regulation of gene expression by demonstrating more abundant cis-acting eQTLs in human than previously shown. Our software is freely available online at http://www.sanger.ac.uk/resources/software/peer/

    A role for heritable transcriptomic variation in maize adaptation to temperate environments

    Background: Transcription bridges genetic information and phenotypes. Here, we evaluated how changes in transcriptional regulation enable maize (Zea mays), a crop originally domesticated in the tropics, to adapt to temperate environments. Results: We generated 572 unique RNA-seq datasets from the roots of 340 maize genotypes. Genes involved in core processes such as cell division, chromosome organization and cytoskeleton organization showed lower heritability of gene expression, while genes involved in anti-oxidation activity exhibited higher expression heritability. An expression genome-wide association study (eGWAS) identified 19,602 expression quantitative trait loci (eQTLs) associated with the expression of 11,444 genes. A GWAS for alternative splicing identified 49,897 splicing QTLs (sQTLs) for 7,614 genes. Genes harboring both cis-eQTLs and cis-sQTLs in linkage disequilibrium were disproportionately likely to encode transcription factors or to be annotated as responding to one or more stresses. Independent component analysis of gene expression data identified loci regulating co-expression modules involved in oxidation reduction, response to water deprivation, plastid biogenesis, protein biogenesis, and plant-pathogen interaction. Several genes involved in cell proliferation, flower development, DNA replication, and gene silencing showed lower gene expression variation explained by genetic factors between temperate and tropical maize lines. A GWAS of 27 previously published phenotypes identified several candidate genes overlapping with genomic intervals showing signatures of selection during adaptation to temperate environments. Conclusion: Our results illustrate how maize transcriptional regulatory networks enable changes in transcriptional regulation to adapt to temperate regions.

    Mendelian randomization and its application to genome-wide association studies

    Genetics aims to study heredity: how traits are passed from one generation to the next and how genetic variations can lead to changes in phenotypes. Some phenotypes, called complex or quantitative traits, are under the control of both genetic and environmental factors. Examples of complex traits include quantitative phenotypes, such as height or cholesterol levels, as well as certain diseases, like diabetes or cardiovascular diseases. Genome-wide association studies (GWASs) are used to statistically test for the association between each genetic variant and a given phenotype. These studies confirmed that most complex traits are influenced by a large number of genetic variants, often exhibiting small effect sizes that can only be detected using large numbers of individuals. They also permitted the estimation of narrow-sense heritability, which is the proportion of phenotypic variation that can be attributed to these genetic variations. The results of such GWASs (association results for every genetic variant) are often made publicly available and they can be used to perform follow-up analyses, for example Mendelian randomization. Mendelian randomization aims at investigating causal relationships between complex traits and estimating the causal effect of one exposure on an outcome. This method mimics randomized controlled trials and takes advantage of the fact that genetic variations are randomly distributed across the population. By using association results for genetic variants strongly associated with a given risk factor and measuring the effect of these variants on another trait or disease, Mendelian randomization can infer the existence and the strength of the causal relationship between them. Analyses helping to understand the genetics underlying complex traits and the relationships between them are key to precision medicine. 
Precision medicine is an approach that takes into account the genome sequence and the environmental exposures of each patient, to provide personalized prevention and treatment to each individual. During my thesis, I have been involved in several projects aiming at developing statistical methods that rely on Mendelian randomization. In the first part, I worked on a Bayesian GWAS approach (bGWAS). The goal of this approach is to increase statistical power to discover variants associated with a trait by leveraging data from correlated risk factors. The idea is to combine (1) the causal effects of the risk factors on the trait of interest (estimated using Mendelian randomization) with (2) the association results of genetic variants with these risk factors, in order to estimate the prior effect of each variant on the trait of interest. This approach has been used to study the genetics underlying lifespan, taking into account various potential risk factors, such as body mass index, cholesterol levels, and several diseases. In the second part, I worked on developing Mendelian randomization extensions (MRlap and LHC-MR) that aim at tackling some of the most common sources of bias. These extensions allow for more robust causal effect estimates when some of the Mendelian randomization assumptions are violated, as well as for an extension of the scope of application of Mendelian randomization.

Genetics is the study of the transmission of hereditary traits within a population. A major challenge of modern genetics, however, is to explain the exact mechanism by which genetic variations may or may not translate into phenotypic variations. This challenge is all the greater for so-called "complex" traits, which are affected by both genetic and environmental factors. This is the case, for example, for adult height, cholesterol levels, and certain diseases such as diabetes. Genome-wide association studies (GWASs) test whether genetic variants are statistically associated with a given phenotype. These studies have confirmed that most complex traits are influenced by a very large number of genetic variants, each of which often has a small effect that would not have been detected without access to large datasets. They have also made it possible to estimate the share of phenotypic variation explained by all variants together (narrow-sense heritability). The results of these GWASs are often published as summary statistics (for each genetic variant), which can be used to carry out additional analyses, notably Mendelian randomization analyses. These make it possible to study cause-and-effect relationships between different complex traits and to estimate the causal effect of one trait on another. Because genetic variations are in theory randomly distributed in a population, Mendelian randomization is an alternative to randomized clinical trials. By using association results for genetic variants specifically associated with a risk factor and measuring their effects on another trait, Mendelian randomization can establish a cause-and-effect relationship between two traits. These studies, which help in understanding the genetic causes underlying complex traits as well as the causal relationships that may exist between them, pave the way for the development of precision medicine, an approach that takes into account all information about an individual (genetic and environmental) in order to offer each person a personalized diagnosis and treatment.
During my doctorate, I was involved in several projects aimed at developing technical approaches based on Mendelian randomization. First, I worked on a method called Bayesian GWAS (bGWAS). This method uses information from potential risk factors identified a priori in order to increase the statistical power to identify genetic variants associated with a trait of interest. The idea is to combine (1) the causal effects of the risk factors on the trait of interest (estimated using Mendelian randomization) and (2) the association results of genetic variants with these risk factors, to estimate their prior effect on the trait of interest. This method has notably been used to study the genetic causes influencing life expectancy, taking into account several risk factors such as certain diseases and body mass index. Second, I worked on projects aimed at proposing extensions to classical Mendelian randomization methods (MRlap and LHC-MR) to make them more robust to certain commonly observed sources of bias, with the goal of broadening the possible applications of these methods.
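
The idea of inferring a causal effect from GWAS association results can be illustrated with two textbook estimators: the per-variant Wald ratio and the fixed-effect inverse-variance-weighted (IVW) combination. This is only a minimal sketch; the abstract does not say which estimators the thesis uses, and the function names and summary statistics below are hypothetical.

```python
def wald_ratios(beta_exposure, beta_outcome):
    """Per-variant causal estimates: outcome effect divided by exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) causal estimate.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator


# Hypothetical summary statistics for three variants (illustrative numbers only).
bx = [0.10, 0.20, 0.15]    # variant effects on the risk factor (exposure)
by = [0.05, 0.10, 0.075]   # variant effects on the outcome
se = [0.01, 0.02, 0.015]   # standard errors of the outcome effects

causal_effect = ivw_estimate(bx, by, se)
```

Both estimators rely on the usual Mendelian randomization assumptions (the variants affect the outcome only through the exposure); the extensions named in the abstract target cases where those assumptions are violated.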

    The Population Genetic Signature of Polygenic Local Adaptation

    Adaptation in response to selection on polygenic phenotypes may occur via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that may have been influenced by local adaptation. We exploit the fact that GWAS provide an estimate of the additive effect size of many loci to estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We first describe a general model of neutral genetic value drift for an arbitrary number of populations with an arbitrary relatedness structure. Based on this model we develop methods for detecting unusually strong correlations between genetic values and specific environmental variables, as well as a generalization of $Q_{ST}/F_{ST}$ comparisons to test for over-dispersion of genetic values among populations. Finally we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single locus equivalents due to the fact that they look for positive covariance between like effect alleles, and also significantly outperform methods that do not account for population structure. We apply our tests to the Human Genome Diversity Panel (HGDP) dataset using GWAS data for height, skin pigmentation, type 2 diabetes, body mass index, and two inflammatory bowel disease datasets. 
This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
Comment: 42 pages including 8 figures and 3 tables; supplementary figures and tables not included on this upload, but are mostly unchanged from v
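
The "simple weighted sums of allele frequencies" mentioned in the abstract can be sketched as follows. The function name, the data layout and the factor of 2 (which assumes diploid individuals) are assumptions made for illustration; the abstract itself only states that GWAS effect sizes serve as the weights.

```python
def mean_genetic_values(allele_freqs, effect_sizes):
    """Mean additive genetic value of each population.

    allele_freqs[m][l] is the frequency of the effect allele at locus l in
    population m; effect_sizes[l] is the GWAS additive effect at locus l.
    The factor 2 assumes diploid individuals (an assumption of this sketch).
    """
    return [
        2.0 * sum(a * p for a, p in zip(effect_sizes, pop_freqs))
        for pop_freqs in allele_freqs
    ]


# Hypothetical example: two populations, two loci (illustrative numbers only).
effects = [1.0, -0.5]
freqs = [
    [0.5, 0.1],  # population 0
    [0.2, 0.4],  # population 1
]
values = mean_genetic_values(freqs, effects)
```

Comparing such values across populations is only the starting point; the tests described in the abstract additionally model neutral drift and the relatedness among populations.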

    A stew of mixed ingredients: Observational omics in the post-GWAS era

    The past 20 years have seen extensive profiling of the DNA. Collectively, scientists all across the world have identified many places in the DNA, known as loci, that impact human traits such as disease state or immune function. However, interpreting the results from these studies, known as genome-wide association studies (GWAS), has been challenging. This thesis studies several approaches for interpreting GWAS results, with a specific focus on our immune system given its important role in preventing and causing disease. This is done through the use of so-called ‘omics’ technologies, which can study the role of thousands of genes, proteins and genetic variants at the same time. By doing this, maps can be constructed of which genes and proteins interact to impact human traits. The ultimate goal of this research is to provide a better understanding of the cascade between the DNA and human traits. The hope is that building a specific understanding of how the variation in the DNA leads to the development of human traits, such as disease, will ultimately aid the development of drugs for these diseases.