1,157 research outputs found

    Increasing the Power to Detect Causal Associations by Combining Genotypic and Expression Data in Segregating Populations

    To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds that of networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

    Mapping the genetic architecture of gene expression in human liver

    Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and we genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression resulted in the detection of more than 6,000 associations between SNP genotypes and liver gene expression traits, where many of the corresponding genes identified have already been implicated in a number of human diseases. The utility of these data for elucidating the causes of common human diseases is demonstrated by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for the candidate susceptibility genes being identified at a growing number of genetic loci that have been identified as key drivers of disease from genome-wide association studies of disease. 
By using an integrative genomics approach, we highlight how the gene RPS26 and not ERBB3 is supported by our data as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. We also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels in the process. © 2008 Schadt et al

    Utilization of landraces of European flint maize for breeding and genetic research

    Maize is one of the most important crop species for agriculture worldwide. Since its domestication, landraces formed the traditional type of variety. Selection and genetic factors formed a broad diversity of open-pollinated populations well adapted to local conditions. This changed with the introduction of hybrid breeding, when nearly all existing landraces disappeared from use in agriculture and as source material for breeding. Molecular analyses showed a narrow genetic base of the flint heterotic pool compared to the dent pool. Since genetic resources in maize are among the richest of all major crops, the exploitation of this untapped reservoir of genetic variation in landraces could be an option to reverse the ongoing narrowing of the genetic basis to meet the demands of a growing world population as well as new challenges under a changing global climate and reduced inputs. The main goal of this study was the evaluation of European flint maize landraces to unlock their genetic diversity. In detail, our objectives were to (i) determine the variation for testcross performance of European maize landraces; (ii) evaluate the phenotypic and genotypic variation of immortalized lines within and among landraces; (iii) compare the per se performance of those line libraries with elite lines as well as founder lines from the European flint germplasm pool; (iv) analyze the breeding potential of immortalized lines from landraces in comparison with elite material to improve the narrow genetic base of the flint heterotic pool; (v) demonstrate the high mapping resolution of DH libraries from landraces in association mapping down to causal variants and underlying genes; and (vi) provide conclusions and guidelines for breeding and research using libraries of immortalized lines from landraces. 
In a first experiment, we evaluated in multi-environment trials a broad collection of 70 European flint landraces for their testcross performance in combination with two elite dent testers. In comparison with the yield of modern hybrids, grain yield of the testcrosses of landraces was on average 26% lower, but a high genotypic variance among the landraces was observed for all traits and correlations were moderate to high for most trait combinations, similar to those found in elite materials. Genetic correlations between the two testcross series exceeded 0.74 for all traits, suggesting that evaluation of testcross performance in combination with one or two single-cross tester(s) from the opposite pool is sufficient to assess the breeding potential of landraces. In a second experiment, we produced libraries of DH lines from the most promising landraces identified in the first experiment. In total 389 DH lines from six European flint landraces were evaluated together with four flint founder lines and 53 elite flint lines for 16 agronomic traits in four locations. In general, the genotypic variance (σ²_G) was larger within than among the DH libraries and also exceeded σ²_G of the elite flint lines. Furthermore, the means and σ²_G varied among the DH libraries, resulting in large differences of the usefulness criterion. Mean grain yield of the elite flint lines exceeded that of the flint founder lines by 25% and DH libraries by 62%, indicating the impressive breeding progress achieved in the elite material and the substantial genetic load still present in the DH libraries. Nevertheless, the usefulness of the best DH lines was comparable to that of the elite flint lines for many traits including grain yield, underpinning the tremendous potential of landraces for broadening the genetic base of the elite germplasm. 
In a third experiment, the materials from the second experiment were genotyped with the MaizeSNP50 BeadChip from Illumina® and seeds of all genotypes were used for extracting and analyzing 288 metabolites with GC-MS. Data for agronomic traits and metabolites were used for a novel association mapping study. The much faster decay of linkage disequilibrium for adjacent markers in the DH libraries compared with the elite flint lines resulted in unprecedented map resolution. This was strikingly demonstrated by fine-mapping a QTL for oil content down to the phenylalanine insertion F469 in DGAT1-2 as the causal variant. Further, for the metabolite allantoin, which is related to abiotic stress response, promoter polymorphisms as well as differential expression of an allantoinase were identified as putative causes of variation despite a moderate size of the mapping population. These results are very encouraging to use DH libraries from landraces for association mapping and dissect QTL potentially down to the causal variants. However, larger population sizes of each DH library are recommended, similar to those commonly used with other approaches such as the NAM design, for detection of QTL explaining only a small portion of the genetic variance. This opens a new avenue for utilization of natural and/or engineered alleles in breeding. In conclusion, the genetic variation present in European flint maize landraces represents a unique source to reverse the ongoing narrowing of the genetic basis of the elite germplasm of this heterotic pool. For identifying the most promising landraces, we propose a multi-stage approach, where based on an assessment of the molecular diversity about one hundred landraces are evaluated in observation trials for agro-ecological adaptation and testcrosses with one single-cross tester are used for evaluating their general combining ability with the opposite heterotic pool. 
For a small number (< 6) of landraces, a large number of DH lines are developed, which are phenotyped and genotyped for further use in association mapping and genomic selection, with the ultimate goal to make these gold reserves accessible for maize breeding with modern approaches.

    Moving toward a system genetics view of disease

    Testing hundreds of thousands of DNA markers in human, mouse, and other species for association to complex traits like disease is now a reality. However, information on how variations in DNA impact complex physiologic processes flows through transcriptional and other molecular networks. In other words, DNA variations impact complex diseases through the perturbations they cause to transcriptional and other biological networks, and these molecular phenotypes are intermediate to clinically defined disease. Because it is also now possible to monitor transcript levels in a comprehensive fashion, integrating DNA variation, transcription, and phenotypic data has the potential to enhance identification of the associations between DNA variation and diseases like obesity and diabetes, as well as characterize those parts of the molecular networks that drive these diseases. Toward that end, we review methods for integrating expression quantitative trait loci (eQTLs), gene expression, and clinical data to infer causal relationships among gene expression traits and between expression and clinical traits. We further describe methods to integrate these data in a more comprehensive manner by constructing coexpression gene networks that leverage pairwise gene interaction data to represent more general relationships. To infer gene networks that capture causal information, we describe a Bayesian algorithm that further integrates eQTLs, expression, and clinical phenotype data to reconstruct whole-gene networks capable of representing causal relationships among genes and traits in the network. These emerging network approaches, aimed at processing high-dimensional biological data by integrating data from multiple sources, represent some of the first steps in statistical genetics to identify multiple genetic perturbations that alter the states of molecular networks and that in turn push systems into disease states. 
Evolving statistical procedures that operate on networks will be critical to extracting information related to complex phenotypes like disease, as research goes beyond a single-gene focus. The early successes achieved with the methods described herein suggest that these more integrative genomics approaches to dissecting disease traits will significantly enhance the identification of key drivers of disease beyond what could be achieved by genetic association studies alone

    Using Stochastic Causal Trees to Augment Bayesian Networks for Modeling eQTL Datasets

    Background: The combination of genotypic and genome-wide expression data arising from segregating populations offers an unprecedented opportunity to model and dissect complex phenotypes. The immense potential offered by these data derives from the fact that genotypic variation is the sole source of perturbation and can therefore be used to reconcile changes in gene expression programs with the parental genotypes. To date, several methodologies have been developed for modeling eQTL data. These methods generally leverage genotypic data to resolve causal relationships among gene pairs implicated as associates in the expression data. In particular, leading studies have augmented Bayesian networks with genotypic data, providing a powerful framework for learning and modeling causal relationships. While these initial efforts have provided promising results, one major drawback associated with these methods is that they are generally limited to resolving causal orderings for transcripts most proximal to the genomic loci. In this manuscript, we present a probabilistic method capable of learning the causal relationships between transcripts at all levels in the network. We use the information provided by our method as a prior for Bayesian network structure learning, resulting in enhanced performance for gene network reconstruction.
Results: Using established protocols to synthesize eQTL networks and corresponding data, we show that our method achieves improved performance over existing leading methods. For the goal of gene network reconstruction, our method achieves improvements in recall ranging from 20% to 90% across a broad range of precision levels and for datasets of varying sample sizes. Additionally, we show that the learned networks can be utilized for expression quantitative trait loci mapping, resulting in upwards of 10-fold increases in recall over traditional univariate mapping.
Conclusions: Using the information from our method as a prior for Bayesian network structure learning yields large improvements in accuracy for the tasks of gene network reconstruction and expression quantitative trait loci mapping. In particular, our method is effective for establishing causal relationships between transcripts located both proximally and distally from genomic loci.

    Genetic Influences on Brain Gene Expression in Rats Selected for Tameness and Aggression

    Inter-individual differences in many behaviors are partly due to genetic differences, but the identification of the genes and variants that influence behavior remains challenging. Here, we studied an F2 intercross of two outbred lines of rats selected for tame and aggressive behavior towards humans for more than 64 generations. By using a mapping approach that is able to identify genetic loci segregating within the lines, we identified four times more loci influencing tameness and aggression than by an approach that assumes fixation of causative alleles, suggesting that many causative loci were not driven to fixation by the selection. We used RNA sequencing in 150 F2 animals to identify hundreds of loci that influence brain gene expression. Several of these loci colocalize with tameness loci and may reflect the same genetic variants. Through analyses of correlations between allele effects on behavior and gene expression, differential expression between the tame and aggressive rat selection lines, and correlations between gene expression and tameness in F2 animals, we identify the genes Gltscr2, Lgi4, Zfp40 and Slc17a7 as candidate contributors to the strikingly different behavior of the tame and aggressive animals

    Advances in Genetical Genomics of Plants

    Natural variation provides a valuable resource to study the genetic regulation of quantitative traits. In quantitative trait locus (QTL) analyses, this variation, captured in segregating mapping populations, is used to identify the genomic regions affecting these traits. The identification of the causal genes underlying QTLs is a major challenge for which the detection of gene expression differences is of major importance. By combining genetics with large-scale expression profiling (i.e. genetical genomics), resulting in expression QTLs (eQTLs), great progress can be made in connecting phenotypic variation to genotypic diversity. In this review we discuss examples from human, mouse, Drosophila, yeast and plant research to illustrate the advances in genetical genomics, with a focus on understanding the regulatory mechanisms underlying natural variation. With their tolerance to inbreeding, short generation time and the ease of generating large families, plants are ideal subjects to test new concepts in genetics. The comprehensive resources which are available for Arabidopsis make it a favorite model plant, but genetical genomics has also found its way to important crop species like rice, barley and wheat. We discuss eQTL profiling with respect to cis and trans regulation and show how combined studies with other ‘omics’ technologies, such as metabolomics and proteomics, may further augment current information on transcriptional, translational and metabolomic signaling pathways and enable reconstruction of detailed regulatory networks. The fast developments in the ‘omics’ area will offer great potential for genetical genomics to elucidate the genotype-phenotype relationships for both fundamental and applied research.

    High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering

    Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows a Bayesian network to be inferred via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with ground-truth networks and being orders of magnitude faster. Moreover, our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way for tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.
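
    The idea that a fixed node ordering reduces network inference to one independent parent-selection problem per gene can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the toy data, and the use of a simple correlation threshold as the selection rule are all assumptions made for illustration, and the ordering is taken as given rather than derived from pairwise causal inference tests.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def network_from_ordering(data, ordering, threshold=0.3):
    """Select parents for each gene among the genes that precede it.

    data      : dict mapping gene name -> list of expression values
    ordering  : list of gene names, putative causes first (assumed given)
    threshold : hypothetical cutoff on |correlation|, standing in for a
                proper variable selection step
    Each gene is handled independently, and edges only point forward in
    the ordering, so the result is acyclic by construction.
    """
    edges = []
    for pos, child in enumerate(ordering):
        for parent in ordering[:pos]:
            if abs(pearson(data[parent], data[child])) >= threshold:
                edges.append((parent, child))
    return edges


# Toy data: a chain g0 -> g1 -> g2 with additive Gaussian noise.
rng = random.Random(0)
n = 200
g0 = [rng.gauss(0, 1) for _ in range(n)]
g1 = [a + rng.gauss(0, 0.5) for a in g0]
g2 = [b + rng.gauss(0, 0.5) for b in g1]
data = {"g0": g0, "g1": g1, "g2": g2}
ordering = ["g0", "g1", "g2"]

edges = network_from_ordering(data, ordering)
print(edges)
```

    Because marginal correlation cannot separate direct from indirect effects, this toy rule will typically also link g0 to g2; a real variable selection step that conditions on the other candidate parents would be needed to prune such edges.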

    Explaining additional genetic variation in complex traits

    Genome-wide association studies (GWAS) have provided valuable insights into the genetic basis of complex traits, discovering >6000 variants associated with >500 quantitative traits and common complex diseases in humans. The associations identified so far represent only a fraction of those that influence phenotype, because there are likely to be many variants across the entire frequency spectrum, each of which influences multiple traits, with only a small average contribution to the phenotypic variance. This presents a considerable challenge to further dissection of the remaining unexplained genetic variance within populations, which limits our ability to predict disease risk, identify new drug targets, improve and maintain food sources, and understand natural diversity. This challenge will be met within the current framework through larger sample size, better phenotyping, including recording of nongenetic risk factors, focused study designs, and an integration of multiple sources of phenotypic and genetic information. The current evidence supports the application of quantitative genetic approaches, and we argue that one should retain simpler theories until simplicity can be traded for greater explanatory power