
1,157 research outputs found

Increasing the Power to Detect Causal Associations by Combining Genotypic and Expression Data in Segregating Populations

Author: Arthur Fridman
Chunsheng Zhang
Eric E Schadt
Eric Minch
Gary Stormo
Jeffrey R Sachs
Jun Zhu
Matthew C Wiener
Pek Y Lum
Publication venue: Public Library of Science
Publication date: 01/04/2007

To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds that of networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Mapping the genetic architecture of gene expression in human liver

Author: Avila-Campillo I
Chudin E
Derry J
Drake TA
Guengerich FP
GuhaThakurta D
Hao K
Johnson JM
Kasarskis A
Kruger MJ
Lamb J
Lum PY
Lusis AJ
Mehrabian M
Millstein J
Molony C
Rohl CA
Rushmore TH
Schadt EE
Schuetz E
Sieberts S
Smith RC
Storey JD
Strom SC
Suver C
Ulrich R
Van Nas A
Wang S
Yang X
Zhang B
Zhu J
Publication venue: Public Library of Science (PLoS)
Publication date: 01/05/2008

Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and we genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression resulted in the detection of more than 6,000 associations between SNP genotypes and liver gene expression traits, where many of the corresponding genes identified have already been implicated in a number of human diseases. The utility of these data for elucidating the causes of common human diseases is demonstrated by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for the candidate susceptibility genes being identified at a growing number of genetic loci that have been identified as key drivers of disease from genome-wide association studies of disease. 
By using an integrative genomics approach, we highlight how the gene RPS26, and not ERBB3, is supported by our data as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. In the process, we also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels. © 2008 Schadt et al.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

D-Scholarship@Pitt

Utilization of landraces of European flint maize for breeding and genetic research

Author: Renner Juliane
Publication venue: Fakultät Agrarwissenschaften. Institut für Pflanzenzüchtung, Saatgutforschung und Populationsgenetik
Publication date: 01/01/2023

Maize is one of the most important crop species for agriculture worldwide. Since its domestication, landraces formed the traditional type of variety. Selection and genetic factors formed a broad diversity of open-pollinated populations well adapted to local conditions. This changed with the introduction of hybrid breeding, when nearly all existing landraces disappeared from use in agriculture and as source material for breeding. Molecular analyses showed a narrow genetic base of the flint heterotic pool compared to the dent pool. Since genetic resources in maize are among the richest of all major crops, the exploitation of this untapped reservoir of genetic variation in landraces could be an option to reverse the ongoing narrowing of the genetic basis to meet the demands of a growing world population as well as new challenges under a changing global climate and reduced inputs. The main goal of this study was the evaluation of European flint maize landraces to unlock their genetic diversity. In detail, our objectives were to (i) determine the variation for testcross performance of European maize landraces; (ii) evaluate the phenotypic and genotypic variation of immortalized lines within and among landraces; (iii) compare the per se performance of those line libraries with elite lines as well as founder lines from the European flint germplasm pool; (iv) analyze the breeding potential of immortalized lines from landraces in comparison with elite material to improve the narrow genetic base of the flint heterotic pool; (v) demonstrate the high mapping resolution of DH libraries from landraces in association mapping down to causal variants and underlying genes; and (vi) provide conclusions and guidelines for breeding and research using libraries of immortalized lines from landraces.
In a first experiment, we evaluated in multi-environment trials a broad collection of 70 European flint landraces for their testcross performance in combination with two elite dent testers. In comparison with the yield of modern hybrids, grain yield of the testcrosses of landraces was on average 26% lower, but a high genotypic variance among the landraces was observed for all traits, and correlations were moderate to high for most trait combinations, similar to those found in elite materials. Genetic correlations between the two testcross series exceeded 0.74 for all traits, suggesting that evaluation of testcross performance in combination with one or two single-cross tester(s) from the opposite pool is sufficient to assess the breeding potential of landraces. In a second experiment, we produced libraries of DH lines from the most promising landraces identified in the first experiment. In total, 389 DH lines from six European flint landraces were evaluated together with four flint founder lines and 53 elite flint lines for 16 agronomic traits in four locations. In general, the genotypic variance (σ^2G) was larger within than among the DH libraries and also exceeded σ^2G of the elite flint lines. Furthermore, the means and σ^2G varied among the DH libraries, resulting in large differences in the usefulness criterion. Mean grain yield of the elite flint lines exceeded that of the flint founder lines by 25% and that of the DH libraries by 62%, indicating the impressive breeding progress achieved in the elite material and the substantial genetic load still present in the DH libraries. Nevertheless, the usefulness of the best DH lines was comparable to that of the elite flint lines for many traits including grain yield, underpinning the tremendous potential of landraces for broadening the genetic base of the elite germplasm.
In a third experiment, the materials from the second experiment were genotyped with the MaizeSNP50 BeadChip from Illumina® and seeds of all genotypes were used for extracting and analyzing 288 metabolites with GC-MS. Data for agronomic traits and metabolites were used for a novel association mapping study. The much faster decay of linkage disequilibrium for adjacent markers in the DH libraries compared with the elite flint lines resulted in unprecedented map resolution. This was strikingly demonstrated by fine-mapping a QTL for oil content down to the phenylalanine insertion F469 in DGAT1-2 as the causal variant. Further, for the metabolite allantoin, which is related to abiotic stress response, promoter polymorphisms as well as differential expression of an allantoinase were identified as putative causes of variation despite a moderate size of the mapping population. These results encourage the use of DH libraries from landraces for association mapping and for dissecting QTL, potentially down to the causal variants. However, larger population sizes of each DH library are recommended, similar to those commonly used with other approaches such as the NAM design, for detection of QTL explaining only a small portion of the genetic variance. This opens a new avenue for utilization of natural and/or engineered alleles in breeding. In conclusion, the genetic variation present in European flint maize landraces represents a unique source to reverse the ongoing narrowing of the genetic basis of the elite germplasm of this heterotic pool. For identifying the most promising landraces, we propose a multi-stage approach, where based on an assessment of the molecular diversity about one hundred landraces are evaluated in observation trials for agro-ecological adaptation and testcrosses with one single-cross tester are used for evaluating their general combining ability with the opposite heterotic pool.
For a small number (< 6) of landraces a large number of DH lines are developed, which are phenotyped and genotyped for further use in association mapping and genomic selection with the ultimate goal to make these gold reserves accessible for maize breeding with modern approaches.

Elektronische Publikationen der Universität Hohenheim

Moving toward a system genetics view of disease

Author: A Ghazalpour
A Herbert
AC Cervino
AL Barabasi
AO Edwards
BE Stranger
C Chiellini
C Jiang
CL Karp
CM Kendziorski
D GuhaThakurta
DC Kulp
DD Shoemaker
E Petretto
E Ravasz
EE Schadt
EJ Chesler
Eric E. Schadt
I Lee
J Klose
J Zhu
JD Han
JD Storey
JF Waring
JK Kim
JL Haines
JM Johnson
LJ van’t Veer
M Mehrabian
M Morley
MF Oleksiak
ML Peacock
N Friedman
N Hubner
PS Gargalovic
PY Lum
R Alberts
R DeCook
R Sladek
RB Brem
RC Jansen
RJ Klein
RJ Mural
S Doss
SA Monks
SF Grant
SI Lee
Solveig K. Sieberts
VG Cheung
VK Mootha
W Jin
ZB Zeng
Publication venue: Springer-Verlag
Publication date: 01/01/2007

Testing hundreds of thousands of DNA markers in human, mouse, and other species for association to complex traits like disease is now a reality. However, information on how variations in DNA impact complex physiologic processes flows through transcriptional and other molecular networks. In other words, DNA variations impact complex diseases through the perturbations they cause to transcriptional and other biological networks, and these molecular phenotypes are intermediate to clinically defined disease. Because it is also now possible to monitor transcript levels in a comprehensive fashion, integrating DNA variation, transcription, and phenotypic data has the potential to enhance identification of the associations between DNA variation and diseases like obesity and diabetes, as well as characterize those parts of the molecular networks that drive these diseases. Toward that end, we review methods for integrating expression quantitative trait loci (eQTLs), gene expression, and clinical data to infer causal relationships among gene expression traits and between expression and clinical traits. We further describe methods to integrate these data in a more comprehensive manner by constructing coexpression gene networks that leverage pairwise gene interaction data to represent more general relationships. To infer gene networks that capture causal information, we describe a Bayesian algorithm that further integrates eQTLs, expression, and clinical phenotype data to reconstruct whole-gene networks capable of representing causal relationships among genes and traits in the network. These emerging network approaches, aimed at processing high-dimensional biological data by integrating data from multiple sources, represent some of the first steps in statistical genetics to identify multiple genetic perturbations that alter the states of molecular networks and that in turn push systems into disease states. 
Evolving statistical procedures that operate on networks will be critical to extracting information related to complex phenotypes like disease, as research goes beyond a single-gene focus. The early successes achieved with the methods described herein suggest that these more integrative genomics approaches to dissecting disease traits will significantly enhance the identification of key drivers of disease beyond what could be achieved by genetic association studies alone.

Crossref

Springer - Publisher Connector

PubMed Central

Using Stochastic Causal Trees to Augment Bayesian Networks for Modeling eQTL Datasets

Author: AFM Smith
Ambuj K Singh
AS Dimas
AV Werhli
BE Stranger
BJ Chen
D Heckerman
D Husmeier
D Madigan
DC Kulp
DJ Lockhart
DM Ruderfer
E Chaibub Neto
EE Schadt
EO Perlstein
GA Churchill
J Pearl
J Zhu
JD Storey
JJ Faith
JJ Keurentjes
Kyle C Chipman
M Ashburner
M Morley
M Schena
MH Kutner
N Bing
N Friedman
O Litvin
RB Brem
RC Jansen
RW Doerge
S Imoto
S Mukherjee
SI Lee
W Pan
W Zhang
W Zou
Y Benjamini
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The combination of genotypic and genome-wide expression data arising from segregating populations offers an unprecedented opportunity to model and dissect complex phenotypes. The immense potential offered by these data derives from the fact that genotypic variation is the sole source of perturbation and can therefore be used to reconcile changes in gene expression programs with the parental genotypes. To date, several methodologies have been developed for modeling eQTL data. These methods generally leverage genotypic data to resolve causal relationships among gene pairs implicated as associates in the expression data. In particular, leading studies have augmented Bayesian networks with genotypic data, providing a powerful framework for learning and modeling causal relationships. While these initial efforts have provided promising results, one major drawback associated with these methods is that they are generally limited to resolving causal orderings for transcripts most proximal to the genomic loci. In this manuscript, we present a probabilistic method capable of learning the causal relationships between transcripts at all levels in the network. We use the information provided by our method as a prior for Bayesian network structure learning, resulting in enhanced performance for gene network reconstruction.
Results: Using established protocols to synthesize eQTL networks and corresponding data, we show that our method achieves improved performance over existing leading methods. For the goal of gene network reconstruction, our method achieves improvements in recall ranging from 20% to 90% across a broad range of precision levels and for datasets of varying sample sizes. Additionally, we show that the learned networks can be utilized for expression quantitative trait loci mapping, resulting in upwards of 10-fold increases in recall over traditional univariate mapping.
Conclusions: Using the information from our method as a prior for Bayesian network structure learning yields large improvements in accuracy for the tasks of gene network reconstruction and expression quantitative trait loci mapping. In particular, our method is effective for establishing causal relationships between transcripts located both proximally and distally from genomic loci.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genetic Influences on Brain Gene Expression in Rats Selected for Tameness and Aggression

Author: Alexander Cagan
Belyaev
Darvasi
Enrico Petretto
Frank W. Albert
François Besnier
Henrike O. Heyne
Irina Z. Plyusnina
Leonid Kruglyak
Lyudmila Trut
Maxime Rotival
Rimma Kozhemyakina
Ronald Nelson
Susann Lautenschläger
Svante Pääbo
Torsten Schöneberg
Örjan Carlborg
Publication date: 01/01/2014

Inter-individual differences in many behaviors are partly due to genetic differences, but the identification of the genes and variants that influence behavior remains challenging. Here, we studied an F2 intercross of two outbred lines of rats selected for tame and aggressive behavior towards humans for more than 64 generations. By using a mapping approach that is able to identify genetic loci segregating within the lines, we identified four times more loci influencing tameness and aggression than by an approach that assumes fixation of causative alleles, suggesting that many causative loci were not driven to fixation by the selection. We used RNA sequencing in 150 F2 animals to identify hundreds of loci that influence brain gene expression. Several of these loci colocalize with tameness loci and may reflect the same genetic variants. Through analyses of correlations between allele effects on behavior and gene expression, differential expression between the tame and aggressive rat selection lines, and correlations between gene expression and tameness in F2 animals, we identify the genes Gltscr2, Lgi4, Zfp40 and Slc17a7 as candidate contributors to the strikingly different behavior of the tame and aggressive animals.

arXiv.org e-Print Archive

Crossref

PubMed Central

eScholarship - University of California

MPG.PuRe

Advances in Genetical Genomics of Plants

Author: Hilhorst H.W.M.
Joosen R.V.L.
Keurentjes J.J.B.
Ligterink W.
Publication venue: Bentham Science Publishers Ltd.
Publication date: 01/01/2009

Natural variation provides a valuable resource to study the genetic regulation of quantitative traits. In quantitative trait locus (QTL) analyses this variation, captured in segregating mapping populations, is used to identify the genomic regions affecting these traits. The identification of the causal genes underlying QTLs is a major challenge for which the detection of gene expression differences is of major importance. By combining genetics with large-scale expression profiling (i.e. genetical genomics), resulting in expression QTLs (eQTLs), great progress can be made in connecting phenotypic variation to genotypic diversity. In this review we discuss examples from human, mouse, Drosophila, yeast and plant research to illustrate the advances in genetical genomics, with a focus on understanding the regulatory mechanisms underlying natural variation. With their tolerance to inbreeding, short generation time and the ease of generating large families, plants are ideal subjects to test new concepts in genetics. The comprehensive resources which are available for Arabidopsis make it a favorite model plant, but genetical genomics has also found its way to important crop species like rice, barley and wheat. We discuss eQTL profiling with respect to cis and trans regulation and show how combined studies with other ‘omics’ technologies, such as metabolomics and proteomics, may further augment current information on transcriptional, translational and metabolomic signaling pathways and enable reconstruction of detailed regulatory networks. The fast developments in the ‘omics’ area will offer great potential for genetical genomics to elucidate the genotype-phenotype relationships for both fundamental and applied research.

Crossref

PubMed Central

Wageningen University & Research Publications

High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering

Author: Albert
Bansal
Beckmann
Boyle
Bussemaker
Chen
Civelek
Cusanovich
Delaneau
Emmert-Streib
Ernst
Franzén
Friedman
Gerstein
Greenfield
Haeupler
Hageman
Hassin
Johnson
Kalisch
Kiani
Koller
Korte
Lappalainen
Li
Luck
Marbach
Millstein
Mukherjee
Neto
Ongen
Pearl
Penfold
Qi
Rockman
Schadt
Scutari
Shabalin
Shojaie
Smith
Storey
Subramanian
Talukdar
Tasaki
Tibshirani
Walhout
Wang
Werhli
Zhang
Zhu
Zou
Äijö
Publication venue: Frontiers Media SA
Publication date: 01/01/2019

Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes; this ordering then allows a Bayesian network to be inferred via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with ground-truth networks and being orders of magnitude faster. Moreover, our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way of tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.

University of Bergen

Crossref

Ghent University Academic Bibliography

Edinburgh Research Explorer

NORA - Norwegian Open Research Archives

Explaining additional genetic variation in complex traits

Author: Robinson Matthew R.
Visscher Peter M.
Wray Naomi R.
Publication venue: Elsevier BV
Publication date: 01/04/2014

Genome-wide association studies (GWAS) have provided valuable insights into the genetic basis of complex traits, discovering >6000 variants associated with >500 quantitative traits and common complex diseases in humans. The associations identified so far represent only a fraction of those that influence phenotype, because there are likely to be many variants across the entire frequency spectrum, each of which influences multiple traits, with only a small average contribution to the phenotypic variance. This presents a considerable challenge to further dissection of the remaining unexplained genetic variance within populations, which limits our ability to predict disease risk, identify new drug targets, improve and maintain food sources, and understand natural diversity. This challenge will be met within the current framework through larger sample size, better phenotyping, including recording of nongenetic risk factors, focused study designs, and an integration of multiple sources of phenotypic and genetic information. The current evidence supports the application of quantitative genetic approaches, and we argue that one should retain simpler theories until simplicity can be traded for greater explanatory power.

University of Queensland eSpace