108 research outputs found

    Moving toward a system genetics view of disease

    Testing hundreds of thousands of DNA markers in human, mouse, and other species for association to complex traits like disease is now a reality. However, information on how variations in DNA impact complex physiologic processes flows through transcriptional and other molecular networks. In other words, DNA variations impact complex diseases through the perturbations they cause to transcriptional and other biological networks, and these molecular phenotypes are intermediate to clinically defined disease. Because it is also now possible to monitor transcript levels in a comprehensive fashion, integrating DNA variation, transcription, and phenotypic data has the potential to enhance identification of the associations between DNA variation and diseases like obesity and diabetes, as well as characterize those parts of the molecular networks that drive these diseases. Toward that end, we review methods for integrating expression quantitative trait loci (eQTLs), gene expression, and clinical data to infer causal relationships among gene expression traits and between expression and clinical traits. We further describe methods to integrate these data in a more comprehensive manner by constructing coexpression gene networks that leverage pairwise gene interaction data to represent more general relationships. To infer gene networks that capture causal information, we describe a Bayesian algorithm that further integrates eQTLs, expression, and clinical phenotype data to reconstruct whole-gene networks capable of representing causal relationships among genes and traits in the network. These emerging network approaches, aimed at processing high-dimensional biological data by integrating data from multiple sources, represent some of the first steps in statistical genetics to identify multiple genetic perturbations that alter the states of molecular networks and that in turn push systems into disease states. 
Evolving statistical procedures that operate on networks will be critical to extracting information related to complex phenotypes like disease, as research goes beyond a single-gene focus. The early successes achieved with the methods described herein suggest that these more integrative genomics approaches to dissecting disease traits will significantly enhance the identification of key drivers of disease beyond what could be achieved by genetic association studies alone.

    Cell-to-Cell Stochastic Variation in Gene Expression Is a Complex Genetic Trait

    The genetic control of common traits is rarely deterministic, with many genes contributing only to the chance of developing a given phenotype. This incomplete penetrance is poorly understood and is usually attributed to interactions between genes or interactions between genes and environmental conditions. Because many traits such as cancer can emerge from rare events happening in one or very few cells, we propose an alternative and complementary possibility in which some genotypes could facilitate these events by increasing stochastic cell-to-cell variations (or ‘noise’). As a very first step towards investigating this possibility, we studied how natural genetic variation influences the level of noise in the expression of a single gene using the yeast S. cerevisiae as a model system. Reproducible differences in noise were observed between divergent genetic backgrounds. We found that noise was highly heritable and under complex genetic control. Scanning the genome, we mapped three Quantitative Trait Loci (QTL) of noise, one locus being explained by an increase in noise when transcriptional elongation was impaired. Our results suggest that the level of stochasticity in particular molecular regulations may differ between multicellular individuals depending on their genotypic background. The complex genetic architecture of noise buffering couples genetic to non-genetic robustness and provides a molecular basis for the probabilistic nature of complex traits.

    High-Resolution Mapping of Gene Expression Using Association in an Outbred Mouse Stock

    Quantitative trait locus (QTL) analysis is a powerful tool for mapping genes for complex traits in mice, but its utility is limited by poor resolution. A promising mapping approach is association analysis in outbred stocks or different inbred strains. As a proof of concept for the association approach, we applied whole-genome association analysis to hepatic gene expression traits in an outbred mouse population, the MF1 stock, and replicated expression QTL (eQTL) identified in previous studies of F2 intercross mice. We found that the mapping resolution of these eQTL was significantly greater in the outbred population. Through an example, we also showed how this precise mapping can be used to resolve loci previously identified in intercross studies that affect many different transcript levels (known as eQTL “hotspots”) into distinct regions. Our results also highlight the importance of correcting for population structure in whole-genome association studies in the outbred stock.

    Learning a Prior on Regulatory Potential from eQTL Data

    Genome-wide RNA expression data provide a detailed view of an organism's biological state; hence, a dataset measuring expression variation between genetically diverse individuals (eQTL data) may provide important insights into the genetics of complex traits. However, with data from a relatively small number of individuals, it is difficult to distinguish true causal polymorphisms from the large number of possibilities. The problem is particularly challenging in populations with significant linkage disequilibrium, where traits are often linked to large chromosomal regions containing many genes. Here, we present a novel method, Lirnet, that automatically learns a regulatory potential for each sequence polymorphism, estimating how likely it is to have a significant effect on gene expression. This regulatory potential is defined in terms of “regulatory features” (including the function of the gene and the conservation, type, and position of genetic polymorphisms) that are available for any organism. The extent to which the different features influence the regulatory potential is learned automatically, making Lirnet readily applicable to different datasets, organisms, and feature sets. We apply Lirnet both to the human HapMap eQTL dataset and to a yeast eQTL dataset and provide statistical and biological results demonstrating that Lirnet produces significantly better regulatory programs than other recent approaches. We demonstrate in the yeast data that Lirnet can correctly suggest a specific causal sequence variation within a large, linked chromosomal region. In one example, Lirnet uncovered a novel, experimentally validated connection between Puf3 (a sequence-specific RNA binding protein) and P-bodies (cytoplasmic structures that regulate translation and RNA stability), as well as the particular causative polymorphism, a SNP in Mkt1, that induces the variation in the pathway.

    Using Stochastic Causal Trees to Augment Bayesian Networks for Modeling eQTL Datasets

    Background: The combination of genotypic and genome-wide expression data arising from segregating populations offers an unprecedented opportunity to model and dissect complex phenotypes. The immense potential offered by these data derives from the fact that genotypic variation is the sole source of perturbation and can therefore be used to reconcile changes in gene expression programs with the parental genotypes. To date, several methodologies have been developed for modeling eQTL data. These methods generally leverage genotypic data to resolve causal relationships among gene pairs implicated as associates in the expression data. In particular, leading studies have augmented Bayesian networks with genotypic data, providing a powerful framework for learning and modeling causal relationships. While these initial efforts have provided promising results, one major drawback associated with these methods is that they are generally limited to resolving causal orderings for transcripts most proximal to the genomic loci. In this manuscript, we present a probabilistic method capable of learning the causal relationships between transcripts at all levels in the network. We use the information provided by our method as a prior for Bayesian network structure learning, resulting in enhanced performance for gene network reconstruction. Results: Using established protocols to synthesize eQTL networks and corresponding data, we show that our method achieves improved performance over existing leading methods. For the goal of gene network reconstruction, our method achieves improvements in recall ranging from 20% to 90% across a broad range of precision levels and for datasets of varying sample sizes. Additionally, we show that the learned networks can be utilized for expression quantitative trait loci mapping, resulting in upwards of 10-fold increases in recall over traditional univariate mapping. Conclusions: Using the information from our method as a prior for Bayesian network structure learning yields large improvements in accuracy for the tasks of gene network reconstruction and expression quantitative trait loci mapping. In particular, our method is effective for establishing causal relationships between transcripts located both proximally and distally from genomic loci.

    Population Differences in Transcript-Regulator Expression Quantitative Trait Loci

    Gene expression quantitative trait loci (eQTL) are useful for identifying single nucleotide polymorphisms (SNPs) associated with diseases. At times, a genetic variant may be associated with a master regulator involved in the manifestation of a disease. The downstream target genes of the master regulator are typically co-expressed and share biological function. Therefore, it is practical to screen for eQTLs by identifying SNPs associated with the targets of a transcript-regulator (TR). We used a multivariate regression with the gene expression of known targets of TRs and SNPs to identify TReQTLs in European (CEU) and African (YRI) HapMap populations. A nominal p-value of <1×10⁻⁶ revealed 234 SNPs in CEU and 154 in YRI as TReQTLs. These represent 36 independent (tag) SNPs in CEU and 39 in YRI affecting the downstream targets of 25 and 36 TRs respectively. At a false discovery rate (FDR) = 45%, one cis-acting tag SNP (within 1 kb of a gene) in each population was identified as a TReQTL. In CEU, the SNP (rs16858621) in Pcnxl2 was found to be associated with the genes regulated by CREM whereas in YRI, the SNP (rs16909324) was linked to the targets of miRNA hsa-miR-125a. To infer the pathways that regulate expression, we ranked TReQTLs by connectivity within the structure of biological process subtrees. One TReQTL SNP (rs3790904) in CEU maps to Lphn2 and is associated (nominal p-value = 8.1×10⁻⁷) with the targets of the X-linked breast cancer suppressor Foxp3. The structure of the biological process subtree and a gene interaction network of the TReQTL revealed that tumor necrosis factor, NF-kappaB and variants in G-protein coupled receptors signaling may play a central role as communicators in Foxp3 functional regulation. The potential pleiotropic effect of the Foxp3 TReQTLs was gleaned from integrating mRNA-Seq data and SNP-set enrichment into the analysis.

    What Can Causal Networks Tell Us about Metabolic Pathways?

    Graphical models describe the linear correlation structure of data and have been used to establish causal relationships among phenotypes in genetic mapping populations. Data are typically collected at a single point in time. Biological processes, on the other hand, are often non-linear and display time-varying dynamics. The extent to which graphical models can recapitulate the architecture of an underlying biological process is not well understood. We consider metabolic networks with known stoichiometry to address the fundamental question: “What can causal networks tell us about metabolic pathways?”. Using data from an Arabidopsis BaySha population and simulated data from dynamic models of pathway motifs, we assess our ability to reconstruct metabolic pathways using graphical models. Our results highlight the necessity of non-genetic residual biological variation for reliable inference. Recovery of the ordering within a pathway is possible, but should not be expected. Causal inference is sensitive to subtle patterns in the correlation structure that may be driven by a variety of factors, which may not emphasize the substrate-product relationship. We illustrate the effects of metabolic pathway architecture, epistasis and stochastic variation on correlation structure and graphical model-derived networks. We conclude that graphical models should be interpreted cautiously, especially if the implied causal relationships are to be used in the design of intervention strategies.

    Model Selection Approach Suggests Causal Association between 25-Hydroxyvitamin D and Colorectal Cancer

    Vitamin D deficiency has been associated with increased risk of colorectal cancer (CRC), but a causal relationship has not yet been confirmed. We investigate the direction of causation between vitamin D and CRC by extending the conventional approaches to allow pleiotropic relationships and by explicitly modelling unmeasured confounders. Plasma 25-hydroxyvitamin D (25-OHD), genetic variants associated with 25-OHD and CRC, and other relevant information were available for 2645 individuals (1057 CRC cases and 1588 controls) and included in the model. We investigate whether 25-OHD is likely to be causally associated with CRC, or vice versa, by selecting the best modelling hypothesis according to Bayesian predictive scores. We examine consistency for a range of prior assumptions. Model comparison showed preference for the causal association between low 25-OHD and CRC over the reverse causal hypothesis. This was confirmed for posterior mean deviances obtained for both models (11.5 natural log units in favour of the causal model), and also for deviance information criteria (DIC) computed for a range of prior distributions. Overall, models ignoring hidden confounding or pleiotropy had significantly poorer DIC scores. Results suggest a causal association between 25-OHD and colorectal cancer, and support the need for randomised clinical trials for further confirmation.

    A web-based Alcohol Clinical Training (ACT) curriculum: Is in-person faculty development necessary to affect teaching?

    Background: Physicians receive little education about unhealthy alcohol use and as a result patients often do not receive efficacious interventions. The objective of this study is to evaluate whether a free web-based alcohol curriculum would be used by physician educators and whether in-person faculty development would increase its use, confidence in teaching and teaching itself. Methods: Subjects were physician educators who applied to attend a workshop on the use of a web-based curriculum about alcohol screening and brief intervention and cross-cultural efficacy. All physicians were provided the curriculum web address. Intervention subjects attended a 3-hour workshop including demonstration of the website, modeling of teaching, and development of a plan for using the curriculum. All subjects completed a survey prior to and 3 months after the workshop. Results: Of 20 intervention and 13 control subjects, 19 (95%) and 10 (77%), respectively, completed follow-up. Compared to controls, intervention subjects had greater increases in confidence in teaching alcohol screening, and in the frequency of two teaching practices: teaching about screening and eliciting patient health beliefs. Teaching confidence and teaching practices improved significantly in 9 of 10 comparisons for intervention, and in 0 comparisons for control subjects. At follow-up 79% of intervention but only 50% of control subjects reported using any part of the curriculum (p = 0.20). Conclusion: In-person training for physician educators on the use of a web-based alcohol curriculum can increase teaching confidence and practices. Although the web is frequently used for dissemination, in-person training may be preferable to effect widespread teaching of clinical skills like alcohol screening and brief intervention.

    Meta-Analysis of Genome-Wide Association Studies for Abdominal Aortic Aneurysm Identifies Four New Disease-Specific Risk Loci

    Rationale: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA. Objective: To identify additional AAA risk loci using data from all available genome-wide association studies (GWAS). Methods and Results: Through a meta-analysis of 6 GWAS datasets and a validation study totalling 10,204 cases and 107,766 controls we identified 4 new AAA risk loci: 1q32.3 (SMYD2), 13q12.11 (LINC00540), 20q13.12 (near PCIF1/MMP9/ZNF335), and 21q22.2 (ERG). In various database searches we observed no new associations between the lead AAA SNPs and coronary artery disease, blood pressure, lipids or diabetes. Network analyses identified ERG, IL6R and LDLR as modifiers of MMP9, with a direct interaction between ERG and MMP9. Conclusions: The 4 new risk loci for AAA appear to be specific for AAA compared with other cardiovascular diseases and related traits, suggesting that traditional cardiovascular risk factor management may only have limited value in preventing the progression of aneurysmal disease.