1,530 research outputs found

    A Bayesian method for evaluating and discovering disease loci associations

    Get PDF
    Background: A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis which has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is a pressing need. Methodology/Findings: We introduce the Bayesian network posterior probability (BNPP) method which addresses the difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can not only be used to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits both better evaluation and discovery performance than does a p-value based method. For the real data sets, previous findings in the literature are confirmed and additional findings are found. Conclusions/Significance: We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations. © 2011 Jiang et al

    2D association and integrative omics analysis in rice provides systems biology view in trait analysis.

    Get PDF
    The interactions among genes and between genes and environment contribute significantly to the phenotypic variation of complex traits and may be possible explanations for missing heritability. However, to our knowledge no existing tool can address the two kinds of interactions. Here we propose a novel linear mixed model that considers not only the additive effects of biological markers but also the interaction effects of marker pairs. Interaction effect is demonstrated as a 2D association. Based on this linear mixed model, we developed a pipeline, namely PATOWAS. PATOWAS can be used to study transcriptome-wide and metabolome-wide associations in addition to genome-wide associations. Our case analysis with real rice recombinant inbred lines (RILs) at three omics levels demonstrates that 2D association mapping and integrative omics are able to provide a systems biology view into the analyzed traits, leading toward an answer about how genes, transcripts, proteins, and metabolites work together to produce an observable phenotype

    A Drive to Driven Model of Mapping Intraspecific Interaction Networks.

    Get PDF
    Community ecology theory suggests that an individual\u27s phenotype is determined by the phenotypes of its coexisting members to the extent at which this process can shape community evolution. Here, we develop a mapping theory to identify interaction quantitative trait loci (QTL) governing inter-individual dependence. We mathematically formulate the decision-making strategy of interacting individuals. We integrate these mathematical descriptors into a statistical procedure, enabling the joint characterization of how QTL drive the strengths of ecological interactions and how the genetic architecture of QTL is driven by ecological networks. In three fish full-sib mapping experiments, we identify a set of genome-wide QTL that control a range of societal behaviors, including mutualism, altruism, aggression, and antagonism, and find that these intraspecific interactions increase the genetic variation of body mass by about 50%. We showcase how the interaction QTL can be used as editors to reconstruct and engineer new social networks for ecological communities

    Genome-driven evolutionary game theory helps understand the rise of metabolic interdependencies in microbial communities

    Get PDF
    Metabolite exchanges in microbial communities give rise to ecological interactions that govern ecosystem diversity and stability. It is unclear, however, how the rise of these interactions varies across metabolites and organisms. Here we address this question by integrating genome-scale models of metabolism with evolutionary game theory. Specifically, we use microbial fitness values estimated by metabolic models to infer evolutionarily stable interactions in multi-species microbial “games”. We first validate our approach using a well-characterized yeast cheater-cooperator system. We next perform over 80,000 in silico experiments to infer how metabolic interdependencies mediated by amino acid leakage in Escherichia coli vary across 189 amino acid pairs. While most pairs display shared patterns of inter-species interactions, multiple deviations are caused by pleiotropy and epistasis in metabolism. Furthermore, simulated invasion experiments reveal possible paths to obligate cross-feeding. Our study provides genomically driven insight into the rise of ecological interactions, with implications for microbiome research and synthetic ecology.We gratefully acknowledge funding from the Defense Advanced Research Projects Agency (Purchase Request No. HR0011515303, Contract No. HR0011-15-C-0091), the U.S. Department of Energy (Grants DE-SC0004962 and DE-SC0012627), the NIH (Grants 5R01DE024468 and R01GM121950), the national Science Foundation (Grants 1457695 and NSFOCE-BSF 1635070), MURI Grant W911NF-12-1-0390, the Human Frontiers Science Program (grant RGP0020/2016), and the Boston University Interdisciplinary Biomedical Research Office ARC grant on Systems Biology Approaches to Microbiome Research. We also thank Dr Kirill Korolev and members of the Segre Lab for their invaluable feedback on this work. (HR0011515303 - Defense Advanced Research Projects Agency; HR0011-15-C-0091 - Defense Advanced Research Projects Agency; DE-SC0004962 - U.S. Department of Energy; DE-SC0012627 - U.S. Department of Energy; 5R01DE024468 - NIH; R01GM121950 - NIH; 1457695 - national Science Foundation; NSFOCE-BSF 1635070 - national Science Foundation; W911NF-12-1-0390 - MURI; RGP0020/2016 - Human Frontiers Science Program; Boston University Interdisciplinary Biomedical Research Office ARC)Published versio

    Multifactor Dimensionality Reduction as a Filter-Based Approach for Genome Wide Association Studies

    Get PDF
    Advances in genotyping technology and the multitude of genetic data available now provide a vast amount of data that is proving to be useful in the quest for a better understanding of human genetic diseases through the study of genetic variation. This has led to the development of approaches such as genome wide association studies (GWAS) designed specifically for interrogating variants across the genome for association with disease, typically by testing single locus, univariate associations. More recently it has been accepted that epistatic (interaction) effects may also be great contributors to these genetic effects, and GWAS methods are now being applied to find epistatic effects. The challenge for these methods still remain in prioritization and interpretation of results, as it has also become standard for initial findings to be independently investigated in replication cohorts or functional studies. This is motivating the development and implementation of filter-based approaches to prioritize variants found to be significant in a discovery stage for follow-up for replication. Such filters must be able to detect both univariate and interactive effects. In the current study we present and evaluate the use of multifactor dimensionality reduction (MDR) as such a filter, with simulated data and a wide range of effect sizes. Additionally, we compare the performance of the MDR filter to a similar filter approach using logistic regression (LR), the more traditional approach used in GWAS analysis, as well as evaporative cooling (EC)-another prominent machine learning filtering method. The results of our simulation study show that MDR is an effective method for such prioritization, and that it can detect main effects, and interactions with or without marginal effects. Importantly, it performed as well as EC and LR for main effect models. It also significantly outperforms LR for various two-locus epistatic models, while it has equivalent results as EC for the epistatic models. The results of this study demonstrate the potential of MDR as a filter to detect gene–gene interactions in GWAS studies

    Grammatical evolution decision trees for detecting gene-gene interactions

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>A fundamental goal of human genetics is the discovery of polymorphisms that predict common, complex diseases. It is hypothesized that complex diseases are due to a myriad of factors including environmental exposures and complex genetic risk models, including gene-gene interactions. Such epistatic models present an important analytical challenge, requiring that methods perform not only statistical modeling, but also variable selection to generate testable genetic model hypotheses. This challenge is amplified by recent advances in genotyping technology, as the number of potential predictor variables is rapidly increasing.</p> <p>Methods</p> <p>Decision trees are a highly successful, easily interpretable data-mining method that are typically optimized with a hierarchical model building approach, which limits their potential to identify interacting effects. To overcome this limitation, we utilize evolutionary computation, specifically grammatical evolution, to build decision trees to detect and model gene-gene interactions. In the current study, we introduce the Grammatical Evolution Decision Trees (GEDT) method and software and evaluate this approach on simulated data representing gene-gene interaction models of a range of effect sizes. We compare the performance of the method to a traditional decision tree algorithm and a random search approach and demonstrate the improved performance of the method to detect purely epistatic interactions.</p> <p>Results</p> <p>The results of our simulations demonstrate that GEDT has high power to detect even very moderate genetic risk models. GEDT has high power to detect interactions with and without main effects.</p> <p>Conclusions</p> <p>GEDT, while still in its initial stages of development, is a promising new approach for identifying gene-gene interactions in genetic association studies.</p

    Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis.

    Get PDF
    Recent advances in the scale and diversity of population genomic datasets for bacteria now provide the potential for genome-wide patterns of co-evolution to be studied at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to that encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The group A Streptococcus (GAS) data set population represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, which were each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies statistically significantly co-evolving locus pairs, potentially arising from fitness selection interdependence reflecting underlying protein-protein interactions, or genes whose product activities contribute to the same phenotype. This discovery approach greatly enhances the future potential of epistasis analysis for systems biology, and can complement genome-wide association studies as a means of formulating hypotheses for targeted experimental work

    Six Degrees of Epistasis: Statistical Network Models for GWAS

    Get PDF
    There is growing evidence that much more of the genome than previously thought is required to explain the heritability of complex phenotypes. Recent studies have demonstrated that numerous common variants from across the genome explain portions of genetic variability, spawning various avenues of research directed at explaining the remaining heritability. This polygenic structure is also the motivation for the growing application of pathway and gene set enrichment techniques, which have yielded promising results. These findings suggest that the coordination of genes in pathways that are known to occur at the gene regulatory level also can be detected at the population level. Although genes in these networks interact in complex ways, most population studies have focused on the additive contribution of common variants and the potential of rare variants to explain additional variation. In this brief review, we discuss the potential to explain additional genetic variation through the agglomeration of multiple gene–gene interactions as well as main effects of common variants in terms of a network paradigm. Just as is the case for single-locus contributions, we expect each gene–gene interaction edge in the network to have a small effect, but these effects may be reinforced through hubs and other connectivity structures in the network. We discuss some of the opportunities and challenges of network methods for analyzing genome-wide association studies (GWAS) such as the study of hubs and motifs, and integrating other types of variation and environmental interactions. Such network approaches may unveil hidden variation in GWAS, improve understanding of mechanisms of disease, and possibly fit into a network paradigm of evolutionary genetics
    corecore