1,093 research outputs found
Bayesian Network Modeling and Inference of GWAS Catalog
Genome-wide association studies (GWASs) have received an increasing attention to understand genotype-phenotype relationships. The Bayesian network has been proposed as a powerful tool for modeling single-nucleotide polymorphism (SNP)-trait associations due to its advantage in addressing the high computational complex and high dimensional problems. Most current works learn the interactions among genotypes and phenotypes from the raw genotype data. However, due to the privacy issue, genotype information is sensitive and should be handled by complying with specific restrictions. In this work, we aim to build Bayesian networks from publicly released GWAS statistics to explicitly reveal the conditional dependency between SNPs and traits.
First, we focus on building a Bayesian network for modeling the SNP-categorical trait relationships. We construct a three-layered Bayesian network explicitly revealing the conditional dependency between SNPs and categorical traits from GWAS statistics. We then formulate inference problems based on the dependency relationship captured in the Bayesian network. Empirical evaluations show the effectiveness of our methods.
Second, we focus on modeling the SNP-quantitative trait relationships. Existing methods in the literature can only deal with categorical traits. We address this limitation by leveraging the Conditional Linear Gaussian (CLG) Bayesian network, which can handle a mixture of discrete and continuous variables. A two-layered CLG Bayesian network is built where the SNPs are represented as discrete variables in one layer and quantitative traits are represented as continuous variables in another layer. Efficient inference methods are then derived based on the constructed network. The experimental results demonstrate the effectiveness of our methods.
Finally, we present STIP, a web-based SNP-trait inference platform capable of a variety of inference tasks, such as trait inference given SNP genotypes and genotype inference given traits. The current version of STIP provides three services which are SNP-trait inference, Top-k trait prediction and GWAS catalog exploration
Statistical Methods in Integrative Genomics
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions
Routes for breaching and protecting genetic privacy
We are entering the era of ubiquitous genetic information for research,
clinical care, and personal curiosity. Sharing these datasets is vital for
rapid progress in understanding the genetic basis of human diseases. However,
one growing concern is the ability to protect the genetic privacy of the data
originators. Here, we technically map threats to genetic privacy and discuss
potential mitigation strategies for privacy-preserving dissemination of genetic
data.Comment: Draft for comment
Recommended from our members
Transcriptome-Wide Association Supplements Genome-Wide Association in Zea mays.
Modern improvement of complex traits in agricultural species relies on successful associations of heritable molecular variation with observable phenotypes. Historically, this pursuit has primarily been based on easily measurable genetic markers. The recent advent of new technologies allows assaying and quantifying biological intermediates (hereafter endophenotypes) which are now readily measurable at a large scale across diverse individuals. The usefulness of endophenotypes for delineating the regulatory landscape of the genome and genetic dissection of complex trait variation remains underexplored in plants. The work presented here illustrated the utility of a large-scale (299-genotype and seven-tissue) gene expression resource to dissect traits across multiple levels of biological organization. Using single-tissue- and multi-tissue-based transcriptome-wide association studies (TWAS), we revealed that about half of the functional variation acts through altered transcript abundance for maize kernel traits, including 30 grain carotenoid abundance traits, 20 grain tocochromanol abundance traits, and 22 field-measured agronomic traits. Comparing the efficacy of TWAS with genome-wide association studies (GWAS) and an ensemble approach that combines both GWAS and TWAS, we demonstrated that results of TWAS in combination with GWAS increase the power to detect known genes and aid in prioritizing likely causal genes. Using a variance partitioning approach in the largely independent maize Nested Association Mapping (NAM) population, we also showed that the most strongly associated genes identified by combining GWAS and TWAS explain more heritable variance for a majority of traits than the heritability captured by the random genes and the genes identified by GWAS or TWAS alone. This not only improves the ability to link genes to phenotypes, but also highlights the phenotypic consequences of regulatory variation in plants
The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities
Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/154448/1/sim8445_am.pdfhttps://deepblue.lib.umich.edu/bitstream/2027.42/154448/2/sim8445.pd
Network reconstruction for trans acting genetic loci using multi-omics data and prior information
BACKGROUND: Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view on biological systems and their integration is crucial for gaining insights in complex regulatory processes. These data can be used to explain disease related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor and use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors used or have only been applied to model systems. In this study, we reconstruct the regulatory networks underlying trans-QTL hotspots using human cohort data and data-driven prior information. METHODS: We devised a new strategy to integrate QTL with human population scale multi-omics data. State-of-the art network inference methods including BDgraph and glasso were applied to these data. Comprehensive prior information to guide network inference was manually curated from large-scale biological databases. The inference approach was extensively benchmarked using simulated data and cross-cohort replication analyses. Best performing methods were subsequently applied to real-world human cohort data. RESULTS: Our benchmarks showed that prior-based strategies outperform methods without prior information in simulated data and show better replication across datasets. Application of our approach to human cohort data highlighted two novel regulatory networks related to schizophrenia and lean body mass for which we generated novel functional hypotheses. CONCLUSIONS: We demonstrate that existing biological knowledge can improve the integrative analysis of networks underlying trans associations and generate novel hypotheses about regulatory mechanisms
- âŠ