154 research outputs found
Mediation Analysis Demonstrates That Trans-eQTLs Are Often Explained by Cis-Mediation: A Genome-Wide Analysis among 1,800 South Asians
A large fraction of human genes are regulated by genetic variation near the transcribed sequence (cis-eQTL, expression quantitative trait locus), and many cis-eQTLs have implications for human disease. Less is known regarding the effects of genetic variation on expression of distant genes (trans-eQTLs) and their biological mechanisms. In this work, we use genome-wide data on SNPs and array-based expression measures from mononuclear cells obtained from a population-based cohort of 1,799 Bangladeshi individuals to characterize cis- and trans-eQTLs and determine if observed trans-eQTL associations are mediated by expression of transcripts in cis with the SNPs showing trans-association, using Sobel tests of mediation. We observed 434 independent trans-eQTL associations at a false-discovery rate of 0.05, and 189 of these trans-eQTLs were also cis-eQTLs (enrichment P…
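The Sobel test named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`ols_coef_se`, `sobel_test`), the simulated genotypes, and the effect sizes are all hypothetical, and covariate adjustment is omitted. The test combines the SNP-to-cis-transcript slope `a` with the cis-to-trans slope `b` (adjusted for the SNP) into z = ab / sqrt(a² se_b² + b² se_a²).

```python
import math
import numpy as np

def ols_coef_se(X, y, col):
    """Fit OLS with an intercept; return the coefficient and standard error of column `col` of X."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])      # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)          # covariance of the estimates
    return float(beta[col + 1]), math.sqrt(cov[col + 1, col + 1])

def sobel_test(snp, cis_expr, trans_expr):
    """Sobel z statistic and two-sided normal p-value for SNP -> cis -> trans mediation."""
    # a: effect of the SNP on the cis transcript (the candidate mediator)
    a, se_a = ols_coef_se(snp[:, None], cis_expr, 0)
    # b: effect of the cis transcript on the trans transcript, adjusting for the SNP
    b, se_b = ols_coef_se(np.column_stack([cis_expr, snp]), trans_expr, 0)
    z = a * b / math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical simulated data: the SNP acts on the trans gene only through the cis gene.
rng = np.random.default_rng(0)
snp = rng.binomial(2, 0.3, size=500).astype(float)
cis_expr = 0.8 * snp + rng.normal(size=500)
trans_expr = 0.5 * cis_expr + rng.normal(size=500)
z, p = sobel_test(snp, cis_expr, trans_expr)
print(f"Sobel z = {z:.2f}, p = {p:.2e}")
```

A small p-value here supports an indirect effect through the cis transcript; it does not by itself rule out confounding between the two transcripts.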
Tejaas: reverse regression increases power for detecting trans-eQTLs
Trans-acting expression quantitative trait loci (trans-eQTLs) account for ≥70% of expression heritability and could therefore help uncover the mechanisms underlying the origins of complex diseases. Identifying trans-eQTLs is challenging because of small effect sizes, tissue specificity, and a severe multiple-testing burden. Tejaas predicts trans-eQTLs by performing L2-regularized “reverse” multiple regression of each SNP on all genes, aggregating evidence from many small trans-effects while being unaffected by the strong expression correlations. Combined with a novel unsupervised k-nearest neighbor method to remove confounders, Tejaas predicts 18851 unique trans-eQTLs across 49 tissues from GTEx. They are enriched in open chromatin, enhancers, and other regulatory regions. Many overlap with disease-associated SNPs, pointing to tissue-specific transcriptional regulation mechanisms.
Affiliation: Banerjee, Saikat. Max Planck Institute For Biophysical Chemistry; Germany
Affiliation: Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Max Planck Institute For Biophysical Chemistry; Germany
Affiliation: Detrois, Kira E.. Max Planck Institute For Biophysical Chemistry; Germany. Universität Göttingen; Germany
Affiliation: Kaphle, Anubhav. Universität Göttingen; Germany. Max Planck Institute For Biophysical Chemistry; Germany
Affiliation: Mitra, Raktim. Indian Institute of Technology; India
Affiliation: Nagial, Rahul. Indian Institute of Technology; India
Affiliation: Söding, Johannes. Max Planck Institute For Biophysical Chemistry; Germany. University of Göttingen; Germany
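The idea of "reverse" regression described above (regressing one SNP on all genes at once with an L2 penalty) can be illustrated with a short sketch. This is not the Tejaas implementation: the score used here, the permutation null, the penalty `lam`, and all function names are assumptions chosen for illustration, and confounder removal is omitted.

```python
import numpy as np

def ridge_fit_score(expr, snp, lam):
    """Ridge-regress the centered SNP on all centered genes; return y^T X (X^T X + lam I)^-1 X^T y."""
    X = expr - expr.mean(axis=0)
    y = snp - snp.mean()
    n_genes = X.shape[1]
    beta = np.linalg.solve(X.T @ X + lam * np.eye(n_genes), X.T @ y)
    return float((X @ beta) @ y)   # how much of the genotype the expression matrix explains

def reverse_regression_pvalue(expr, snp, lam=100.0, n_perm=200, seed=0):
    """Permutation p-value for the ridge score (an illustrative null, assumed here)."""
    rng = np.random.default_rng(seed)
    observed = ridge_fit_score(expr, snp, lam)
    null = np.array([ridge_fit_score(expr, rng.permutation(snp), lam)
                     for _ in range(n_perm)])
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p)

def simulate_trans_effects(n=300, n_genes=50, n_targets=20, effect=0.3, seed=1):
    """Hypothetical data: one SNP with a small effect on each of `n_targets` genes."""
    rng = np.random.default_rng(seed)
    snp = rng.binomial(2, 0.3, size=n).astype(float)
    expr = rng.normal(size=(n, n_genes))
    expr[:, :n_targets] += effect * snp[:, None]
    return expr, snp

expr, snp = simulate_trans_effects()
score, p = reverse_regression_pvalue(expr, snp)
print(f"ridge score = {score:.1f}, permutation p = {p:.3f}")
```

Because all genes enter one model, many individually weak trans-effects contribute jointly to a single score per SNP, which is the aggregation the abstract refers to.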
Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast
Under embargo until: 2021-12-17
Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1012 segregants from a cross between two budding yeast strains, and the YEASTRACT database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods on the other hand contain false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. 
Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses, and for genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
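The contrast between the two approaches can be made concrete with a small simulation. The sketch below does not use Findr or its likelihood-ratio tests; it substitutes plain correlation and partial-correlation tests with a Fisher z approximation, and the function names, effect sizes, and hidden confounder `H` are assumptions. For a true regulation A → B with local eQTL E, the instrumental variable criterion asks whether E is associated with B, while the mediation criterion additionally asks that E be independent of B given A.

```python
import math
import numpy as np

def residualize(y, covariate):
    """Residuals of y after OLS regression on an intercept and one covariate."""
    A = np.column_stack([np.ones(len(y)), covariate])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

def corr_test(x, y, n_conditioned=0):
    """Pearson r and two-sided p-value from the Fisher z approximation."""
    r = float(np.corrcoef(x, y)[0, 1])
    z = math.atanh(r) * math.sqrt(len(x) - 3 - n_conditioned)
    return r, math.erfc(abs(z) / math.sqrt(2))

def iv_test(E, B):
    """Instrumental variable criterion: is the local eQTL E of gene A associated with B?"""
    return corr_test(E, B)

def mediation_test(E, A, B):
    """Mediation criterion: partial correlation of E and B given A (should be near zero)."""
    return corr_test(residualize(E, A), residualize(B, A), n_conditioned=1)

def simulate(n, confounding, seed=0):
    """Hypothetical E -> A -> B chain; H is a hidden factor acting on both A and B."""
    rng = np.random.default_rng(seed)
    E = rng.integers(0, 2, size=n).astype(float)   # haploid segregant genotype
    H = rng.normal(size=n)
    A = 1.5 * E + confounding * H + rng.normal(size=n)
    B = 1.0 * A + confounding * H + rng.normal(size=n)
    return E, A, B

for confounding in (0.0, 1.0):
    E, A, B = simulate(2000, confounding)
    print(f"confounding={confounding}: IV r, p = {iv_test(E, B)}; "
          f"mediation partial r, p = {mediation_test(E, A, B)}")
```

With the hidden factor switched on, conditioning on A induces a residual E to B correlation even though A truly regulates B, so a strict mediation criterion rejects a real edge. This is one way to read the loss of sensitivity at large sample sizes described in the abstract; the instrumental variable criterion is unaffected in this particular setup.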
MOSTWAS: Multi-Omic Strategies for Transcriptome-Wide Association Studies
Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to genes of interest and perform parameter shrinkage with a regularization process. These approaches ignore the effect of distal-SNPs or other molecular effects underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics to allow more powerful testing of gene-trait associations by prioritizing distal-SNPs to the gene of interest. In one extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for mediators are then incorporated into the final predictive model of gene expression, along with local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with genes not in a local window around it) for their mediation effect through mediating biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains of percent variance explained (1–2% additive increase) of gene expression and TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue and important risk genes for various traits and disorders
The GTEx Consortium atlas of genetic regulatory effects across human tissues
The Genotype-Tissue Expression (GTEx) project dissects how genetic variation affects gene expression and splicing.
Some human genetic variants affect the amount of RNA produced and the splicing of gene transcripts, crucial steps in development and maintaining a healthy individual. However, some of these changes only occur in a small number of tissues within the body. The Genotype-Tissue Expression (GTEx) project has been expanded over time, and, looking at the final data in version 8, Aguet et al. present a deep characterization of genetic associations and gene expression and splicing in 838 individuals over 49 tissues (see the Perspective by Wilson). This large study was able to characterize the details underlying many aspects of gene expression and provides a resource with which to better understand the fundamental molecular mechanisms of how genetic variants affect gene regulation and complex traits in humans.
Science, this issue p. 1318; see also p. 1298
The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. 
Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
We thank the donors and their families for their generous gifts of organ donation for transplantation and tissue donations for the GTEx research project; the Genomics Platform at the Broad Institute for data generation; J. Struewing for support and leadership of the GTEx project; M. Khan and C. Stolte for the illustrations in Fig. 1; and R. Do, D. Jordan, and M. Verbanck for providing GWAS pleiotropy scores. Funding: This work was supported by the Common Fund of the Office of the Director, U.S. National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, NIA, NIAID, and NINDS through NIH contracts HHSN261200800001E (Leidos Prime contract with NCI: A.M.S., D.E.T., N.V.R., J.A.M., L.S., M.E.B., L.Q., T.K., D.B., K.R., and A.U.), 10XS170 (NDRI: W.F.L., J.A.T., G.K., A.M., S.S., R.H., G.Wa., M.J., M.Wa., L.E.B., C.J., J.W., B.R., M.Hu., K.M., L.A.S., H.M.G., M.Mo., and L.K.B.), 10XS171 (Roswell Park Cancer Institute: B.A.F., M.T.M., E.K., B.M.G., K.D.R., and J.B.), 10X172 (Science Care Inc.), 12ST1039 (IDOX), 10ST1035 (Van Andel Institute: S.D.J., D.C.R., and D.R.V.), HHSN268201000029C (Broad Institute: F.A., G.G., K.G.A., A.V.S., X.Li., E.T., S.G., A.G., S.A., K.H.H., D.T.N., K.H., S.R.M., and J.L.N.), 5U41HG009494 (F.A., G.G., and K.G.A.), and through NIH grants R01 DA006227-17 (University of Miami Brain Bank: D.C.M. and D.A.D.), Supplement to University of Miami grant DA006227 (D.C.M. and D.A.D.), R01 MH090941 (University of Geneva), R01 MH090951 and R01 MH090937 (University of Chicago), R01 MH090936 (University of North Carolina–Chapel Hill), R01MH101814 (M.M.-A., V.W., S.B.M., R.G., E.T.D., D.G.-M., and A.V.), U01HG007593 (S.B.M.), R01MH101822 (C.D.B.), U01HG007598 (M.O. 
and B.E.S.), U01MH104393 (A.P.F.), extension H002371 to 5U41HG002371 (W.J.K.), as well as other funding sources: R01MH106842 (T.L., P.M., E.F., and P.J.H.), R01HL142028 (T.L., Si.Ka., and P.J.H.), R01GM122924 (T.L. and S.E.C.), R01MH107666 (H.K.I.), P30DK020595 (H.K.I.), UM1HG008901 (T.L.), R01GM124486 (T.L.), R01HG010067 (Y.Pa.), R01HG002585 (G.Wa. and M.St.), Gordon and Betty Moore Foundation GBMF 4559 (G.Wa. and M.St.), 1K99HG009916-01 (S.E.C.), R01HG006855 (Se.Ka. and R.E.H.), BIO2015-70777-P, Ministerio de Economia y Competitividad and FEDER funds (M.M.-A., V.W., R.G., and D.G.-M.), la Caixa Foundation ID 100010434 under agreement LCF/BQ/SO15/52260001 (D.G.-M.), NIH CTSA grant UL1TR002550-01 (P.M.), Marie-Skłodowska Curie fellowship H2020 Grant 706636 (S.K.-H.), R35HG010718 (E.R.G.), FPU15/03635, Ministerio de Educación, Cultura y Deporte (M.M.-A.), R01MH109905, 1R01HG010480 (A.Ba.), Searle Scholar Program (A.Ba.), R01HG008150 (S.B.M.), 5T32HG000044-22, NHGRI Institutional Training Grant in Genome Science (N.R.G.), EU IMI program (UE7-DIRECT-115317-1) (E.T.D. and A.V.), FNS funded project RNA1 (31003A_149984) (E.T.D. and A.V.), DK110919 (F.H.), F32HG009987 (F.H.), Massachusetts Lions Eye Research Fund Grant (A.R.H.), Wellcome grant WT108749/Z/15/Z (P.F.), and European Molecular Biology Laboratory (P.F. and D.Z.).
Peer Reviewed
"Article signed by 1 BSC author who is a member of THE GTEX CONSORTIUM: Marta Mele Messeguer"
Postprint (author's final draft)
STATISTICAL METHODS FOR INFERRING GENETIC REGULATION ACROSS HETEROGENEOUS SAMPLES AND MULTIMODAL DATA
As clinical datasets have increased in size and a wider range of molecular profiles can be credibly measured, understanding sources of heterogeneity has become critical in studying complex phenotypes. Here, we investigate and develop statistical approaches to address and analyze technical variation, genetic diversity, and tissue heterogeneity in large biological datasets. Commercially available methods for normalization of NanoString nCounter RNA expression data are suboptimal in fully addressing unwanted technical variation. First, we develop a more comprehensive quality control, normalization, and validation framework for nCounter data, benchmark it against existing normalization methods for nCounter, and show its advantages on four datasets of differing sample sizes. We then develop race-specific and genetic ancestry-adjusted tumor transcriptomic prediction models from germline genetics in the Carolina Breast Cancer Study (CBCS) and study the performance of these models across ancestral groups and molecular subtypes. These models are employed in a transcriptome-wide association study (TWAS) to identify four novel genetic loci associated with breast-cancer specific survival. Next, we extend TWAS to a novel suite of tools, MOSTWAS, to prioritize distal genetic variation in transcriptomic predictive models with two multi-omic approaches that draw from mediation analysis. We empirically show the utility of these extensions in simulation analyses, TCGA breast cancer data, and ROS/MAP brain tissue data. We develop a novel distal-SNPs added-last test, to be used with MOSTWAS models, to prioritize distal loci that give added information, beyond the association in the local locus around a gene. Lastly, we develop DeCompress, a deconvolution method from gene expression from targeted RNA panels such as NanoString, which have a much smaller feature space than traditional RNA expression assays. 
We propose an ensemble approach that leverages compressed sensing to expand the feature space and validate it on data from the CBCS. We conduct extensive benchmarking of existing deconvolution methods using simulated in-silico experiments, pseudo-targeted panels from published mixing experiments, and data from the CBCS to show the advantage of DeCompress over reference-free methods. We lastly show the utility of in-silico cell-type proportion estimation in outcome prediction and eQTL mapping.
Doctor of Philosophy
Developing a computational workflow for eQTL analysis on the X chromosome
Despite advances in sequencing technology and computational biology that have led to the identification of underlying causes of complex traits, utilization of X chromosome data lags behind that of the autosomes. This can be attributed to the inherent complexities of analyzing X chromosome data and the extra data processing steps needed before the analysis. The aim of this thesis was to develop a computational workflow for including the X chromosome in eQTL analysis and to address these shortcomings, thereby supplementing existing eQTL analysis methods. We demonstrated that, after adjusting for X chromosome dosage differences between females and males, existing workflows can be used to uncover potential causal variants for complex traits and diseases. Using RNA-seq data from human lymphoblastoid cell lines obtained from the GEUVADIS project, we performed statistical fine mapping and colocalization analysis with external databases. Results show significant associations of the PLP2 gene with respiratory and cardiovascular functions
- …