121 research outputs found

    Filtering, FDR and power

    Background: In high-dimensional data analysis such as differential gene expression analysis, researchers often use filtering methods like fold-change or variance filters in an attempt to reduce the multiple testing penalty and improve power. However, filtering may introduce a bias in the multiple testing correction. The precise amount of bias depends on many quantities, such as the fraction of probes filtered out, the filter statistic and the test statistic used. Results: We show that a biased multiple testing correction results if non-differentially expressed probes are not filtered out with equal probability from the entire range of p-values. We illustrate our results using both a simulation study and an experimental dataset, where the FDR is shown to be biased mostly by filters that are associated with the hypothesis being tested, such as the fold change. Filters that induce little bias in the FDR yield less additional power for detecting differentially expressed genes. Finally, we propose a statistical test that can be used in practice to determine whether any chosen filter introduces bias in the FDR estimate used, given a general experimental setup. Conclusions: Filtering out probes must be done with care as it may bias the multiple testing correction. Researchers can use our test for FDR bias to guide their choice of filter and amount of filtering in practice.
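
A toy illustration of why filtering interacts with the multiple testing correction may help here. The abstract does not name a specific correction, so this sketch assumes the Benjamini-Hochberg step-up rule; the function name `benjamini_hochberg`, the p-values, and the filter are all hypothetical. The filter deliberately keeps only the probes with the smallest p-values, which is the kind of hypothesis-associated filter the abstract warns about.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(pvalues)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is under its threshold.
        if pvalues[idx] <= rank * alpha / m:
            n_reject = rank
    return sorted(order[:n_reject])


# Hypothetical p-values for ten probes (not data from the paper).
pvals = [0.001, 0.015, 0.025, 0.035, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

rejected_all = benjamini_hochberg(pvals)

# Hypothetical filter that happens to keep only the five smallest p-values.
kept = pvals[:5]
rejected_filtered = benjamini_hochberg(kept)

print(len(rejected_all), len(rejected_filtered))
```

Removing probes shrinks the number of tests, which loosens every threshold, so more probes are called significant after filtering. Whether the nominal FDR still holds depends on which probes were removed, and that is the question the proposed test addresses.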

    A characterization of cis- and trans-heritability of RNA-Seq-based gene expression

    Insights into individual differences in gene expression and its heritability (h²) can help in understanding pathways from DNA to phenotype. We estimated the heritability of gene expression of 52,844 genes measured in whole blood in the largest twin RNA-Seq sample to date (1497 individuals including 459 monozygotic twin pairs and 150 dizygotic twin pairs) from classical twin modeling and identity-by-state-based approaches. We estimated for each gene h² total, composed of cis-heritability (h² cis, the variance explained by single nucleotide polymorphisms in the cis-window of the gene), and trans-heritability (h² res, the residual variance explained by all other genome-wide variants). Mean h² total was 0.26, which was significantly higher than heritability estimates earlier found in a microarray-based study using largely overlapping (>60%) RNA samples (mean h² = 0.14, p = 6.15 × 10^−258). Mean h² cis was 0.06 and strongly correlated with beta of the top cis expression quantitative trait loci (eQTL, ρ = 0.76, p < 10^−308) and with estimates from earlier RNA-Seq-based studies. Mean h² res was 0.20 and correlated with the beta of the corresponding trans-eQTL (ρ = 0.04, p < 1.89 × 10^−3) and was significantly higher for genes involved in cytokine-cytokine interactions (p = 4.22 × 10^−15), many other immune system pathways, and genes identified in genome-wide association studies for various traits including behavioral disorders and cancer. This study provides a thorough characterization of cis- and trans-h² estimates of gene expression, which is of value for interpretation of GWAS and gene expression studies.

    Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution

    We show that epigenome- and transcriptome-wide association studies (EWAS and TWAS) are prone to significant inflation and bias of test statistics, an unrecognized phenomenon introducing spurious findings if left unaddressed. Neither GWAS-based methodology nor state-of-the-art confounder adjustment methods completely remove bias and inflation. We propose a Bayesian method to control bias and inflation in EWAS and TWAS based on estimation of the empirical null distribution. Using simulations and real data, we demonstrate that our method maximizes power while properly controlling the false positive rate. We illustrate the utility of our method in large-scale EWAS and TWAS meta-analyses of age and smoking.
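
The idea of an empirical null can be sketched roughly as follows. This is not the Bayesian method proposed in the abstract above; it swaps in a much simpler robust estimator (median and median absolute deviation), assuming most tests are null so that the bulk of the test statistics reflects the null distribution. The function name `empirical_null` and the simulated z-scores are hypothetical.

```python
import random
import statistics


def empirical_null(z_scores):
    """Robustly estimate the centre (bias) and spread (inflation) of the null.

    Assumes most tests are null, so the bulk of the z-scores follows the
    empirical null rather than reflecting true signal.
    """
    bias = statistics.median(z_scores)
    mad = statistics.median([abs(z - bias) for z in z_scores])
    # 1.4826 rescales the MAD to the standard deviation of a normal distribution.
    inflation = 1.4826 * mad
    return bias, inflation


# Hypothetical test statistics from a shifted, over-dispersed null: N(0.2, 1.3^2).
rng = random.Random(1)
z = [rng.gauss(0.2, 1.3) for _ in range(20000)]

bias, inflation = empirical_null(z)

# Recentre and rescale so the null statistics are approximately N(0, 1) again.
corrected = [(v - bias) / inflation for v in z]

print(round(bias, 2), round(inflation, 2))
```

Under the theoretical null the estimates would be close to 0 and 1; values away from those indicate bias and inflation that a correction would need to remove before p-values are computed.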

    Genome-wide identification of genes regulating DNA methylation using genetic anchors for causal inference

    BACKGROUND: DNA methylation is a key epigenetic modification in human development and disease, yet there is limited understanding of its highly coordinated regulation. Here, we identify 818 genes that affect DNA methylation patterns in blood using large-scale population genomics data. RESULTS: By employing genetic instruments as causal anchors, we establish directed associations between gene expression and distant DNA methylation levels, while ensuring specificity of the associations by correcting for linkage disequilibrium and pleiotropy among neighboring genes. The identified genes are enriched for transcription factors, many of which consistently increased or decreased DNA methylation levels at multiple CpG sites. In addition, we show that a substantial number of transcription factors affected DNA methylation at their experimentally determined binding sites. We also observe genes encoding proteins with heterogeneous functions that have widespread effects on DNA methylation, e.g., NFKBIE, CDCA7(L), and NLRC5, and for several examples, we suggest plausible mechanisms underlying their effect on DNA methylation. CONCLUSION: We report hundreds of genes that affect DNA methylation and provide key insights into the principles underlying epigenetic regulation.

    Occupational exposure to gases/fumes and mineral dust affect DNA methylation levels of genes regulating expression

    Many workers are exposed daily to occupational agents like gases/fumes, mineral dust or biological dust, which could induce adverse health effects. Epigenetic mechanisms, such as DNA methylation, have been suggested to play a role. We therefore aimed to identify differentially methylated regions (DMRs) upon occupational exposures in never-smokers and investigated whether these DMRs were associated with gene expression levels. To determine the effects of occupational exposures independent of smoking, 903 never-smokers of the LifeLines cohort study were included. We performed three genome-wide methylation analyses (Illumina 450K), one per occupational exposure (gases/fumes, mineral dust and biological dust), using robust linear regression adjusted for appropriate confounders. DMRs were identified using comb-p in Python. Results were validated in the Rotterdam Study (233 never-smokers) and methylation-expression associations were assessed using Biobank-based Integrative Omics Study data (n = 2802). Of the total 21 significant DMRs, 14 DMRs were associated with gases/fumes and 7 with mineral dust. Three of these DMRs were associated with both exposures (RPLP1 and LINC02169 (2x)) and 11 DMRs were located within transcript start sites of gene expression regulating genes. We replicated two DMRs with gases/fumes (VTRNA2-1 and GNAS) and one with mineral dust (CCDC144NL). In addition, nine gases/fumes DMRs and six mineral dust DMRs were significantly associated with gene expression levels. Our data suggest that occupational exposures may induce differential methylation of gene expression regulating genes and thereby may induce adverse health effects. Given the millions of workers who are exposed daily to occupational agents, further studies on this epigenetic mechanism and health outcomes are warranted.

    Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms

    Background: Epigenetic change is a hallmark of ageing but its link to ageing mechanisms in humans remains poorly understood. While DNA methylation at many CpG sites closely tracks chronological age, DNA methylation changes relevant to biological age are expected to gradually dissociate from chronological age, mirroring the increased heterogeneity in health status at older ages. Results: Here, we report on the large-scale identification of 6366 age-related variably methylated positions (aVMPs) identified in 3295 whole blood DNA methylation profiles, 2044 of which have a matching RNA-seq gene expression profile. aVMPs are enriched at polycomb repressed regions and, accordingly, methylation at those positions is associated with the expression of genes encoding components of polycomb repressive complex 2 (PRC2) in trans. Further analysis revealed trans-associations for 1816 aVMPs with an additional 854 genes. These trans-associated aVMPs are characterized by either an age-related

    Blood lipids influence DNA methylation in circulating cells

    Background: Cells can be primed by external stimuli to obtain a long-term epigenetic memory. We hypothesize that long-term exposure to elevated blood lipids can prime circulating immune cells through changes in DNA methylation, a process that may contribute to the development of atherosclerosis. To interrogate the causal relationship between triglyceride, low-density lipoprotein (LDL) cholesterol, and high-density lipoprotein (HDL) cholesterol levels and genome-wide DNA methylation while excluding confounding and pleiotropy, we perform a stepwise Mendelian randomization analysis in whole blood of 3296 individuals. Results: This analysis shows that differential methylation is the consequence of inter-individual variation in blood lipid levels and not vice versa. Specifically, we observe an effect of triglycerides on DNA methylation at three CpGs, of LDL cholesterol at one CpG, and of HDL cholesterol at two CpGs using multivariable Mendelian randomization. Using RNA-seq data available for a large subset of individuals (N = 2044), DNA methylation of these six CpGs is associated with the expression of CPT1A and SREBF1 (for triglycerides), DHCR24 (for LDL cholesterol) and

    Refining Attention-Deficit/Hyperactivity Disorder and Autism Spectrum Disorder Genetic Loci by Integrating Summary Data From Genome-wide Association, Gene Expression, and DNA Methylation Studies

    Background: Recent genome-wide association studies (GWASs) identified the first genetic loci associated with attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD). The next step is to use these results to increase our understanding of the biological mechanisms involved. Most of the identified variants likely influence gene regulation. The aim of the current study is to shed light on the mechanisms underlying the genetic signals and prioritize genes by integrating GWAS results with gene expression and DNA methylation (DNAm) levels. Methods: We applied summary-data–based Mendelian randomization to integrate ADHD and ASD GWAS data with fetal brain expression and methylation quantitative trait loci, given the early onset of these disorders. We also analyzed expression and methylation quantitative trait loci datasets of adult brain and blood, as these provide increased statistical power. We subsequently used summary-data–based Mendelian randomization to investigate if the same variant influences both DNAm and gene expression levels. Results: We identified multiple gene expression and DNAm levels in fetal brain at chromosomes 1 and 17 that were associated with ADHD and ASD, respectively, through pleiotropy at shared genetic variants. The analyses in brain and blood showed additional associated gene expression and DNAm levels at the same and additional loci, likely because of increased statistical power. Several of the associated genes have not been identified in ADHD and ASD GWASs before. Conclusions: Our findings identified the genetic variants associated with ADHD and ASD that likely act through gene regulation. This facilitates prioritization of candidate genes for functional follow-up studies.