21 research outputs found

    Estimation of CpG Coverage in Whole Methylome Next-Generation Sequencing Studies

    Background: Methylation studies are a promising complement to genetic studies of DNA sequence. However, detailed prior biological knowledge is typically lacking, so methylome-wide association studies (MWAS) will be critical for detecting disease-relevant sites. A cost-effective approach involves next-generation sequencing (NGS) of single-end libraries created from samples that are enriched for methylated DNA fragments. A limitation of single-end libraries is that the fragment size distribution is not observed. This hampers several aspects of the data analysis, such as the calculation of enrichment measures that are based on the number of fragments covering the CpGs. Results: We developed a non-parametric method that uses isolated CpGs to estimate sample-specific fragment size distributions from the empirical sequencing data. Through simulations we show that our method is highly accurate. While the traditional (extended) read count methods resulted in severely biased coverage estimates and introduced artificial inter-individual differences, using the estimated fragment size distributions removed these biases almost entirely. Furthermore, we found correlations of 0.999 between coverage estimates obtained using fragment size distributions estimated with our method and those “observed” in paired-end sequencing data. Conclusions: We propose a non-parametric method for estimating fragment size distributions that is highly precise and can improve the analysis of cost-effective MWAS that sequence single-end libraries created from samples enriched for methylated DNA fragments.
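
    To illustrate why an observed (or estimated) fragment size distribution matters, the sketch below computes the expected number of fragments covering a CpG from single-end reads, given a fragment-length distribution, and contrasts it with a fixed-length "extended read" count. It assumes each read is recorded as its 5' position plus strand, that fragments extend rightwards from "+" reads and leftwards from "-" reads, and that the distribution is already known; it does not implement the authors' estimation from isolated CpGs, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical read representation: (five_prime_position, strand).
# On "+" the fragment is assumed to extend rightwards from the read start;
# on "-" it is assumed to extend leftwards from the read's 5' end.
Read = Tuple[int, str]


def survival_from_pmf(pmf: Dict[int, float]) -> List[float]:
    """Return S with S[d] = P(fragment length > d) for d = 0..max length.

    `pmf` maps fragment length -> probability (or raw count; it is normalised).
    """
    total = sum(pmf.values())
    surv: List[float] = []
    remaining = 1.0
    for d in range(max(pmf) + 1):
        remaining -= pmf.get(d, 0.0) / total
        surv.append(max(remaining, 0.0))  # clamp tiny negative rounding error
    return surv


def expected_fragment_coverage(cpg: int, reads: List[Read], pmf: Dict[int, float]) -> float:
    """Expected number of fragments covering `cpg`, summed over single-end reads.

    A fragment of length L anchored at the read's 5' end covers the CpG iff the
    distance d from that end satisfies 0 <= d < L, which has probability P(L > d).
    """
    surv = survival_from_pmf(pmf)
    coverage = 0.0
    for pos, strand in reads:
        d = cpg - pos if strand == "+" else pos - cpg
        if 0 <= d < len(surv):
            coverage += surv[d]
    return coverage


def extended_read_count(cpg: int, reads: List[Read], extension: int) -> int:
    """Fixed-extension alternative: extend every read to one length and count hits."""
    count = 0
    for pos, strand in reads:
        d = cpg - pos if strand == "+" else pos - cpg
        if 0 <= d < extension:
            count += 1
    return count
```

    With reads `[(10, "+"), (12, "+"), (15, "-"), (20, "+")]` and a CpG at 13, a fixed extension of 2 counts one read and an extension of 4 counts three, while a distribution putting equal mass on lengths 2 and 4 gives an expected coverage of 2.0; the answer depends directly on which fragment sizes are assumed.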

    MethylPCA: a toolkit to control for confounders in methylome-wide association studies

    Background: In methylome-wide association studies (MWAS) there are many possible differences between cases and controls (e.g. related to lifestyle, diet, and medication use) that may affect the methylome and produce false positive findings. An effective approach to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging due to the extremely large number of methylation sites in the human genome. Results: We introduce MethylPCA, which is specifically designed to control for potential confounders in studies where the number of methylation sites is extremely large. MethylPCA offers a complete and flexible data analysis, including 1) an adaptive method that performs data reduction prior to principal component analysis (PCA) by empirically combining methylation data of neighboring sites, 2) an efficient algorithm that performs PCA on the ultra high-dimensional data matrix, and 3) association tests. To accomplish this, MethylPCA allows for parallel execution of tasks, uses C++ for CPU- and I/O-intensive calculations, and stores intermediate results to avoid computing the same statistics multiple times or keeping results in memory. Through simulations and an analysis of a real whole-methylome MBD-seq study of 1,500 subjects, we show that MethylPCA effectively controls for potential confounders. Conclusions: MethylPCA provides users with a convenient tool to perform MWAS. The software effectively handles the memory and speed challenges, performing tasks that would be impossible to accomplish with existing software when millions of sites are interrogated with the sample sizes required for MWAS.

    A methylome-wide study of aging using massively parallel sequencing of the methyl-CpG-enriched genomic fraction from blood in over 700 subjects

    The central importance of epigenetics to the aging process is increasingly being recognized. Here we perform a methylome-wide association study (MWAS) of aging in whole blood DNA from 718 individuals, aged 25–92 years (mean = 55). We sequenced the methyl-CpG-enriched genomic DNA fraction, averaging 67.3 million reads per subject, to obtain methylation measurements for the ∼27 million autosomal CpGs in the human genome. Following extensive quality control, we adaptively combined methylation measures for neighboring, highly correlated CpGs into 4 344 016 CpG blocks, with which we performed association testing. Eleven age-associated differentially methylated regions (DMRs) passed Bonferroni correction (P-value < 1.15 × 10⁻⁸). Top findings replicated in an independent sample set of 558 subjects using pyrosequencing of bisulfite-converted DNA (min P-value < 10⁻³⁰). To examine biological themes, we selected 70 DMRs with a false discovery rate < 0.1. Of these, 42 showed hypomethylation and 28 showed hypermethylation with age. Hypermethylated DMRs were more likely to overlap with CpG islands and shores. Hypomethylated DMRs were more likely to be in regions associated with polycomb/regulatory proteins (e.g. EZH2) or histone modifications H3K27ac, H3K4me1, H3K4me2, H3K4me3 and H3K9ac. Among genes implicated by the top DMRs were protocadherins, homeobox genes, MAPKs and ryanodine receptors. Several of our DMRs are at genes with potential relevance for age-related disease. This study successfully demonstrates the application of next-generation sequencing to MWAS by interrogating a large proportion of the methylome and returning potentially novel age DMRs, in addition to replicating several loci implicated in previous studies using microarrays.

    Refinement of schizophrenia GWAS loci using methylome-wide association data

    Recent genome-wide association studies (GWAS) have made substantial progress in identifying disease loci. The next logical step is to design functional experiments to identify disease mechanisms. This step, however, is often hampered by the large size of loci identified in GWAS, which is caused by linkage disequilibrium (LD) between SNPs. In this study, we demonstrate how integrating methylome-wide association study (MWAS) results with GWAS findings can narrow down the location for a subset of the putative causal sites. We use schizophrenia as an example. To handle “data analytic” variation, we first combined our MWAS results separately with each of two GWAS meta-analyses (N = 32,143 and 21,953) that had largely overlapping samples but different data analysis pipelines. Permutation tests showed significant overlapping association signals between GWAS and MWAS findings. This significant overlap justified prioritizing loci based on the concordance principle. To further ensure that the methylation signal was not driven by chance, we successfully replicated the top three methylation findings near genes SDCCAG8, CREB1 and ATXN7 in an independent sample using targeted pyrosequencing. In contrast to the SNPs in the selected regions, the methylation sites were largely uncorrelated, which explains why the methylation signals implicated much smaller regions (median size 78 bp). The refined loci showed considerable enrichment of genomic elements of possible functional importance and suggested specific hypotheses about schizophrenia etiology. Several hypotheses involved possible variation in transcription factor binding efficiencies.
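
    The abstract does not describe how the permutation test was constructed. As one hedged illustration, the sketch below counts how many of the top-k MWAS sites fall inside GWAS loci and builds a null by circularly rotating the GWAS-locus annotation along the genome, an assumed scheme that keeps neighboring sites together rather than shuffling them independently. The data layout, parameter values and function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def top_k_indices(pvalues: Sequence[float], k: int) -> List[int]:
    """Indices of the k smallest MWAS p-values (sites assumed in genomic order)."""
    return sorted(range(len(pvalues)), key=lambda i: pvalues[i])[:k]


def circular_shift_overlap_test(
    mwas_p: Sequence[float],
    in_gwas_locus: Sequence[bool],
    top_k: int,
    n_perm: int = 10_000,
    seed: int = 1,
) -> Tuple[int, float]:
    """Test whether top MWAS sites fall in GWAS loci more often than chance.

    The null rotates the GWAS-locus annotation by a random non-zero offset.
    Returns (observed overlap, one-sided empirical p-value).
    """
    n = len(in_gwas_locus)
    top = top_k_indices(mwas_p, top_k)
    observed = sum(in_gwas_locus[i] for i in top)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shift = rng.randrange(1, n)  # rotation offset in 1..n-1
        if sum(in_gwas_locus[(i + shift) % n] for i in top) >= observed:
            hits += 1
    # The add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

    Rotating rather than fully shuffling is a design choice aimed at the correlation between nearby sites; a full shuffle would treat correlated neighbors as independent and could overstate significance.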

    Combined Whole Methylome and Genomewide Association Study Implicates CNTN4 in Alcohol Use

    BACKGROUND: Methylome-wide association studies (MWAS) present a new way to advance the search for biological correlates of alcohol use. A challenge with methylation studies of alcohol involves the causal direction of significant methylation-alcohol associations. One way to address this issue is to combine MWAS data with genomewide association study (GWAS) data. METHODS: Here, we combined MWAS and GWAS results for alcohol use from 619 individuals. Our MWAS data were generated by next-generation sequencing of the methylated genomic DNA fraction, producing over 60 million reads per subject to interrogate methylation levels at ~27 million autosomal CpG sites in the human genome. Our GWAS included 5,571,786 single nucleotide polymorphisms (SNPs) imputed with 1000 Genomes. RESULTS: When combining the MWAS and GWAS data, our top finding was a region in an intron of CNTN4 (p = 2.55 × 10⁻⁸), located between chr3:2,555,403 and 2,555,524 and encompassing SNPs rs1382874 and rs1382875. This finding was then replicated in an independent sample of 730 individuals. We used bisulfite pyrosequencing to measure methylation and found a significant association with regular alcohol use in the same direction as in the MWAS (p = 0.021). SNPs rs1382874 and rs1382875 were genotyped and found to be associated in the same direction as in the GWAS (p = 0.008 and p = 0.009). After integrating the MWAS and GWAS findings from the replication sample, we replicated our combined analysis finding in CNTN4 (p = 0.0017). CONCLUSIONS: Through combining methylation and SNP data, we have identified CNTN4 as a risk factor for regular alcohol use.
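
    The abstract does not say how MWAS and GWAS evidence were combined into a single p-value. Fisher's method is one common way to combine p-values and is sketched below purely as an illustration, not as the study's procedure: the statistic X = −2 Σ ln pᵢ follows a χ² distribution with 2k degrees of freedom under the null, and for even degrees of freedom its tail probability has the closed form exp(−X/2) Σ_{i=0}^{k−1} (X/2)ⁱ / i!. The method assumes the input p-values are independent, which may not hold when both analyses use the same individuals; the function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Combine independent p-values in (0, 1] with Fisher's method.

    X = -2 * sum(ln p) ~ chi-square with 2k degrees of freedom under H0; the
    upper tail for even df is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total
```

    A single p-value passes through unchanged, and two p-values of 0.01 combine to roughly 0.0010, stronger evidence than either alone but weaker than their raw product of 10⁻⁴.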

    Methylome-Wide Association Study of Schizophrenia: Identifying Blood Biomarker Signatures of Environmental Insults

    Epigenetic studies present unique opportunities to advance schizophrenia research because they can potentially account for many of its clinical features and suggest novel strategies to improve disease management.

    Methylome-wide comparison of human genomic DNA extracted from whole blood and from EBV-transformed lymphocyte cell lines

    DNA from Epstein-Barr virus-transformed lymphocyte cell lines (LCLs) has proven useful for studies of genetic sequence polymorphisms. Whether LCL DNA is suitable for methylation studies is less clear. We conduct a genome-wide methylation investigation using an array set with 45 million probes to investigate the methylome of LCL DNA and of technical duplicates of whole blood (WB) DNA from the same 10 individuals. We focus specifically on methylation sites that show variation between individuals and are, therefore, potentially useful as biomarkers. The sample correlations for the methylation-variable probes ranged from 0.69 to 0.78 for the WB duplicates and from 0.27 to 0.72 for WB vs LCL. To compare the pattern of the methylation signals, we grouped adjacent probes based on their inter-correlations. These analyses showed ∼29 000 and ∼14 000 blocks in WB and LCL, respectively. Only 31% of the methylated regions detected in WB were detectable in LCLs. Furthermore, the mean difference between WB and LCL differed significantly from that between WB duplicates (P-value = 2.2 × 10⁻¹⁶). Our study shows that there are substantial differences in the DNA methylation patterns between LCL and WB. Thus, LCL DNA should not be used as a proxy for WB DNA in methylome-wide studies.