    Performance comparison with WSHPackage using multiple threads for 20M RRBS-simulated reads.

    Schematic illustration of the proportion of discordant reads (PDR).

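PDR is commonly computed per CpG as the fraction of reads covering that CpG whose CpG calls are discordant, that is, neither all methylated nor all unmethylated. The Python sketch below illustrates that definition only; it is not Metheor's Rust implementation. The read representation (a list of (position, methylated) pairs), the function name compute_pdr, and the minimum of four CpGs per read are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical read representation: (CpG position, methylated?) pairs.
Read = List[Tuple[int, bool]]


def compute_pdr(reads: List[Read], min_cpgs: int = 4) -> Dict[int, float]:
    """Per-CpG proportion of discordant reads (illustrative sketch).

    A read is concordant if all of its CpGs share one state (all methylated
    or all unmethylated) and discordant otherwise. Reads with fewer than
    `min_cpgs` CpGs are skipped; the default of 4 is an assumed threshold.
    """
    discordant: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for read in reads:
        if len(read) < min_cpgs:
            continue
        is_discordant = len({state for _, state in read}) > 1
        # Every CpG on the read inherits the read's concordance status.
        for pos, _ in read:
            total[pos] += 1
            if is_discordant:
                discordant[pos] += 1
    return {pos: discordant[pos] / total[pos] for pos in sorted(total)}
```

For example, a read with states methylated, unmethylated, methylated, methylated at four CpGs counts as discordant at all four positions, while a fully methylated read counts as concordant at each of its CpGs.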

    Association between stemness of cancer cells and other DNA methylation heterogeneity measures.

    Schematic illustration of methylation haplotype load (MHL).

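MHL is usually described as a weighted average over haplotype lengths l = 1..L: MHL = sum_l w_l * P(MH_l) / sum_l w_l, where P(MH_l) is the fraction of length-l substrings of the observed methylation haplotypes that are fully methylated and the weight is w_l = l. The sketch below follows that formulation under stated assumptions: haplotypes for one region are given as '0'/'1' strings, and the function name is hypothetical. It is neither the original Perl script nor Metheor's code.

```python
from typing import List


def methylation_haplotype_load(haplotypes: List[str]) -> float:
    """MHL of one region from methylation haplotypes (illustrative sketch).

    Each haplotype is one read's CpG calls as a string of '1' (methylated)
    and '0' (unmethylated), e.g. "1101". Requires at least one haplotype.
    """
    max_len = max(len(h) for h in haplotypes)
    weighted_sum = 0.0
    weight_total = 0.0
    for length in range(1, max_len + 1):
        n_total = 0
        n_full = 0
        fully_methylated = "1" * length
        # Enumerate every length-`length` substring of every haplotype.
        for h in haplotypes:
            for start in range(len(h) - length + 1):
                n_total += 1
                if h[start:start + length] == fully_methylated:
                    n_full += 1
        # n_total > 0 here because the longest haplotype has >= 1 substring.
        weighted_sum += length * n_full / n_total
        weight_total += length
    return weighted_sum / weight_total
```

A region whose reads are all fully methylated gets MHL = 1, and a fully unmethylated region gets 0; partially methylated stretches are rewarded less as they get shorter because long fully methylated substrings carry larger weights.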

    Association between methylation entropy and cancer stemness.

    (A) Genes were ranked by Pearson’s correlation between their expression and average methylation entropy levels across promoters. Red dots represent 3,680 genes with statistically significant correlations (Benjamini-Hochberg adjusted p-value WNT7A and CTNND2). (E) Association between promoter methylation entropy levels and the activity of the Wnt signaling pathway. *Two-tailed independent t-test, p < 0.05. In D-E, Pearson’s correlation coefficients and associated p-values are shown. In D, p-values were adjusted using the Benjamini-Hochberg procedure.
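Methylation entropy is commonly defined over a window of b consecutive CpGs (often b = 4) as the Shannon entropy of the observed methylation patterns divided by b: ME = (1/b) * sum_i -(n_i/N) log2(n_i/N), where n_i is the number of reads showing pattern i and N is the total number of reads. A minimal Python sketch of that formula, assuming each read's pattern over the window is given as a '0'/'1' string (the function name is hypothetical):

```python
import math
from collections import Counter
from typing import List


def methylation_entropy(patterns: List[str]) -> float:
    """Methylation entropy of one CpG window (illustrative sketch).

    `patterns` holds one pattern per read over the same b CpGs, e.g. "1010"
    for b = 4. The base-2 Shannon entropy of the pattern distribution is
    divided by b so that the result lies in [0, 1].
    """
    b = len(patterns[0])
    if any(len(p) != b for p in patterns):
        raise ValueError("all patterns must cover the same CpGs")
    n = len(patterns)
    counts = Counter(patterns)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / b
```

A window where every read shows the same pattern yields 0, and a 4-CpG window where all 16 possible patterns occur equally often yields 1.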

    Promoter PDRs of tumor suppressors and oncogenes.

    Phased DNA methylation states within bisulfite sequencing reads are a valuable source of information for estimating epigenetic diversity across cells as well as epigenomic instability within individual cells. Various measures capturing the heterogeneity of DNA methylation states have been proposed over the past decade. However, routine DNA methylation analyses often ignore this heterogeneity by computing average methylation levels at CpG sites, even though bisulfite sequencing data contain it in the form of phased methylation states, or methylation patterns. To facilitate the application of DNA methylation heterogeneity measures in downstream epigenomic analyses, we present Metheor, an extremely fast and lightweight bioinformatics toolkit written in Rust. Because analyzing DNA methylation heterogeneity requires examining pairs or groups of CpGs throughout the genome, existing software suffers from a high computational burden, which makes large-scale studies of DNA methylation heterogeneity nearly intractable for researchers with limited resources. We benchmark the performance of Metheor against existing implementations of DNA methylation heterogeneity measures in three different scenarios of simulated bisulfite sequencing datasets. Metheor reduced execution time by up to 300-fold and memory footprint by up to 60-fold while producing results identical to those of the original implementation, thereby facilitating large-scale studies of DNA methylation heterogeneity profiles. To demonstrate the benefit of Metheor's low computational burden, we show that the methylation heterogeneity profiles of 928 cancer cell lines can be computed with standard computing resources. Using these profiles, we reveal associations between DNA methylation heterogeneity and various omics features.
Source code for Metheor is freely available at https://github.com/dohlee/metheor under the GPL-3.0 license.

    Schematic illustration of local pairwise methylation discordance (LPMD).

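The overview figure describes LPMD as the proportion of reads with different methylation states at a pair of CpGs. The Python sketch below is one reading of that description, not Metheor's implementation: the read representation, the function name, the distance window defaults (min_dist=2, max_dist=16), and pooling all pair observations into a single genome-wide value are all assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical read representation: (CpG position, methylated?) pairs.
Read = List[Tuple[int, bool]]


def compute_lpmd(
    reads: List[Read], min_dist: int = 2, max_dist: int = 16
) -> Tuple[Dict[Tuple[int, int], float], float]:
    """Local pairwise methylation discordance (illustrative sketch).

    For every pair of CpGs observed on the same read whose genomic distance
    lies in [min_dist, max_dist] (assumed defaults), count reads in which
    the two CpGs carry different states. Returns per-pair LPMD and a pooled
    value: all discordant observations divided by all pair observations.
    """
    discordant: Dict[Tuple[int, int], int] = defaultdict(int)
    total: Dict[Tuple[int, int], int] = defaultdict(int)
    for read in reads:
        for (p1, s1), (p2, s2) in combinations(sorted(read), 2):
            if min_dist <= p2 - p1 <= max_dist:
                total[(p1, p2)] += 1
                if s1 != s2:
                    discordant[(p1, p2)] += 1
    per_pair = {pair: discordant[pair] / n for pair, n in total.items()}
    n_obs = sum(total.values())
    pooled = sum(discordant.values()) / n_obs if n_obs else float("nan")
    return per_pair, pooled
```

Unlike PDR, this formulation only needs two CpGs on a read, which is one plausible reason a pairwise measure would be less sensitive to read length.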

    Overview of Metheor.

    (A) The input for Metheor is a bisulfite read alignment tagged with Bismark methylation call strings. With each of the seven subcommands shown, Metheor computes the corresponding DNA methylation heterogeneity measure. If reads were aligned with a tool other than Bismark, the metheor tag subcommand can add the methylation call string tag so that the alignment file is compatible with Metheor. (B) Schematic diagram of the DNA methylation heterogeneity measures and benchmark settings in this study. [5] denotes the Perl script that the authors released with the article proposing MHL. (C, D) Schematic diagrams illustrating (C) the read-centric algorithm and (D) the CpG-centric algorithm for computing DNA methylation heterogeneity. Advantages (plus symbol) and disadvantages (minus symbol) are shown below the diagrams. (E) Distribution of the average number of CpGs per sequencing read in the RRBS data from 928 CCLE cell lines. (F) Genomewide average levels of proportion of discordant reads (PDR) and local pairwise methylation discordance (LPMD) across varying read lengths. (G) Schematic illustration of the definition of local pairwise methylation discordance (LPMD), with examples. The proportion of reads with different DNA methylation states at a pair of CpGs (red arrows) is computed.
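In Bismark's methylation call string (the XM tag), 'Z' and 'z' mark methylated and unmethylated cytosines in CpG context, other letters denote CHG, CHH, or unknown contexts, and '.' marks non-cytosine positions. A read-centric pass could turn each read's call string into per-CpG states with a helper like the one below. This is a hypothetical sketch: it assumes a gapless alignment (no indels or clipping) and ignores strand-specific coordinate adjustment. With pysam, the string would come from AlignedSegment.get_tag("XM") and the start from reference_start.

```python
from typing import List, Tuple


def cpg_calls_from_xm(xm: str, ref_start: int) -> List[Tuple[int, bool]]:
    """Extract CpG methylation calls from a Bismark XM string (sketch).

    'Z' = methylated CpG, 'z' = unmethylated CpG; all other characters
    (CHG/CHH/unknown contexts and '.') are ignored. Assumes a gapless
    alignment, so character i maps to reference position ref_start + i.
    """
    return [
        (ref_start + i, ch == "Z")
        for i, ch in enumerate(xm)
        if ch in ("Z", "z")
    ]
```

For example, the call string "..Z...z..h.X" starting at position 100 yields a methylated CpG at 102 and an unmethylated CpG at 106; the CHH and CHG calls are dropped.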

    Characteristics of LPMD across 928 cancer cell lines.

    (A) Genomewide average methylation and LPMD levels grouped by tissue type. Black vertical lines denote group-wise average methylation and LPMD levels. Black horizontal bars on the right denote the standard deviation of the corresponding values. (B) Genomewide average methylation and LPMD levels grouped by disease type. Disease types from haematopoietic and lymphoid tissues are highlighted in red. (C, D) Correlation between mRNA expression and (C) genomewide average LPMD or (D) genomewide average methylation level. Genes are ranked according to the p-values of the corresponding correlation coefficients. P-values were adjusted using the Benjamini-Hochberg procedure. (E, F) Correlation between DNMT3A expression and (E) genomewide average LPMD or (F) genomewide average methylation level. (G, H) Trends of fixed-distance average LPMD values. Shading denotes the 95% confidence interval. In (H), cell lines were divided into two groups based on median DNMT3A expression. (I) Difference in fixed-distance average LPMD values between the DNMT3A-high and DNMT3A-low groups.
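Several panels report p-values adjusted with the Benjamini-Hochberg procedure: sort the m p-values in ascending order, multiply the p-value at rank k by m/k, and take a running minimum from the largest rank downward so that the adjusted values stay monotone and never exceed 1. A minimal, dependency-free sketch follows (the function name is hypothetical); in practice, statsmodels' multipletests with method="fdr_bh" or scipy.stats.false_discovery_control perform the same adjustment.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest rank down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the input [0.01, 0.04, 0.03, 0.2], the adjusted values are 0.04, about 0.0533, about 0.0533, and 0.2: the raw value 0.03 (rank 2, scaled to 0.06) is pulled down to match the smaller scaled value at rank 3.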

    Robustness of LPMD against the choice of genomic distance window.
