DNA methylation heterogeneity profiles of 928 CCLE cell lines

Abstract

Motivation
Bisulfite sequencing data carry invaluable information about the epigenetic states of a cell population beyond DNA methylation levels. Phased DNA methylation states (a DNA methylation pattern, i.e., the array of methylation states of the CpGs simultaneously covered by a single read) can serve as a local barcode representing the epigenetic state of a single cell. Therefore, we can approximate epigenetic diversity by measuring the diversity of DNA methylation patterns (inter-molecular or inter-cellular heterogeneity). In addition, DNA methylation patterns reveal the local disorder of DNA methylation states, which has already been shown to have prognostic potential (Landau et al., 2014). To facilitate studies of DNA methylation heterogeneity, we developed an efficient software tool named Metheor. Here we provide comprehensive DNA methylation heterogeneity profiles, computed with Metheor, of 928 cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE).

Data processing
Raw reduced representation bisulfite sequencing (RRBS) reads for the 928 CCLE cell lines were downloaded from SRA (study accession SRP186687) and preprocessed using Trim Galore! v0.6.7 with the --rrbs option. Reads were then aligned to the hg38 reference genome using Bismark v0.23.1. The resulting alignments were used to compute DNA methylation heterogeneity levels (see below) with Metheor v0.1.0.

Seven measures for DNA methylation heterogeneity
Profiles of seven DNA methylation heterogeneity measures are provided in this dataset:
Proportion of discordant reads (PDR)
Local pairwise methylation disorder (LPMD)
Methylation haplotype load (MHL)
Epipolymorphism (PM)
Methylation entropy (ME)
Fraction of discordant read pairs (FDRP)
Quantitative fraction of discordant pairs (qFDRP)
For a more detailed description of these measures, please refer to this GitHub repository.
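Four of the pattern-based measures above can be illustrated with a short Python sketch. It follows the commonly used definitions: PDR as the fraction of reads (with at least four CpGs) whose CpGs are neither all methylated nor all unmethylated (Landau et al., 2014); epipolymorphism as 1 - sum of p_i^2 over pattern frequencies; methylation entropy as the Shannon entropy of patterns divided by the number of CpGs; and MHL as a length-weighted fraction of fully methylated sub-haplotypes. This is not Metheor's implementation. The read representation (a dict from CpG position to a 0/1 call), the helper names, and the `min_cpgs=4` threshold are assumptions made for illustration, and Metheor's filtering and windowing rules may differ. LPMD, FDRP, and qFDRP are not covered.

```python
"""Sketch of four pattern-based DNA methylation heterogeneity measures.

Assumption (not Metheor's internal format): a read is a dict mapping CpG
positions to methylation calls, 1 = methylated and 0 = unmethylated.
"""
from collections import Counter
from math import log2
from typing import Dict, List

Read = Dict[int, int]


def is_discordant(read: Read) -> bool:
    """True if the read's CpGs are neither all methylated nor all unmethylated."""
    return len(set(read.values())) > 1


def pdr_at_cpg(reads: List[Read], cpg: int, min_cpgs: int = 4) -> float:
    """Fraction of discordant reads among reads covering `cpg` with >= min_cpgs CpGs."""
    eligible = [r for r in reads if cpg in r and len(r) >= min_cpgs]
    if not eligible:
        return float("nan")
    return sum(is_discordant(r) for r in eligible) / len(eligible)


def window_patterns(reads: List[Read], window: List[int]) -> List[str]:
    """Patterns such as '1011' from reads that cover every CpG in `window`."""
    return ["".join(str(r[c]) for c in window)
            for r in reads if all(c in r for c in window)]


def epipolymorphism(patterns: List[str]) -> float:
    """PM = 1 - sum_i p_i^2, where p_i is the frequency of pattern i."""
    if not patterns:
        return float("nan")
    n = len(patterns)
    return 1.0 - sum((k / n) ** 2 for k in Counter(patterns).values())


def methylation_entropy(patterns: List[str]) -> float:
    """ME = (1/b) * sum_i -p_i * log2(p_i), with b CpGs per pattern."""
    if not patterns:
        return float("nan")
    n, b = len(patterns), len(patterns[0])
    return sum(-(k / n) * log2(k / n) for k in Counter(patterns).values()) / b


def methylation_haplotype_load(patterns: List[str]) -> float:
    """MHL = sum_l l * P(MH_l) / sum_l l, where P(MH_l) is the fraction of
    length-l substrings of the patterns that are fully methylated."""
    if not patterns:
        return float("nan")
    b = len(patterns[0])
    numerator = denominator = 0.0
    for length in range(1, b + 1):
        subs = [p[i:i + length] for p in patterns for i in range(b - length + 1)]
        numerator += length * sum(s == "1" * length for s in subs) / len(subs)
        denominator += length
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical toy reads over a four-CpG window (positions are made up).
    window = [100, 105, 112, 120]
    reads = [
        {100: 1, 105: 1, 112: 1, 120: 1},
        {100: 1, 105: 0, 112: 1, 120: 1},
        {100: 0, 105: 0, 112: 0, 120: 0},
        {100: 1, 105: 1, 112: 1, 120: 1},
    ]
    patterns = window_patterns(reads, window)
    print(pdr_at_cpg(reads, 100))                          # 0.25
    print(epipolymorphism(patterns))                       # 0.625
    print(methylation_entropy(patterns))                   # 0.375
    print(round(methylation_haplotype_load(patterns), 4))  # 0.5354
```

Each measure is written as a pure function of the read patterns, so thresholds such as the minimum number of CpGs per read can be varied independently of the measures themselves.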
Data tables
We provide seven tables of DNA methylation heterogeneity profiles and an additional table containing average DNA methylation levels.
ccle.pdr.csv: Average proportion of discordant reads (PDR) for various genomic contexts.
ccle.lpmd.csv: Average local pairwise methylation disorder (LPMD) for various genomic contexts.
ccle.mhl.csv: Average methylation haplotype load (MHL) for various genomic contexts.
ccle.pm.csv: Average epipolymorphism (PM) for various genomic contexts.
ccle.me.csv: Average methylation entropy (ME) for various genomic contexts.
ccle.fdrp.csv: Average fraction of discordant read pairs (FDRP) for various genomic contexts.
ccle.qfdrp.csv: Average quantitative fraction of discordant pairs (qFDRP) for various genomic contexts.
ccle.beta.csv: Average DNA methylation levels for various genomic contexts.

Schema for data tables
All data tables are in comma-separated values (CSV) format and share the following columns (in ccle.beta.csv, the genomic-context columns hold average DNA methylation levels rather than heterogeneity levels):
cell_line_name: Identifier of the cell line.
run_accession: SRA run accession of the corresponding RRBS data.
tissue: Tissue collection site.
disease: Full disease type (e.g., carcinoma (ductal carcinoma), carcinoma (squamous_cell_carcinoma), or lymphoid_neoplasm (Hodgkin_lymphoma)).
disease_primary: General disease type (e.g., carcinoma or lymphoid_neoplasm).
disease_secondary: Specific disease type (e.g., ductal carcinoma, squamous_cell_carcinoma, or Hodgkin_lymphoma).
disease_stage: Whether the tissue sample is from a primary or metastatic site.
age_at_sampling: Age of the tissue donor at sampling, if known; otherwise left empty.
sex: Sex of the tissue donor, if known; otherwise left empty.
ethnicity: Ethnicity of the tissue donor, if known; otherwise left empty.
genomewide: Genome-wide average DNA methylation heterogeneity level.
promoter: Average DNA methylation heterogeneity level at promoters of protein-coding genes.
cgi: Average DNA methylation heterogeneity level at CpG islands. Annotations were downloaded from the UCSC Table Browser.
cpg_shore: Average DNA methylation heterogeneity level at CpG shores, defined as the 2 kb regions flanking CpG islands upstream and downstream, excluding regions that overlap CpG islands.
cpg_shelf: Average DNA methylation heterogeneity level at CpG shelves, defined as the 2 kb regions flanking (CpG island + CpG shore) regions upstream and downstream, excluding regions that overlap CpG islands or shores.
methylation_canyon: Average DNA methylation heterogeneity level at DNA methylation canyons, defined as broad (> 3.5 kb) under-methylated regions (Jeong et al., 2014); their hg38 annotations were obtained from Su et al. (2018).
exon: Average DNA methylation heterogeneity level at exons of protein-coding genes.
intron: Average DNA methylation heterogeneity level at introns of protein-coding genes.
gene_body: Average DNA methylation heterogeneity level at gene bodies of protein-coding genes.
LINE: Average DNA methylation heterogeneity level at LINEs. Annotations were downloaded from the UCSC Table Browser (hg38, Repeats-RepeatMasker).
SINE: Average DNA methylation heterogeneity level at SINEs.
LTR: Average DNA methylation heterogeneity level at LTR retrotransposons.

Availability of Metheor
The source code for Metheor is available at https://github.com/dohlee/metheor. Metheor can be installed with conda from the command line:
$ conda install -c dohlee metheor
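The CpG shore and shelf definitions in the schema amount to interval arithmetic. The Python sketch below takes 2 kb flanks of the CpG islands and removes bases that overlap islands to obtain shores, then repeats the procedure on the merged island + shore regions to obtain shelves. It is an illustration under stated assumptions, not the script used to build this dataset: it handles a single chromosome with half-open [start, end) coordinates, the function names are hypothetical, and flanks are clipped at position 0 but not at the chromosome end.

```python
"""Sketch: derive CpG shores and shelves from CpG island intervals.

Assumptions: one chromosome, half-open [start, end) coordinates, 2 kb flanks.
"""
from typing import List, Tuple

Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(intervals: List[Interval], mask: List[Interval]) -> List[Interval]:
    """Remove every base covered by `mask` from `intervals`."""
    mask = merge(mask)
    result: List[Interval] = []
    for start, end in merge(intervals):
        cursor = start
        for m_start, m_end in mask:
            if m_end <= cursor or m_start >= end:
                continue  # this mask interval does not overlap the remaining piece
            if m_start > cursor:
                result.append((cursor, m_start))
            cursor = max(cursor, m_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def flanks(intervals: List[Interval], width: int) -> List[Interval]:
    """Upstream and downstream flanks of `width` bp, clipped at position 0."""
    out: List[Interval] = []
    for start, end in merge(intervals):
        out.append((max(0, start - width), start))
        out.append((end, end + width))
    return [iv for iv in out if iv[0] < iv[1]]


def shores_and_shelves(cgis: List[Interval], width: int = 2000
                       ) -> Tuple[List[Interval], List[Interval]]:
    islands = merge(cgis)
    # Shores: flanks of islands, minus anything overlapping an island.
    shores = subtract(flanks(islands, width), islands)
    # Shelves: flanks of (island + shore) regions, minus islands and shores.
    core = merge(islands + shores)
    shelves = subtract(flanks(core, width), core)
    return shores, shelves


if __name__ == "__main__":
    shores, shelves = shores_and_shelves([(10000, 11000), (12500, 13000)])
    print(shores)   # [(8000, 10000), (11000, 12500), (13000, 15000)]
    print(shelves)  # [(6000, 8000), (15000, 17000)]
```

In the example, the two islands are only 1.5 kb apart, so the gap between them becomes a single shore and the two island + shore regions merge before the shelves are computed.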
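Because all tables share the schema above, a per-group summary needs only the Python standard library. The helper below, `mean_by_group`, is hypothetical; it assumes one row per cell line and averages one genomic-context column (by default `genomewide`) within each value of a metadata column (by default `disease_primary`). Empty cells are skipped as a precaution.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def mean_by_group(path: str, group_col: str = "disease_primary",
                  value_col: str = "genomewide") -> Dict[str, float]:
    """Average `value_col` within each distinct value of `group_col`.

    Assumes one row per cell line with the shared schema columns; empty
    cells in `value_col` are skipped.
    """
    values: Dict[str, List[float]] = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            cell = (row.get(value_col) or "").strip()
            if cell:
                values[row[group_col]].append(float(cell))
    return {group: mean(vals) for group, vals in values.items()}
```

For example, `mean_by_group("ccle.pdr.csv")` would return a dict mapping each general disease type to its mean genome-wide PDR, and `mean_by_group("ccle.beta.csv", "tissue", "promoter")` would average promoter methylation levels per tissue.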
