902 research outputs found

    Comprehensive DNA methylation profiling in a human cancer genome identifies novel epigenetic targets

    Using a unique microarray platform for cytosine methylation profiling, the DNA methylation landscape of the human genome was monitored at more than 21,000 sites, including 79% of the annotated transcriptional start sites (TSS). Analysis of the oligodendroglioma-derived cell line LN-18 revealed more than 4000 methylated TSS. The gene-centric analysis indicated that a complex pattern of DNA methylation exists along each autosome, with a trend of increasing density approaching the telomeres. Remarkably, 2% of CpG islands (CGI) were densely methylated, and 17% had significant levels of 5mC, whether or not they corresponded to a TSS. Substantial independent verification, obtained from 95 loci, suggested that this approach is capable of large-scale detection of cytosine methylation with an accuracy approaching 90%. In addition, we detected large genomic domains that are also susceptible to DNA methylation-reinforced inactivation, such as the HOX cluster on chromosome 7 (CH7). Extrapolation from the data suggests that more than 2000 genomic loci may be susceptible to methylation and associated inactivation, and most have yet to be identified. Finally, we report six new targets of epigenetic inactivation (IRX3, WNT10A, WNT6, RARalpha, BMP7 and ZGPAT). These targets displayed cell line- and tumor-specific differential methylation when compared with normal brain samples, suggesting they may have utility as biomarkers. Uniquely, hypermethylation of the CGI within an IRX3 exon was correlated with over-expression of IRX3 in tumor tissues and cell lines relative to normal brain samples.

    Assembly and Compositional Analysis of Human Genomic DNA - Doctoral Dissertation, August 2002

    In 1990, the United States Human Genome Project was initiated as a fifteen-year endeavor to sequence the approximately three billion bases making up the human genome (Vaughan, 1996). As of December 31, 2001, the public sequencing efforts have sequenced a total of 2.01 billion finished bases representing 63.0% of the human genome (http://www.ncbi.nlm.nih.gov/genome/seq/page.cgi?F=HsProgress.shtml&&ORG=Hs) to a Bermuda-quality error rate of 1/10000 (Smith and Carrano, 1996). In addition, 1.11 billion bases representing 34.8% of the human genome have been sequenced to a rough-draft level. Efforts such as UCSC's GoldenPath (Kent and Haussler, 2001) and NCBI's contig assembly (Jang et al., 1999) attempt to assemble the human genome by incorporating both finished and rough-draft sequence. The availability of the human genome data allows us to ask questions concerning the maintenance of specific regions of the human genome. We consider two hypotheses for maintenance of high G+C regions: the presence of specific repetitive elements and compositional mutation biases. Our results rule out the possibility that the G+C content of repetitive elements determines regions of high and low G+C content in the human genome. We determine that there is a compositional bias for mutation rates. However, these biases are not responsible for the maintenance of high G+C regions. In addition, we show that regions of the human genome under less selective pressure will mutate towards a higher A+T composition, regardless of the surrounding G+C composition. We also analyze sequence organization and show that previous studies of isochore regions (Bernardi, 1993) cannot be generalized within the human genome. In addition, we propose a method to assemble only those parts of the human genome that are finished into larger contigs. Analysis of the contigs can lead to the mining of meaningful biological data that can give insights into genetic variation and evolution.
    I suggest a method to aid in single nucleotide polymorphism (SNP) detection, which can help to determine differences within a population. I also discuss a dynamic-programming-based approach to sequence assembly validation and detection of large-scale polymorphisms within a population that is made possible through the availability of large human sequence contigs.

    Fuzzy logistic regression for detecting differential DNA methylation regions

    “Epigenetics is the study of changes in gene activity or function that are not related to a change in the DNA sequence. DNA methylation is one of the main types of epigenetic modification; it occurs when a methyl group attaches to a cytosine in the DNA sequence. Although the sequence does not change, the addition of a methyl group can change the way genes are expressed and produce different phenotypes. DNA methylation is involved in many biological processes and has important implications in the fields of biomedicine and agriculture. Statistical methods have been developed to compare DNA methylation at cytosine nucleotides between populations of interest (e.g., healthy and diseased) across the entire genome from next-generation sequencing (NGS) data. Testing for differences between populations in DNA methylation at specific sites is often followed by an assessment of regional differences using post hoc aggregation procedures to group neighboring sites that are differentially methylated. Although site-level analysis can yield some useful information, there are advantages to testing for differential methylation across entire genomic regions. Examining genomic regions produces less noise, reduces the number of statistical tests, and has the potential to provide more informative results to biologists. In this research, several different types of logistic regression models are investigated to test for differentially methylated regions (DMRs). The focus of this work is on developing a fuzzy logistic regression model for DMR detection. Two other logistic regression methods (weighted average logistic regression and ordinal logistic regression) are also introduced as alternative approaches. The performance of these novel approaches is then compared with an existing logistic regression method (MAGIg) for region-level testing, using data simulated based on two (one plant, one human) real NGS methylation data sets”--Abstract, page iii

    Modeling Complex Patterns of Differential DNA Methylation That Associate with Expression Change

    Gene expression is driven by specific combinations of transcription factors binding to regulatory sequences to define cell type expression profiles. Changes in DNA sequence alter transcription factor binding affinities and gene expression, and DNA methylation is an additional source of variation that is maintained throughout cellular division. Numerous genomic studies are underway to determine which genes are abnormally regulated by DNA methylation in disease. However, we have a poor understanding of how disease-specific methylation variation affects expression. Global DNA demethylation agents have been clinically approved for use in cancer, which has spurred interest in identifying genes that would be most susceptible to targeted demethylation therapies. In this work, I developed multiple tools to increase our knowledge about the relationship between methylation and gene expression in both tissue specificity and disease. I first developed a computational strategy to identify amplifications and deletions from restriction enzyme-based methylation datasets. In a model of endocrine therapy-resistant breast cancer, I identified ESR1 as the most amplified genomic region in response to estrogen deprivation. I developed a qPCR-based assay to probe the amplification in cell lines, formalin-fixed paraffin-embedded samples, patient tumors, and xenograft samples. These data are consistent with the hypothesis that, in a subset of patients, the ESR1 amplification results in increased levels of ER. These are produced in response to estrogen deprivation to sensitize breast cancer to low available quantities of estrogen for cellular growth. Next, to explain specific variation in methylation that associates with expression change in both disease and tissue specificity, I developed an integrative analysis tool, Methylation-based Gene Expression Classification (ME-Class). This model captures the complexity of methylation changes around a gene promoter.
    Using whole-genome bisulfite sequencing and RNA-seq datasets from different tissue samples, ME-Class significantly outperforms published methods that use methylation to predict differential gene expression change. To demonstrate its utility, I used ME-Class to analyze different hematopoietic cell types, and identified that expression-associated methylation changes were predominantly found when comparing cells from distantly related lineages, implying that changes in the cell’s transcriptional program precede associated methylation changes. Training ME-Class on normal-tumor pairs indicated that cancer-specific expression-associated methylation changes differ from tissue-specific changes. I further show that ME-Class can detect functionally relevant cancer-specific, expression-associated methylation changes that are reversed upon the removal of methylation in a model of colon cancer. Lastly, I extended ME-Class to incorporate 5-hydroxymethylcytosine and uncovered gene regulatory logic involving 5hmC and 5mC in mammalian development and disease. As more large-scale, genome-wide, differential DNA methylation studies become available, tools such as ME-Class will prove invaluable for understanding how specific methylation changes affect transcription. Our results show this toolset can identify genes that are dysregulated by methylation in disease, and could be used to facilitate the identification of patients who may benefit from clinically approved demethylating therapeutics.

    Transcription Initiation Patterns Indicate Divergent Strategies for Gene Regulation at the Chromatin Level

    The application of deep sequencing to map 5′ capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: “focused” promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and “dispersed” promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, but virtually no studies have directly investigated the relationship with chromatin features. Here, we show that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Dispersed promoters display higher associations with well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome-free region upstream, while focused promoters have a less organized nucleosome structure, yet a higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes. Notably, differences are conserved across mammals and flies, and they provide for a clearer separation of promoter architectures than the presence and absence of CpG islands or the occurrence of stalled RNA polymerase. Computational models support the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Our results show that promoter classes defined from 5′ capped transcripts not only reflect differences in the initiation process at the core promoter but also are indicative of divergent transcriptional programs established within gene-proximal nucleosome organization.

    Topological Data Analysis of High-dimensional Correlation Structures with Applications in Epigenetics

    This thesis comprises a comprehensive study of the correlation of high-dimensional datasets from a topological perspective. Prompted by a lack of efficient algorithms for big data analysis and motivated by the importance of finding a structure of correlations in genomics, we have developed two analytical tools inspired by the topological data analysis approach that describe and predict the behavior of the correlated design. Those models allowed us to study epigenetic interactions from a local and global perspective, taking into account the different levels of complexity. We applied graph-theoretic and algebraic topology principles to quantify structural patterns on local correlation networks and, based on them, we proposed a network model that was able to predict the locally high correlations of DNA methylation data. This model provided an efficient tool to measure the evolution of the correlation with the aging process. Furthermore, we developed a powerful computational algorithm to analyze the correlation structure globally that was able to detect differentiated methylation patterns over sample groups. This methodology aims to serve as a diagnostic tool, as it provides selected epigenetic biomarkers associated with a specific phenotype of interest. Overall, this work establishes a novel perspective on the analysis and modulation of hidden correlation structures, specifically those of great dimension and complexity, contributing to the understanding of epigenetic processes, and is designed to be useful for non-biological fields too.

    Identification of DNA methylation changes associated with human gastric cancer

    Background: Epigenetic alteration of gene expression is a common event in human cancer. DNA methylation is a well-known epigenetic process, but verifying the exact nature of epigenetic changes associated with cancer remains difficult. Methods: We profiled the methylome of human gastric cancer tissue at 50-bp resolution using a methylated DNA enrichment technique (methylated CpG island recovery assay) in combination with a genome analyzer and a new normalization algorithm. Results: We were able to gain a comprehensive view of promoters with various CpG densities, including CpG Islands (CGIs), transcript bodies, and various repeat classes. We found that gastric cancer was associated with hypermethylation of 5' CGIs and the 5'-end of coding exons as well as hypomethylation of repeat elements, such as short interspersed nuclear elements and the composite element SVA. Hypermethylation of 5' CGIs was significantly correlated with downregulation of associated genes, such as those in the HOX and histone gene families. We also discovered long-range epigenetic silencing (LRES) regions in gastric cancer tissue and identified several hypermethylated genes (MDM2, DYRK2, and LYZ) within these regions. The methylation status of CGIs and gene annotation elements in metastatic lymph nodes was intermediate between normal and cancerous tissue, indicating that methylation of specific genes is gradually increased in cancerous tissue. Conclusions: Our findings will provide valuable data for future analysis of CpG methylation patterns, useful markers for the diagnosis of stomach cancer, as well as a new analysis method for clinical epigenomics investigations.

    Copy-number aware methylation deconvolution analysis of cancers

    DNA methylation has long been known to play a role in tumourigenesis. To this day, interpretation of bulk tumour bisulphite sequencing data has been hampered by normal contamination levels and tumour copy number. To address this issue, we develop two computational tools: (1) ASCAT.m, which allows Allele-Specific Copy number Analysis of Tumour methylation data directly from bulk tumour reduced representation bisulphite sequencing (RRBS) data and (2) CAMDAC, a method for Copy Number-Aware Methylation Deconvolution Analysis of Cancer, from bulk tumour and adjacent normal RRBS data. We describe a set of rules to compute allelic imbalance independently of bisulphite conversion and correct normalised read coverage estimates for sequencing biases. We apply ASCAT.m to non-small cell lung cancers from the epiTRACERx study with multi-region bulk tumour RRBS and adjacent normal. ASCAT.m genotypes, allele-specific copy numbers and tumour purity and ploidy estimates are in excellent agreement with those obtained from matched whole-exome and a subset of whole-genome sequencing of the same samples. We observe a correlation between whole-genome doubling and relapse-free survival in lung squamous cell carcinoma but not in adenocarcinoma. We see widespread genomic instability across both histological subtypes. We model bulk tumour methylation rates as a mixture of tumour and normal signals weighted by tumour purity and copy number and formalise this relationship into CAMDAC equations. The errors between predicted and observed methylation rates were low. Normal infiltrates purified from the bulk tumour by fluorescence-activated cell sorting (FACS) were similar in composition to the adjacent matched normal lung, suggesting the latter is a suitable proxy for deconvolution. Single nucleotide variant (SNV)- and FACS-purified tumour methylation rates are in good agreement with CAMDAC deconvoluted estimates.
    Purification successfully removes shared normal signal, decreasing correlations between patients and to normal after purification. Samples with shared ancestry remain highly correlated. Purified methylation rates yield accurate tumour-normal and tumour-tumour differential methylation calls independent of tumour purity and copy number. We find hundreds of ubiquitously early clonal gene promoter epimutations across the epiTRACERx cohort, showcasing the potential of DNA methylation markers for early cancer detection. CAMDAC purified profiles reveal both phylogenetic and inter-tumour relationships and provide insight into tumour evolutionary history. Quantifying allele-specific methylation on chromosome X in females, we uncover extraction biases against the Barr body. X inactivation is random at the scale of our normal lung cancer samples. Phasing of methylation rates with polymorphisms confirms the presence of allele-specific methylation at the H19/IGF2 locus. Loss of imprinting is observed in 5 tumours, all involving demethylation of the maternal allele. We attempt to quantify the ratio of clonal allele-specific to bi-allelic epimutations in tumours in regions of 1+1, which we define as regulatory and stochastic methylation changes, respectively. Utilising this ratio, we try to extract the number of stochastic epimutations in regions of 2+0 with copy numbers 1 and 2 and time those copy number gains. We find that SNVs at gene promoters often lead to hypermethylation of neighbouring CpGs on the same copy or allele, suggesting the ablation of a transcription factor binding site. Non-expressed neo-antigens are enriched for promoter hypermethylation, indicating methylation plays a role in immune escape. To conclude, CAMDAC purified methylation rates are key to unlocking insights into comparative cancer epigenomics and intra-tumour epigenetic heterogeneity.
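
    The mixture model mentioned in the abstract above (bulk methylation as a blend of tumour and normal signals, weighted by purity and copy number) can be illustrated with a minimal sketch. The functional form, the function names, and the default normal copy number of 2 are assumptions made for illustration; they are not taken from the CAMDAC equations themselves, which the abstract does not spell out.

```python
def bulk_methylation_rate(m_tumour, m_normal, purity, cn_tumour, cn_normal=2):
    """Expected bulk methylation rate at one CpG under a simple two-component mixture.

    Assumed model (illustrative only): each cell population contributes DNA
    copies in proportion to its cell fraction times its local copy number.
    """
    tumour_weight = purity * cn_tumour
    normal_weight = (1.0 - purity) * cn_normal
    total = tumour_weight + normal_weight
    return (tumour_weight * m_tumour + normal_weight * m_normal) / total


def deconvolve_tumour_rate(m_bulk, m_normal, purity, cn_tumour, cn_normal=2):
    """Invert the mixture above to estimate the pure tumour methylation rate."""
    tumour_weight = purity * cn_tumour
    normal_weight = (1.0 - purity) * cn_normal
    if tumour_weight <= 0:
        raise ValueError("no tumour DNA at this locus; rate cannot be estimated")
    m_tumour = (m_bulk * (tumour_weight + normal_weight)
                - normal_weight * m_normal) / tumour_weight
    # Noise in observed rates can push the estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, m_tumour))


# Hypothetical locus: 50% purity, tumour copy number 3, normal copy number 2.
bulk = bulk_methylation_rate(m_tumour=0.8, m_normal=0.2, purity=0.5, cn_tumour=3)
estimate = deconvolve_tumour_rate(bulk, m_normal=0.2, purity=0.5, cn_tumour=3)
```

    In this sketch the adjacent normal sample supplies `m_normal`, mirroring the abstract's use of matched normal lung as a proxy for the normal infiltrate; a homozygous tumour deletion (copy number 0) leaves nothing to deconvolve, hence the explicit error.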

    DNA Methylation in Rectal Cancer: Validating Findings of an Epigenome-Wide Association Study

    Background: Preliminary studies conducted by our group utilised the Illumina Infinium Human Methylation 450k Beadchip array to perform an epigenome-wide association study (EWAS) of 15 matched rectal tumour (RT) and adjacent mucosa (AM) samples. 176 differentially methylated probes (DMPs) were identified (P < 0.00001). RT was also characterised by significantly reduced global methylation in comparison to AM. Aims: This study aimed to validate specific and global DNA methylation differences identified by our preliminary work. We then sought to replicate the findings in additional samples. Finally, we attempted to identify correlations between DNA methylation differences and clinicopathological tumour features. Materials and Methods: Polymerase chain reaction (PCR) and bisulphite pyrosequencing assays were designed and optimised to quantify DNA methylation at nine DMPs nominated by our EWAS. Pearson’s test was used to calculate the correlation between 450k and pyrosequencing methylation values. Replication was performed in an additional cohort of 68 matched colorectal tumour and AM pairs. Global DNA methylation of the discovery cohort was quantified using the luminometric methylation assay (LUMA). Potential relationships between tumour features and differential methylation were investigated using univariate (t-tests or ANOVA) and multivariate analyses (logistic regression). Results: All DMPs selected for validation showed strong correlations between bisulphite pyrosequencing and Illumina 450k methylation values (r = 0.87-0.97). Global hypomethylation was observed in RT (54.6%) when compared to AM (63.5%, P = 0.021). All probes displayed significant levels of differential methylation in the replication cohort (P < 2.2e-16). No significant associations were observed between DNA methylation and clinicopathological tumour features; however, this may reflect the relatively small number of samples assessed.
    Conclusions: Our studies have identified and validated a novel methylomic signature of rectal cancer. Although no clinicopathological correlations were observed with the DMPs investigated, others may represent potential targets in the diagnosis, monitoring and risk stratification of rectal cancer. Northcott Medical Foundation; The Wolfson Foundation; The Royal College of Surgeons of England; Bowel Cancer Wes