16 research outputs found

    Tracking the evolution of esophageal squamous cell carcinoma under dynamic immune selection by multi-omics sequencing

    Intratumoral heterogeneity (ITH) has been linked to decreased efficacy of clinical treatments. However, although genomic ITH has been characterized and genetic, transcriptomic and epigenetic alterations are hallmarks of esophageal squamous cell carcinoma (ESCC), the extent to which these alterations are heterogeneous in ESCC has not been explored in a unified framework. Further, although tumor-infiltrating T lymphocytes are directed against cancer cells, how immune infiltration acts as a selective force to shape the clonal evolution of ESCC is unclear. In this study, we perform multi-omic sequencing on 186 samples from 36 primary ESCC patients. Through multi-omics analyses, we find that genomic, epigenomic, and transcriptomic ITH are underpinned by ongoing chromosomal instability. Based on the RNA-seq data, we observe diverse levels of immune infiltrate across different sites of the same tumor. We reveal genetic mechanisms of neoantigen evasion under distinct selection pressures from the diverse immune microenvironment. Overall, our work offers an avenue for dissecting the complex contribution of each omics level to ITH in ESCC and thereby supports the development of clinical therapy.

    Copy-number aware methylation deconvolution analysis of cancers

    DNA methylation has long been known to play a role in tumourigenesis. To this day, interpretation of bulk tumour bisulphite sequencing data has been hampered by normal contamination levels and tumour copy number. To address this issue, we develop two computational tools: (1) ASCAT.m, which allows Allele-Specific Copy number Analysis of Tumour methylation data directly from bulk tumour reduced representation bisulphite sequencing (RRBS) data and (2) CAMDAC, a method for Copy Number-Aware Methylation Deconvolution Analysis of Cancer, from bulk tumour and adjacent normal RRBS data. We describe a set of rules to compute allelic imbalance independently of bisulphite conversion and correct normalised read coverage estimates for sequencing biases. We apply ASCAT.m to non-small cell lung cancers from the epiTRACERx study with multi-region bulk tumour RRBS and adjacent normal. ASCAT.m genotypes, allele-specific copy numbers and tumour purity and ploidy estimates are in excellent agreement with those obtained from matched whole-exome and a subset of whole-genome sequencing of the same samples. We observe a correlation between whole-genome doubling and relapse-free survival in lung squamous cell carcinoma but not in adenocarcinoma. We see widespread genomic instability across both histological subtypes. We model bulk tumour methylation rates as a mixture of tumour and normal signals weighted by tumour purity and copy number and formalise this relationship into the CAMDAC equations. The errors between predicted and observed methylation rates were low. Normal infiltrates purified from the bulk tumour by fluorescence-activated cell sorting (FACS) were similar in composition to the adjacent matched normal lung, suggesting the latter is a suitable proxy for deconvolution. Single nucleotide variant (SNV)- and FACS-purified tumour methylation rates are in good agreement with CAMDAC deconvoluted estimates. 
Purification successfully removes shared normal signal, decreasing correlations between patients and with normal tissue. Samples with shared ancestry remain highly correlated. Purified methylation rates yield accurate tumour-normal and tumour-tumour differential methylation calls independent of tumour purity and copy number. We find hundreds of ubiquitously early clonal gene promoter epimutations across the epiTRACERx cohort, showcasing the potential of DNA methylation markers for early cancer detection. CAMDAC purified profiles reveal both phylogenetic and inter-tumour relationships and provide insight into tumour evolutionary history. Quantifying allele-specific methylation on chromosome X in females, we uncover extraction biases against the Barr body. X inactivation is random at the scale of our normal lung cancer samples. Phasing of methylation rates with polymorphisms confirms the presence of allele-specific methylation at the H19/IGF2 locus. Loss of imprinting is observed in 5 tumours, all involving demethylation of the maternal allele. We attempt to quantify the ratio of clonal allele-specific to bi-allelic epimutations in tumours in regions of 1+1, which we define as regulatory and stochastic methylation changes, respectively. Utilising this ratio, we try to extract the number of stochastic epimutations in regions of 2+0 with copy numbers 1 and 2 and time those copy number gains. We find that SNVs at gene promoters often lead to hypermethylation of neighbouring CpGs on the same copy or allele, suggesting the ablation of a transcription factor binding site. Non-expressed neo-antigens are enriched for promoter hypermethylation, indicating that methylation plays a role in immune escape. To conclude, CAMDAC purified methylation rates are key to unlocking insights into comparative cancer epigenomics and intra-tumour epigenetic heterogeneity.
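
The mixture model described in this abstract can be illustrated with a minimal sketch. It assumes one plausible form of a purity- and copy-number-weighted mixture, in which each compartment contributes in proportion to its cell fraction times its DNA copy number; the abstract does not give the actual CAMDAC equations, so the function names, parameter names and exact weighting below are hypothetical.

```python
def bulk_methylation(m_tumour, m_normal, purity, cn_tumour, cn_normal=2):
    """Predicted bulk methylation rate at one CpG.

    Assumption (not taken from the source): each compartment is weighted by
    its cell fraction multiplied by its local copy number.
    """
    w_t = purity * cn_tumour          # share of DNA copies from tumour cells
    w_n = (1.0 - purity) * cn_normal  # share of DNA copies from normal cells
    return (w_t * m_tumour + w_n * m_normal) / (w_t + w_n)


def purified_tumour_methylation(m_bulk, m_normal, purity, cn_tumour, cn_normal=2):
    """Invert the mixture above to estimate the tumour-only methylation rate."""
    w_t = purity * cn_tumour
    w_n = (1.0 - purity) * cn_normal
    if w_t <= 0:
        raise ValueError("no tumour DNA at this locus; cannot deconvolve")
    m_t = (m_bulk * (w_t + w_n) - w_n * m_normal) / w_t
    # Noise in the inputs can push the estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, m_t))


if __name__ == "__main__":
    # Round trip: mix a known tumour rate with normal signal, then recover it.
    bulk = bulk_methylation(m_tumour=0.8, m_normal=0.2, purity=0.6, cn_tumour=3)
    print(round(bulk, 4))
    print(round(purified_tumour_methylation(bulk, 0.2, 0.6, 3), 4))
```

In this sketch the adjacent normal sample stands in for the normal compartment, which matches the abstract's statement that adjacent normal lung is a suitable proxy for deconvolution.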

    iRODS metadata management for a cancer genome analysis workflow

    Background: The massive amounts of data from next generation sequencing (NGS) methods pose various challenges with respect to data security, storage and metadata management. While there is a broad range of data analysis pipelines, these challenges remain largely unaddressed to date. Results: We describe the integration of the open-source metadata management system iRODS (Integrated Rule-Oriented Data System) with a cancer genome analysis pipeline in a high performance computing environment. The system allows for customized metadata attributes as well as fine-grained protection rules and is augmented by a user-friendly front-end for metadata input. This results in a robust, efficient end-to-end workflow that accounts for data security, central storage and unified metadata information. Conclusions: Integrating iRODS with an NGS data analysis pipeline is a suitable method for addressing the challenges of data security, storage and metadata management in NGS environments.

    Computational solutions for addressing heterogeneity in DNA methylation data

    DNA methylation, a reversible epigenetic modification, has been implicated in various biological processes including gene regulation. Due to the multitude of datasets available, it is a premier candidate for computational tool development, especially for investigating heterogeneity within and across samples. We differentiate between three levels of heterogeneity in DNA methylation data: between-group, between-sample, and within-sample heterogeneity. Here, we separately address these three levels and present new computational approaches to quantify and systematically investigate heterogeneity. Epigenome-wide association studies relate a DNA methylation aberration to a phenotype and therefore address between-group heterogeneity. To facilitate such studies, which necessarily include data processing, exploratory data analysis, and differential analysis of DNA methylation, we extended the R-package RnBeads. We implemented novel methods for calculating the epigenetic age of individuals, novel imputation methods, and differential variability analysis. A use-case of the new features is presented using samples from Ewing sarcoma patients. As an important driver of epigenetic differences between phenotypes, we systematically investigated associations between donor genotypes and DNA methylation states in methylation quantitative trait loci (methQTL). To that end, we developed a novel computational framework, MAGAR, for determining statistically significant associations between genetic and epigenetic variations. We applied the new pipeline to samples obtained from sorted blood cells and complex bowel tissues of healthy individuals and found that tissue-specific and common methQTLs have distinct genomic locations and biological properties. 
To investigate cell-type-specific DNA methylation profiles, which are the main drivers of within-group heterogeneity, computational deconvolution methods can be used to dissect DNA methylation patterns into latent methylation components. Deconvolution methods require profiles of high technical quality and the identified components need to be biologically interpreted. We developed a computational pipeline to perform deconvolution of complex DNA methylation data, which implements crucial data processing steps and facilitates result interpretation. We applied the protocol to lung adenocarcinoma samples and found indications of tumor infiltration by immune cells and associations of the detected components with patient survival. Within-sample heterogeneity (WSH), i.e., heterogeneous DNA methylation patterns at a genomic locus within a biological sample, is often neglected in epigenomic studies. We present the first systematic benchmark of scores quantifying WSH genome-wide using simulated and experimental data. Additionally, we created two novel scores that quantify DNA methylation heterogeneity at single CpG resolution with improved robustness toward technical biases. WSH scores describe different types of WSH in simulated data, quantify differential heterogeneity, and serve as a reliable estimator of tumor purity. Due to the broad availability of DNA methylation data, the levels of heterogeneity in DNA methylation data can be comprehensively investigated. We contribute novel computational frameworks for analyzing DNA methylation data with respect to different levels of heterogeneity. We envision that this toolbox will be indispensable for understanding the functional implications of DNA methylation patterns in health and disease.

    Statistical and integrative system-level analysis of DNA methylation data

    Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding and other epigenetic information

    Epigenetic variation associated with genetic and environmental factors in the aetiology of Type 2 diabetes.

    PhD thesis. Type 2 diabetes, as a complex disease, has a range of genetic and environmental factors that underpin its aetiology. It is hoped that the emerging study of epigenetic processes will provide the necessary mechanistic insight into the genetic and environmental interactions that, to date, are poorly understood. This thesis considers the role of DNA methylation, an epigenetic modification, in the aetiology of type 2 diabetes. A range of different genome-wide and whole genome techniques are applied to a study of established type 2 diabetes and experimental models (human and animal) of fetal programming. Samples from a recent genome-wide association study of type 2 diabetes were used to identify DNA methylation patterns at areas of genetic variation associated with disease risk. Analysis of data from methylated DNA immunoprecipitation and microarray identified a genetic-epigenetic interaction in the FTO gene. At this locus, the presence or absence of a SNP created or abrogated a CpG site capable of methylation and further analyses highlighted possible functional relevance via enhancer activity. Models of fetal programming were then used to identify whether variation in DNA methylation may underlie the ‘programmed’ phenotype of diabetes and related cardiometabolic disease. Pre-existing human models of programming via maternal vitamin B12 deficiency and maternal famine exposure have been used to generate exploratory evidence of such mechanisms. Whole genome-based techniques (MeDIP-seq and Illumina 450k methylation array) were used to profile DNA methylation in whole blood samples from the offspring in each of these studies. Custom bioinformatic analysis was performed to identify differences in methylation between offspring exposed versus unexposed to the in utero environmental insult. Technical replication and validation studies are ongoing to confirm or refute the presence of regions of differential methylation. 
Finally, this thesis considers whether a state of ‘over-nutrition’, gestational diabetes, may play a role in fetal programming. This condition is of increasing prevalence across the world and is characterised by maternal hyperglycaemia and insulin resistance, often resulting in fetal overgrowth. A mouse model using an inbred strain (Lepr) induced a programmed phenotype of glucose intolerance and obesity in aged offspring born to mothers with gestational diabetes. MeDIP-seq performed on the livers of late gestation mouse embryos identified differential methylation in cases vs. controls, located at genomic regions with potential functional relevance. A human cohort of women with gestational diabetes was collected to develop further hypotheses around the multiple environmental factors that could interact in pregnancy. Prevalent nutritional deficiencies of vitamin D, iron and one-carbon metabolites were found in women with and without gestational diabetes recruited from a local antenatal clinic. This thesis presents preliminary findings that variation in DNA methylation may be involved in the genetic and environmental risk of type 2 diabetes. The work presented highlights how future studies must incorporate integrated genetic, epigenetic and functional analysis with sufficient sample size if their results are to be translatable to diverse populations at risk of diabetes.

    The impact of the environment on DNA methylation in humans and zebrafish

    DNA methylation is a chemical modification to the DNA strand, which can control gene expression. DNA methylation can be modified by the environment. For example, tobacco use substantially alters DNA methylation, and DNA methylation therefore provides a route through which the environment can lead to alterations in gene expression. Consequently, alterations to DNA methylation patterns have been associated with disease phenotypes in humans and other mammals. However, the precise role of environmentally-induced DNA methylation changes in the onset of pathological phenotypes is often not clearly defined. Here, we investigate the response of DNA methylation to two different environmental exposures: adulthood cannabis and in utero tobacco exposure. These environmental exposures are important because they are associated with adverse phenotypes. Long-term cannabis use, particularly through adolescence, is associated with adverse psychosocial wellbeing. The development of conduct problem (CP, including autism and antisocial behaviour disorder) in childhood and adolescence is associated with exposure to tobacco during development (in utero). However, as yet, no studies have explored the role of DNA methylation in the link between these exposures and their associated phenotypic effects. Therefore, here we first asked whether DNA methylation in a longitudinal human cohort, the Christchurch Health and Development Study (CHDS), was altered in response to long-term cannabis exposure, with and without tobacco. Using the Illumina EPIC array, we detected nominal differential DNA methylation in response to cannabis specifically, in genes associated with the following pathways: cholinergic synapse, glutamatergic synapse and dopaminergic synapse. These observations suggest a potential mediating role for DNA methylation in the observed phenotypic effects of cannabis use. 
In order to develop a tool to investigate this association further, we assessed the efficacy of a targeted, high-throughput amplicon-based approach, bisulfite-based amplicon sequencing (BSAS), to replicate differential methylation at loci identified via EPIC array. We found that the ability of BSAS to detect equivalent differential methylation was locus-specific, meaning that it has value as a validation and replication tool, but that each locus for validation must be tested before being applied to a large study. Cannabis use is a contentious issue, mainly because of the debate around its therapeutic but also its psychoactive properties. In order to quantify the impact of both of its main cannabinoids, zebrafish embryos were exposed to (-)-trans-∆9-tetrahydrocannabinol (THC) and cannabidiol (CBD). Following exposure, reduced representation bisulfite sequencing (RRBS) was used to quantify the impact of each cannabinoid on DNA methylation. Differential methylation was found in each of the exposure groups, with the greatest number of methylation differences in the CBD exposure group. CBD DNA methylation differences were found in genes that have roles in neurodevelopment, neurotransmission and behaviour. THC DNA methylation differences, on the other hand, were found in genes with roles in the axon guidance and retinal ganglion pathways, supporting the role of DNA methylation in the biological response to THC. Furthermore, our data revealed a role for both THC and CBD in brain-related pathways, indicating that further research is needed to understand the full biological impacts of the two compounds. Next, to determine whether tobacco-induced DNA methylation alterations are important in the link between in utero tobacco exposure and the development of CP, we applied BSAS to a subset of CHDS participants to assess DNA methylation in in utero-exposed individuals compared to non-exposed individuals, with and without CP. 
We selected a panel of genes with known roles in in utero neurodevelopment, and identified differential methylation that was specific to individuals exposed to tobacco during development, who had high CP scores. This suggests that developmentally-induced DNA methylation alterations may be playing a role in the development of CP in exposed individuals. To investigate this further, we applied a genome-wide approach (EPIC array) to a larger cohort and identified nominal significance at genes involved in global developmental delay and neurological disorders, indicating that, in addition to CP, visual impairment may be a phenotypic response to in utero tobacco exposure. Lastly, we discuss whether DNA methylation analysis in whole blood samples is able to predict DNA methylation changes in brain tissue. To answer this question, we used publicly available data of the top lists of differentially methylated CpG sites in blood and brain tissue from individuals with schizophrenia. We found that, although the methylation of individual CpG sites did not replicate between tissues, the genes and pathways that have biological relevance to schizophrenia (e.g. mTOR signalling pathway and the mRNA surveillance pathway) were identified in both tissue types, demonstrating the value and applicability of whole blood as a proxy tissue. Overall, here we demonstrate a role for DNA methylation in the biological response to cannabis, and a link between in utero tobacco exposure and development of CP. Further research is required to understand the mechanism through which these changes can contribute to disease.