
    Cellular reprogramming by Epstein-Barr virus nuclear antigens

    Epstein-Barr virus (EBV) is a widespread human B cell virus that is linked to many malignancies. EBV modulates the transcriptome of B lymphocytes to drive immortalisation and viral persistence. This is primarily coordinated by the EBV nuclear antigens (EBNA) 2 and the EBNA 3 family (3A, 3B and 3C), which regulate overlapping sets of cellular genes. Using chromatin immunoprecipitation (ChIP) coupled to next-generation sequencing, we found >21,000 EBNA 2 and >7,000 EBNA 3 binding sites in the human genome, providing the first evidence of EBNA 3 association with the human genome in vivo. Binding sites were predominantly distal to transcription start sites (TSS), indicating a key role in long-range gene control. This was especially pronounced for EBNA 3 proteins (84% of sites over 4 kb from any TSS). 56% of genes previously reported to be regulated by these EBNA proteins in microarray experiments were bound by an EBNA. Using ChIP-QPCR, we confirmed that EBNA 3C bound to and promoted epigenetic silencing of a subset of integrin receptor signalling genes (ITGA4, ITGB1, ADAM28, ADAMDEC1). Indirect silencing of the CXCL10 and CXCL11 chemokines by EBNA 3C was also demonstrated. 75% of sites bound by EBNA 3 were also bound by EBNA 2, implicating extensive interplay between EBNA proteins in gene regulation. By examining novel (WEE1, CTBP2) and known (BCL2L11, ITGAL) targets of EBNA 3 proteins bound at promoter-proximal or distal binding sites, we found both cell-type- and locus-specific binding and transcriptional regulation. Importantly, genes differentially regulated by a subset of EBNA 3 proteins were bound by the same subset, providing a mechanism for selective regulation of host genes by EBNA 3 proteins. In summary, this research demonstrates that EBNA proteins primarily act through long-range enhancer elements and regulate gene expression in a locus- and gene-specific manner through differential binding.

    Epigenomes in Cardiovascular Disease.

    If unifying principles could be revealed for how the same genome encodes different eukaryotic cells and for how genetic variability and environmental input are integrated to impact cardiovascular health, grand challenges in basic cell biology and translational medicine may succumb to experimental dissection. A rich body of work in model systems has implicated chromatin-modifying enzymes, DNA methylation, noncoding RNAs, and other transcriptome-shaping factors in adult health and in the development, progression, and mitigation of cardiovascular disease. Meanwhile, deployment of epigenomic tools, powered by next-generation sequencing technologies in cardiovascular models and human populations, has enabled description of epigenomic landscapes underpinning cellular function in the cardiovascular system. This essay aims to unpack the conceptual framework in which epigenomes are studied and to stimulate discussion on how principles of chromatin function may inform investigations of cardiovascular disease and the development of new therapies.

    Advancing the analysis of bisulfite sequencing data in its application to ecological plant epigenetics

    The aim of this thesis is to bridge the gap between the state-of-the-art bioinformatic tools and resources currently at the forefront of epigenetic analysis and their emerging applications to non-model species in the context of plant ecology. New, high-resolution research tools are presented: first in a specific sense, by providing new genomic resources for a selected non-model plant species, and also in a broader sense, by developing new software pipelines to streamline the analysis of bisulfite sequencing data in a manner applicable to a wide range of non-model plant species. The selected species is the annual field pennycress, Thlaspi arvense, which belongs to the same lineage of the Brassicaceae as the closely related model species Arabidopsis thaliana, and yet does not benefit from such extensive genomic resources. It is one of three key species in a Europe-wide initiative to understand how epigenetic mechanisms contribute to natural variation, stress responses and long-term adaptation of plants. To this end, this thesis provides a high-quality, chromosome-level assembly for T. arvense, alongside a rich complement of feature annotations of particular relevance to the study of epigenetics. The genome assembly takes a hybrid approach, involving both PacBio continuous long reads and circular consensus sequences, alongside Hi-C sequencing, PCR-free Illumina sequencing and genetic maps. The result is a significant improvement in contiguity over the existing draft state from earlier studies. Much of the basis for building an understanding of epigenetic mechanisms in non-model species centres on the study of DNA methylation, and in particular on the analysis of bisulfite sequencing data to bring methylation patterns into nucleotide-level resolution. In order to maintain a broad level of comparison between T. arvense and the other selected species under the same initiative, a suite of software pipelines has also been developed, covering mapping, the quantification of methylation values, differential methylation between groups, and epigenome-wide association studies. Furthermore, presented herein is a novel algorithm which can facilitate accurate variant calling from bisulfite sequencing data using conventional approaches, such as FreeBayes or the Genome Analysis ToolKit (GATK), which until now was feasible only with specifically adapted software. This enables researchers to obtain high-quality genetic variants, often essential for contextualising the results of epigenetic experiments, without the need for additional sequencing libraries. Each of these aspects is thoroughly benchmarked, integrated into a robust workflow management system, and adheres to the FAIR principles (Findability, Accessibility, Interoperability and Reusability). Finally, further consideration is given to the unique difficulties presented by population-scale data, and a number of concepts and ideas are explored in order to improve the feasibility of such analyses. In summary, this thesis introduces new high-resolution tools to facilitate the analysis of epigenetic mechanisms, specifically relating to DNA methylation, in non-model plant data. In addition, thorough benchmarking standards are applied, showcasing the range of technical considerations which are of principal importance when developing new pipelines and tools for the analysis of bisulfite sequencing data.
The complete “Epidiverse Toolkit” is available at https://github.com/EpiDiverse and will continue to be updated and improved in the future.
ABSTRACT ACKNOWLEDGEMENTS
1 INTRODUCTION 1.1 ABOUT THIS WORK 1.2 BIOLOGICAL BACKGROUND 1.2.1 Epigenetics in plant ecology 1.2.2 DNA methylation 1.2.3 Maintenance of 5mC patterns in plants 1.2.4 Distribution of 5mC patterns in plants 1.3 TECHNICAL BACKGROUND 1.3.1 DNA sequencing 1.3.2 The case for a high-quality genome assembly 1.3.3 Sequence alignment for NGS 1.3.4 Variant calling approaches
2 BUILDING A SUITABLE REFERENCE GENOME 2.1 INTRODUCTION 2.2 MATERIALS AND METHODS 2.2.1 Seeds for the reference genome development 2.2.2 Sample collection, library preparation, and DNA sequencing 2.2.3 Contig assembly and initial scaffolding 2.2.4 Re-scaffolding 2.2.5 Comparative genomics 2.3 RESULTS 2.3.1 An improved reference genome sequence 2.3.2 Comparative genomics 2.4 DISCUSSION
3 FEATURE ANNOTATION FOR EPIGENOMICS 3.1 INTRODUCTION 3.2 MATERIALS AND METHODS 3.2.1 Tissue preparation for RNA sequencing 3.2.2 RNA extraction and sequencing 3.2.3 Transcriptome assembly 3.2.4 Genome annotation 3.2.5 Transposable element annotations 3.2.6 Small RNA annotations 3.2.7 Expression atlas 3.2.8 DNA methylation 3.3 RESULTS 3.3.1 Transcriptome assembly 3.3.2 Protein-coding genes 3.3.3 Non-coding loci 3.3.4 Transposable elements 3.3.5 Small RNA 3.3.6 Pseudogenes 3.3.7 Gene expression atlas 3.3.8 DNA Methylation 3.4 DISCUSSION
4 BISULFITE SEQUENCING METHODS 4.1 INTRODUCTION 4.2 PRINCIPLES OF BISULFITE SEQUENCING 4.3 EXPERIMENTAL DESIGN 4.4 LIBRARY PREPARATION 4.4.1 Whole Genome Bisulfite Sequencing (WGBS) 4.4.2 Reduced Representation Bisulfite Sequencing (RRBS) 4.4.3 Target capture bisulfite sequencing 4.5 BIOINFORMATIC ANALYSIS OF BISULFITE DATA 4.5.1 Quality Control 4.5.2 Read Alignment 4.5.3 Methylation Calling 4.6 ALTERNATIVE METHODS
5 FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS 5.1 INTRODUCTION 5.2 MATERIALS AND METHODS 5.2.1 Reference species 5.2.2 Natural accessions 5.2.3 Read simulation 5.2.4 Read alignment 5.2.5 Mapping rates 5.2.6 Precision-recall 5.2.7 Coverage deviation 5.2.8 DNA methylation analysis 5.3 RESULTS 5.4 DISCUSSION 5.5 A PIPELINE FOR WGBS ANALYSIS
6 THERE AND BACK AGAIN: INFERRING GENOMIC INFORMATION 6.1 INTRODUCTION 6.1.1 Implementing a new approach 6.2 MATERIALS AND METHODS 6.2.1 Validation datasets 6.2.2 Read processing and alignment 6.2.3 Variant calling 6.2.4 Benchmarking 6.3 RESULTS 6.4 DISCUSSION 6.5 A PIPELINE FOR SNP VARIANT ANALYSIS
7 POPULATION-LEVEL EPIGENOMICS 7.1 INTRODUCTION 7.2 CHALLENGES IN POPULATION-LEVEL EPIGENOMICS 7.3 DIFFERENTIAL METHYLATION 7.3.1 A pipeline for case/control DMRs 7.3.2 A pipeline for population-level DMRs 7.4 EPIGENOME-WIDE ASSOCIATION STUDIES (EWAS) 7.4.1 A pipeline for EWAS analysis 7.5 GENOTYPING-BY-SEQUENCING (EPIGBS) 7.5.1 Extending the epiGBS pipeline 7.6 POPULATION-LEVEL HAPLOTYPES 7.6.1 Extending the EpiDiverse/SNP pipeline
8 CONCLUSION
APPENDICES A. SUPPLEMENT: BUILDING A SUITABLE REFERENCE GENOME B. SUPPLEMENT: FEATURE ANNOTATION FOR EPIGENOMICS C. SUPPLEMENT: FROM READ ALIGNMENT TO DNA METHYLATION ANALYSIS D. SUPPLEMENT: INFERRING GENOMIC INFORMATION
BIBLIOGRAPH

    WHY SCHIZOPHRENIA GENETICS NEEDS EPIGENETICS: A REVIEW

    Schizophrenia (SZ) is a highly heritable disorder, with about 80% of the variance attributable to genetic factors. There is accumulating evidence that both common genetic variants with small effects and rare genetic lesions with large effects determine the risk of SZ. As recently shown, thousands of common single nucleotide polymorphisms (SNPs), each with a small effect, could cumulatively explain about 30% of the underlying genetic risk of SZ. On the other hand, rare and large copy number variants (CNVs) with high but incomplete penetrance, which varies between individuals, could explain about an additional 30% of SZ cases. Although these rare CNVs frequently arise de novo, it is not clear whether they affect risk independently or via interaction with a polygenic liability in the background. Finally, the role of environmental risk factors has been well established in SZ. Environmental factors are rarely sufficient to cause SZ independently, but act in parallel or in synergy with the underlying genetic liability. Epigenetic misregulation of the genome and direct CNS injury are probably the main mechanisms mediating prenatal environmental effects (e.g., viruses, ethanol, or nutritional deficiency), whereas postnatal risk factors (e.g., stress, urbanicity, cannabis use) may also affect risk via use-based potentiation of vulnerable CNS pathways implicated in SZ. In this review, we outline a general theoretical background of epigenetic mechanisms involved in GxE interactions, and then discuss epigenetic and neurodevelopmental features of SZ based on available information from genetics, epigenetics, epidemiology, neuroscience, and clinical research. We argue that an epigenetic model of SZ provides a framework to integrate a variety of diverse empirical data into a powerful etiopathogenetic synthesis. The promise of this model lies in the possibility of developing truly specific prevention and treatment strategies for SZ.

    Genome-wide analysis of DNA methylation topology to understand cell fate

    DNA methylation is an epigenetic modification associated with gene regulation. It has been studied extensively in the context of small regulatory regions. Yet little is known about large domains, first described in WGBS data, that are characterized by fuzzy methylation patterns and are termed Partially Methylated Domains (PMDs). The present thesis comprises PMD analyses in various contexts and provides several new aspects to study DNA methylation. First, a comprehensive analysis of PMDs across a large cohort of WGBS samples was performed to identify structural and functional features associated with PMDs. A newly developed approach, ChromH3M, was proposed for the analysis and integration of a large spectrum of WGBS data sets. Second, PMDs were found to be indicators of the cellular proliferation history, and segmented loss of DNA methylation in PMDs supports the sequential linear differentiation model of memory T-cells. Third, assessment of genome-wide methylation changes in PMDs of Multiple Sclerosis-discordant monozygotic co-twins did not show significant differences, but local changes (DMRs) were identified. Taken together, the outcomes of the presented studies shed light on a so far neglected aspect of DNA methylation, namely PMDs, in different contexts: lineage specialization, differentiation, replication, disease, chromatin organization and gene expression.

    Epigenomic And Nuclear Architectural Insights Into Rett Syndrome

    The importance of DNA methylation in neuronal function is highlighted by mutations in the neuronally enriched “reader” of DNA methylation, methyl-CpG-binding protein 2 (MECP2), causing Rett Syndrome (RTT), a severe neurodevelopmental disorder. Although MeCP2 displays broad genomic binding, gene expression changes in Mecp2 mutant mice are very subtle, and brain region-specific, making it difficult to determine how MeCP2 regulates gene expression. Therefore, we developed an approach to assess cell type-specific effects of Mecp2 mutations on the transcriptome, epigenome, and chromatin architecture to determine whether epigenomic features can explain gene misregulation in RTT. Differentially expressed genes (DEGs) in R106W Mecp2 mutants (R106W) are enriched for MeCP2 binding in the WT setting and are preferentially demethylated in R106W, suggesting that the loss of MeCP2 binding results in the exposure of unbound cytosines to demethylation, thus contributing to gene dysregulation. Given that DEGs are enriched for MeCP2 binding, we next determined unique features of DEGs to gain an understanding of why MeCP2 preferentially targets DEGs. We find that DEGs are cell type-specific, lowly expressed, and intragenically associated with heterochromatin, active enhancer, and CTCF chromatin states, suggesting that MeCP2 is essential for the regulation of lowly expressed genes. Upregulated and downregulated DEGs are differentially enriched for particular chromatin states, providing an insight into the directionality of gene dysregulation. Given the enrichment of DEGs for active enhancer and CTCF chromatin states, we next investigated transcription factor (TF) footprints and found thousands of altered TF footprints in R106W, with the CTCF motif being the most significantly associated. In WT, these sites are enriched for MeCP2 binding, and in R106W, these sites, which are associated with downregulated DEGs, become demethylated, enabling CTCF binding. 
This therefore suggests that MeCP2 can affect CTCF recruitment to chromatin. Given CTCF’s known role in chromatin organization, we employed Oligopaint and found large-scale condensation of euchromatin and heterochromatin, as well as decondensation of long genes. Together, this work provides insight into why DEGs are differentially susceptible to dysregulation in RTT and posits MeCP2 as a key player in global maintenance of the methylome and chromatin architecture for the preservation of neuronal gene expression.

    Computational solutions for addressing heterogeneity in DNA methylation data

    DNA methylation, a reversible epigenetic modification, has been implicated in various biological processes including gene regulation. Due to the multitude of datasets available, it is a premier candidate for computational tool development, especially for investigating heterogeneity within and across samples. We differentiate between three levels of heterogeneity in DNA methylation data: between-group, between-sample, and within-sample heterogeneity. Here, we separately address these three levels and present new computational approaches to quantify and systematically investigate heterogeneity. Epigenome-wide association studies relate a DNA methylation aberration to a phenotype and therefore address between-group heterogeneity. To facilitate such studies, which necessarily include data processing, exploratory data analysis, and differential analysis of DNA methylation, we extended the R package RnBeads. We implemented novel methods for calculating the epigenetic age of individuals, novel imputation methods, and differential variability analysis. A use case of the new features is presented using samples from Ewing sarcoma patients. As an important driver of epigenetic differences between phenotypes, we systematically investigated associations between donor genotypes and DNA methylation states in methylation quantitative trait loci (methQTL). To that end, we developed a novel computational framework, MAGAR, for determining statistically significant associations between genetic and epigenetic variations. We applied the new pipeline to samples obtained from sorted blood cells and complex bowel tissues of healthy individuals and found that tissue-specific and common methQTLs have distinct genomic locations and biological properties.
To investigate cell-type-specific DNA methylation profiles, which are the main drivers of within-group heterogeneity, computational deconvolution methods can be used to dissect DNA methylation patterns into latent methylation components. Deconvolution methods require profiles of high technical quality, and the identified components need to be biologically interpreted. We developed a computational pipeline to perform deconvolution of complex DNA methylation data, which implements crucial data processing steps and facilitates result interpretation. We applied the protocol to lung adenocarcinoma samples and found indications of tumor infiltration by immune cells and associations of the detected components with patient survival. Within-sample heterogeneity (WSH), i.e., heterogeneous DNA methylation patterns at a genomic locus within a biological sample, is often neglected in epigenomic studies. We present the first systematic benchmark of scores quantifying WSH genome-wide using simulated and experimental data. Additionally, we created two novel scores that quantify DNA methylation heterogeneity at single-CpG resolution with improved robustness toward technical biases. WSH scores describe different types of WSH in simulated data, quantify differential heterogeneity, and serve as a reliable estimator of tumor purity. Due to the broad availability of DNA methylation data, the levels of heterogeneity in DNA methylation data can be comprehensively investigated. We contribute novel computational frameworks for analyzing DNA methylation data with respect to different levels of heterogeneity. We envision that this toolbox will be indispensable for understanding the functional implications of DNA methylation patterns in health and disease.

    Early detection of colorectal cancer: biomarker discovery

    Colorectal cancer (CRC) is the third most common cancer worldwide, with about 1.2 million new cases diagnosed each year. CRC derives from the gradual accumulation of both genetic and epigenetic changes that transform the normal intestinal glandular epithelium into invasive cancer. While the genetic alterations are already used as prognostic and predictive markers, epigenetic alterations are currently the subject of intense biomedical research because they are considered common and early molecular events in carcinogenesis that could potentially be used as molecular markers. The aims of this study were: to identify the alterations that characterize the CRC methylome; to verify that these changes represent early events in the development of CRCs; to explore the use of ultra-sensitive molecular techniques to track these alterations in biological matrices suitable for non-invasive assessment (blood and stool); and to correlate the methylation alterations with the expression of the associated genes. The methylome analysis, conducted with the Infinium HumanMethylation450 BeadChip on CRC and adenoma samples, allowed us to delineate both the CRC methylation profile and that associated with precancerous stages. The gene-set/pathway enrichment analysis, conducted with ToppGene and based on the results of case/control differential methylation analysis of CRCs and adenomas, allowed the identification of pathways involved in CRC carcinogenesis. The contribution of these pathways had never been widely emphasized or discussed in the literature. A very important result, which emerged from comparing the genes belonging to the most significantly altered pathways in both CRCs and adenomas, was the identification of methylation alterations in regions known as CpG islands from the earliest stages of precancerous lesions, suggesting that the alteration of specific pathways can drive the tumorigenic process.
The selection of these regions allowed us to identify a panel of biomarkers that can discriminate, with high specificity and sensitivity, CRCs and adenomas from their peritumoral/normal counterparts. This panel has been extensively validated in silico in over 600 samples. We also evaluated the gene expression associated with these regions; more than 70% of hypermethylated CpG islands correlated with downregulation in tumor tissue. To evaluate the usefulness of these biomarkers as a potential tool for non-invasive early diagnosis of CRC in clinical practice, we used an ultra-sensitive technique (methyl_BEAMING) to trace the hypermethylation of three selected biomarkers in DNA extracted from blood and stool. The hypermethylation of these regions, due to the presence of tumoral DNA, was traced with great sensitivity and specificity in both matrices, confirming the usefulness of these regions as possible biomarkers for the early diagnosis of CRC and the tracking of residual disease.

    Studies on genetic and epigenetic regulation of gene expression dynamics

    The information required to build an organism is contained in its genome, and the first biochemical process that activates the genetic information stored in DNA is transcription. Cell-type-specific gene expression shapes cellular functional diversity, and dysregulation of transcription is a central tenet of human disease. Therefore, understanding transcriptional regulation is central to understanding biology in health and disease. Transcription is a dynamic process, occurring in discrete bursts of activity that can be characterized by two kinetic parameters: burst frequency, describing how often genes burst, and burst size, describing how many transcripts are generated in each burst. Genes are under strict regulatory control by distinct sequences in the genome as well as by epigenetic modifications. To properly study how genetic and epigenetic factors affect transcription, it needs to be treated as the dynamic cellular process it is. In this thesis, I present the development of methods that allow identification of newly induced gene expression over short timescales, as well as inference of kinetic parameters describing how frequently genes burst and how many transcripts each burst gives rise to. The work is presented through four papers. In paper I, I describe the development of a novel method for profiling newly transcribed RNA molecules. We use this method to show that therapeutic compounds affecting different epigenetic enzymes elicit distinct, compound-specific responses, mediated by different sets of transcription factors, as early as one hour after treatment, and that these responses can only be detected when measuring newly transcribed RNA. The goal of paper II is to determine how genetic variation shapes transcriptional bursting. To this end, we infer transcriptome-wide burst kinetics parameters from genetically distinct donors and find variation that selectively affects burst sizes and frequencies.
Paper III describes a method for inferring transcriptional kinetics transcriptome-wide using single-cell RNA sequencing. We use this method to describe how the regulation of transcriptional bursting is encoded in the genome. Our findings show that gene-specific burst sizes are dependent on core promoter architecture and that enhancers affect burst frequencies. Furthermore, cell-type-specific differential gene expression is regulated by cell-type-specific burst frequencies. Lastly, paper IV shows how transcription shapes cell types. We collect data on cellular morphologies and electrophysiological characteristics, and measure gene expression, in the same neurons collected from the mouse motor cortex. Our findings show that cells belonging to the same, distinct transcriptomic families have distinct and non-overlapping morpho-electric characteristics. Within families, there is continuous and correlated variation in all modalities, challenging the notion of cell types as discrete entities.