15 research outputs found

    Detecting broad domains and narrow peaks in ChIP-seq data with hiddenDomains

    Background: Correctly identifying genomic regions enriched with histone modifications and transcription factors is key to understanding their regulatory and developmental roles. Conceptually, these regions are divided into two categories, narrow peaks and broad domains, and different algorithms are used to identify each one. Datasets that span these two categories are often analyzed with a single program for peak calling combined with an ad hoc method for domains. Results: We developed hiddenDomains, which identifies both peaks and domains, and compared it to the leading algorithms using H3K27me3, H3K36me3, GABP, ESR1 and FOXA ChIP-seq datasets. The output from the programs was compared to qPCR-validated enriched and depleted sites, predicted transcription factor binding sites, and highly transcribed gene bodies. With every method, hiddenDomains performed as well as, if not better than, algorithms dedicated to a specific type of analysis. Conclusions: hiddenDomains performs as well as the best domain and peak calling algorithms, making it ideal for analyzing ChIP-seq datasets, especially those that contain a mixture of peaks and domains

    lncRNA-Induced Spread of Polycomb Controlled by Genome Architecture, RNA Abundance, and CpG Island DNA

    Long noncoding RNAs (lncRNAs) cause Polycomb repressive complexes (PRCs) to spread over broad regions of the mammalian genome. We report that in mouse trophoblast stem cells, the Airn and Kcnq1ot1 lncRNAs induce PRC-dependent chromatin modifications over multi-megabase domains. Throughout the Airn-targeted domain, the extent of PRC-dependent modification correlated with intra-nuclear distance to the Airn locus, preexisting genome architecture, and the abundance of Airn itself. Specific CpG islands (CGIs) displayed characteristics indicating that they nucleate the spread of PRCs upon exposure to Airn. Chromatin environments surrounding Xist, Airn, and Kcnq1ot1 suggest common mechanisms of PRC engagement and spreading. Our data indicate that lncRNA potency can be tightly linked to lncRNA abundance and that within lncRNA-targeted domains, PRCs are recruited to CGIs via lncRNA-independent mechanisms. We propose that CGIs that autonomously recruit PRCs interact with lncRNAs and their associated proteins through three-dimensional space to nucleate the spread of PRCs in lncRNA-targeted domains. Schertzer et al. studied relationships between long noncoding RNAs (lncRNAs) and Polycomb repressive complexes (PRCs) in mouse trophoblast stem cells. They found that genome architecture, lncRNA abundance, and CpG island DNA each play important roles in dictating the intensity of PRC-induced chromatin modifications within lncRNA target domains

    Bioinformatics services for analyzing massive genomic datasets

    The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/. © 2020, Korea Genome Organization

    Multiple modes of PRC2 inhibition elicit global chromatin alterations in H3K27M pediatric glioma

    A methionine substitution at lysine-27 on histone H3 variants (H3K27M) characterizes ~80% of diffuse intrinsic pontine gliomas (DIPG) and inhibits polycomb repressive complex 2 (PRC2) in a dominant-negative fashion. Yet, the mechanisms for this inhibition and abnormal epigenomic landscape have not been resolved. Using quantitative proteomics, we discovered that robust PRC2 inhibition requires levels of H3K27M greatly exceeding those of PRC2, seen in DIPG. While PRC2 inhibition requires interaction with H3K27M, we found that this interaction on chromatin is transient, with PRC2 largely being released from H3K27M. Unexpectedly, inhibition persisted even after PRC2 dissociated from H3K27M-containing chromatin, suggesting a lasting impact on PRC2. Furthermore, allosterically activated PRC2 is particularly sensitive to H3K27M, leading to the failure to spread H3K27me from PRC2 recruitment sites and consequently abrogating PRC2's ability to establish H3K27me2-3 repressive chromatin domains. In turn, levels of polycomb antagonists such as H3K36me2 are elevated, suggesting a more global, downstream effect on the epigenome. Together, these findings reveal the conditions required for H3K27M-mediated PRC2 inhibition and reconcile seemingly paradoxical effects of H3K27M on PRC2 recruitment and activity

    CONTROL OF POLYCOMB BY CIS-REPRESSIVE LONG NON-CODING RNAS

    Cis-repressive long non-coding RNAs (lncRNAs) spread Polycomb Repressive Complexes (PRCs) within specific genomic regions to achieve chromatin compaction and stable gene silencing. Xist is the best characterized lncRNA; it is required to spread PRCs and silence genes across the entire 165 Mb X chromosome. Despite decades of research using the Xist lncRNA as a model, the relationship between lncRNAs and PRCs remains unclear. Airn and Kcnq1ot1 lncRNAs function similarly to Xist, but in smaller genomic regions. In this dissertation, we gained novel insights into lncRNA and PRC mechanisms by comparing and contrasting lncRNA features and the genomic environments of Xist, Airn, and Kcnq1ot1. First, we found that Airn and Kcnq1ot1 spread PRCs and silence genes across multi-megabase domains in mouse trophoblast stem cells (TSCs). Similar to the X chromosome, Airn- and Kcnq1ot1-targeted regions contained non-uniform patterns of PRCs. We showed that PRC density in the 13 Mb Airn target region correlated with Airn abundance and was dependent on multiple aspects of genome architecture: linear distance to the Airn locus, pre-existing structure, TAD boundaries, and high-affinity chromatin sites of Airn. In Airn-overexpressing TSCs, eight PRC-bound CpG islands (CGIs) appeared to nucleate the spread of Polycomb. Deletion of one 2 kb CGI caused loss of Polycomb across 4.5 Mb. Xist- and Kcnq1ot1-targeted regions showed similar patterns of Polycomb at PRC-bound CGIs. This suggests a common mechanism where lncRNAs depend on pre-bound CGIs to specifically target and spread Polycomb in cis. Doctor of Philosophy

    Two Contrasting Classes of Nucleolus-Associated Domains in Mouse Fibroblast Heterochromatin [preprint]

    In interphase eukaryotic cells, almost all heterochromatin is located adjacent to the nucleolus or to the nuclear lamina, thus defining Nucleolus Associated Domains (NADs) and Lamina Associated Domains (LADs), respectively. Here, we determined the first genome-scale map of murine NADs in mouse embryonic fibroblasts (MEFs) via deep sequencing of chromatin associated with purified nucleoli. We developed a Bioconductor package called NADfinder and demonstrated that it identifies NADs more accurately than other peak-calling tools, due to its critical feature of chromosome-level local baseline correction. We detected two distinct classes of NADs. Type I NADs associate frequently with both the nucleolar periphery and with the nuclear lamina, and generally display characteristics of constitutive heterochromatin, including late DNA replication, enrichment of H3K9me3 and little gene expression. In contrast, Type II NADs associate with nucleoli but do not overlap with LADs. Type II NADs tend to replicate earlier, display greater gene expression, and are more often enriched in H3K27me3 than Type I NADs. The nucleolar associations of both classes of NADs were confirmed via DNA-FISH, which also detected Type I but not Type II probes enriched at the nuclear lamina. Interestingly, Type II NADs are enriched in distinct gene classes, notably factors important for differentiation and development. In keeping with this, we observed that a Type II NAD is developmentally regulated, present in MEFs but not in undifferentiated embryonic stem (ES) cells

    EZH2 variants differentially regulate polycomb repressive complex 2 in histone methylation and cell differentiation

    Background: Polycomb repressive complex 2 (PRC2) is responsible for establishing and maintaining histone H3K27 methylation during cell differentiation and proliferation. H3K27 can be mono-, di-, or trimethylated, resulting in differential gene regulation. However, it remains unknown how PRC2 specifies the degree and biological effects of H3K27 methylation within a given cellular context. One way to determine PRC2 specificity may be through alternative splicing of Ezh2, PRC2’s catalytic subunit, during cell differentiation and tissue maturation. Results: We fully characterized the alternative splicing of Ezh2 in somatic cells and male germ cells and found that Ezh2’s exon 14 was differentially regulated during mitosis and meiosis. The Ezh2 isoform containing exon 14 (ex14-Ezh2) is upregulated during cell cycle progression, consistent with a role in maintaining H3K27 methylation during chromatin replication. In contrast, the isoform lacking exon 14 (ex14D-Ezh2) was almost exclusively present in spermatocytes when new H3K27me2 is established during meiotic differentiation. Moreover, Ezh2’s transcript is normally controlled by E2F transcription activators, but in spermatocytes, Ezh2’s transcription is controlled by the meiotic regulator MYBL1. Compared to ex14-EZH2, ex14D-EZH2 has a diminished efficiency for catalyzing H3K27me3 and promotes embryonic stem cell differentiation. Conclusions: Ezh2’s expression is regulated at transcriptional and post-transcriptional levels in a cellular context-dependent manner. EZH2 variants determine functional specificity of PRC2 in histone methylation during cell proliferation and differentiation

    Exploring the universe of single cells using multi-omic approaches

    In both unicellular and multicellular organisms, no two cells are completely the same. The tissues of humans and other animals are composed of various cell types, each with a different function. Even within the population of a specific type of cell, all cells differ from one another on multiple molecular levels. This variation is constantly introduced in cells, making them more unique step by step. Variation can be introduced during cell division through, for example, DNA mutations or changes in the structure or the packaging of DNA. Diseases such as cancer can start in an individual cell when it quickly diverges from its predecessors through cell divisions and gives rise to a tumor. Chromosome loss in colorectal cancer: The second chapter provides more insight into various modalities of individual cells. We use organoid models of colorectal cancer, which mimic the disease in tiny 3D structures grown in a dish. With NLAIII-seq for whole genome sequencing and viral barcoding for lineage tracing, we perform and read out multiple measurements in individual cells to reconstruct the order in which chromosomal abnormalities occur in this type of cancer. Because of the integrated measurements, we are able to measure changes that occur in parallel in different cells within one population. Using this method, we found a recurring loss of chromosome 4 that only occurs after the loss of chromosome 18. These findings coincide with clinical observations in patients with colorectal cancer. Hematopoiesis: In the third chapter we introduce a novel technique called sort-ChIC. This technique measures histone modifications in individual cells. Modifications in these molecules influence which parts of the DNA in that cell can be read and thereby influence the proteins that the cell produces. We apply sort-ChIC to two active (H3K4me1 and H3K4me3) and two repressive (H3K27me3 and H3K9me3) histone modifications in blood stem cells (HSPCs) and adult blood cells in the bone marrow of the mouse to gain insight into the histone modifications that occur during blood formation. Joint profiling of H3K4me1 and H3K9me3 demonstrates that cell types within the myeloid lineage have distinct active chromatin but share similar myeloid-specific heterochromatin-repressed states. This suggests hierarchical chromatin regulation during hematopoiesis. State-of-the-art technology: The fourth chapter introduces a new technique, scChIC-TAPS, that can measure various modalities, including histone marks and DNA structure, simultaneously in an individual cell. Our approach combines bisulfite-free conversion of methylated cytosines and targeted MNase digestion. With these integrated measurements we can resolve the local correlations of different histone modifications and DNA methylation states at base-pair resolution. We describe the validation of scChIC-TAPS and its application in the Fucci cell line, a system in which the position of every cell in the cell cycle can be measured precisely. We use this cell cycle information to integrate the data of multiple histone marks and compare their behavior throughout the cell cycle. Our data provide the first direct evidence that the kinetics of replication-coupled methylation are influenced by the local chromatin environment

    Développement de nouveaux outils pour l'intégration des données du ChIP-Seq et leurs applications pour l'étude du contrôle de la transcription

    Recent progress in sequencing technologies has opened the possibility of performing very complex research experiments. Combined with the vast public datasets produced by international consortiums such as ENCODE, Roadmap Epigenomics and Fantom, the amount of data to process can be daunting. The goal of my doctoral project is to develop new bioinformatic approaches to facilitate the integration of ChIP-Seq data for the study of the dynamics of the interactions between proteins and DNA. New tools such as ENCODExplorer and FantomTSS were developed in R to make the publicly available datasets easier to integrate. Furthermore, the metagene package allows the comparison of enrichment patterns of DNA-interacting proteins. This package efficiently extracts read coverage from genomic regions of interest, normalizes the signal and uses controls to remove background noise. The main functionality of the metagene package is to visually compare enrichment profiles from multiple groups of genomic regions and to offer statistical tools to characterize and compare those profiles. To validate my experimental approach, I used over a hundred datasets from the GM12878 cell line produced by the ENCODE consortium to study the enrichment profiles of transcription factors and histones in enhancer and promoter regions as a function of their transcriptional activity. I was able to define two distinct recruitment patterns: the gradient effect and the threshold effect. With the ever-growing complexity of genomic datasets, it is essential to develop new methodological and statistical approaches to allow a better understanding of the underlying biological processes. ENCODExplorer and metagene are both available on Bioconductor