Dynamics of bivalent chromatin during development in mammals
Mammalian cell types and tissues have diverse functional roles within an organism, yet all
can be derived by differentiation of embryonic stem cells (ESCs). ESCs are pluripotent
cells with self-renewal properties. During development, subsets of genes in ESCs are activated
or silenced to establish cell-type-specific functions. Gene expression changes occur
transiently in early developmental stages, through signals received and executed by a variety
of transcription factors (TFs), regulatory elements (promoters, enhancers) and epigenetic
modifications of chromatin.
Post-translational modifications of the histone tails are regulated by chromatin modifiers
and transform the chromatin architecture. Polycomb (PcG) and Trithorax (TrxG) group
proteins are the most commonly studied histone modifiers. They were first discovered as
repressors (H3K27me3) and activators (H3K4me3) respectively of Homeobox (Hox) genes in
Drosophila and they are conserved in mammals. Bivalent chromatin is defined as the
simultaneous presence of silencing (H3K27me3) and activating (H3K4me3) histone marks and
was first discovered as a feature of many developmental gene promoters of ESCs. Bivalent
promoters are thought to be in a ‘poised’ state for later activation or repression during
differentiation, owing to the presence of the two counteracting histone modifications and a
pausing variant of RNA polymerase II (RNAPII), accompanied by intermediate to low levels of
expression.
By integrative analysis of publicly available ChIP sequencing (ChIP-seq) datasets, we
predicted 3,659 and 4,979 high-confidence (HC) bivalent
promoters in mouse and human ESCs respectively. Using a peak-based method, we acquired a
set of bivalent promoters with high enrichment for developmental regulators. Over 85% of
Polycomb targets were bivalent and their expression was particularly sensitive to TF
perturbation. Moreover, murine HC bivalent promoters were occupied by both Polycomb
repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with
different biological functions. HC bivalent and active promoters were CpG rich while
H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of
regulators distinguished bivalent from active promoters and a ‘TCCCC’ sequence motif was
specifically enriched in bivalent promoters.
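The peak-based classification of promoters can be illustrated with a minimal sketch. It assumes that promoters and ChIP-seq peaks are available as genomic intervals and that a promoter counts as marked when it overlaps at least one peak. The `Interval` type, the function names, the "unmarked" label and the `min_fraction` cut-off for a high-confidence call are hypothetical and are not taken from the study.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_promoter(promoter: Interval,
                      k4me3_peaks: Iterable[Interval],
                      k27me3_peaks: Iterable[Interval]) -> str:
    """Label a promoter by which histone-mark peaks overlap it."""
    has_k4 = any(overlaps(promoter, p) for p in k4me3_peaks)
    has_k27 = any(overlaps(promoter, p) for p in k27me3_peaks)
    if has_k4 and has_k27:
        return "bivalent"
    if has_k4:
        return "active"
    if has_k27:
        return "H3K27me3-only"
    return "unmarked"


def is_high_confidence(calls: List[str], min_fraction: float = 0.5) -> bool:
    """Assumed rule: bivalent in at least min_fraction of the dataset pairs."""
    if not calls:
        return False
    return calls.count("bivalent") / len(calls) >= min_fraction
```

In this sketch, `classify_promoter` would be run once per H3K4me3/H3K27me3 dataset pair, and the per-pair labels for a promoter would then be passed to `is_high_confidence`.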
Using the recent technology of single cell RNA sequencing (scRNA-seq) we focused on
gene expression heterogeneity and how it may affect the output of differentiation. We collected
single cell gene expression profiles for 32 human and 39 murine ESCs and studied the
correlation between diverse characteristics such as network connectivity and coefficient of
variation (CV) across single cells. We further characterized properties unique to genes with
high CV. Highly expressed genes tended to have a low CV and were enriched for cell cycle
genes. In contrast, high-CV genes were co-expressed with other high-CV genes, were enriched
for bivalent promoters and showed enrichment for response to DNA damage and DNA repair.
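The coefficient of variation is the standard deviation of a gene's expression across single cells divided by its mean. The sketch below shows one way to compute and rank it. The function names and the input layout (gene name mapped to per-cell expression values) are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Sequence


def coefficient_of_variation(expression: Sequence[float]) -> Optional[float]:
    """CV = standard deviation / mean across cells; None if the mean is zero."""
    mu = mean(expression)
    if mu == 0:
        return None  # CV is undefined for a gene that is never detected
    # Population standard deviation is an assumption of this sketch.
    return pstdev(expression) / mu


def rank_genes_by_cv(profiles: Dict[str, Sequence[float]]) -> List[str]:
    """Return gene names sorted from highest to lowest CV, skipping undefined CVs."""
    cvs = {gene: coefficient_of_variation(values) for gene, values in profiles.items()}
    defined = [gene for gene, cv in cvs.items() if cv is not None]
    return sorted(defined, key=lambda gene: cvs[gene], reverse=True)
```

Genes at the top of the ranking would correspond to the high-CV class discussed above; where to draw the cut-off is left open here.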
Bivalent promoters in ESCs grouped in four distinct classes of variable biological
functions according to Polycomb occupancy and three RNAPII variants. To study the dynamics
of epigenetic and transcriptional control at promoters during development, we collected ChIP-seq
data for two chromatin modifications (H3K4me3 and H3K27me3) and RNAPII (8WG16
antibody) as well as expression data (RNA-seq) across 8 cell types (ESCs and seven committed
cell types) in mouse. Hierarchical clustering of 22,179 unique gene promoters across cell types
showed that H3K4me3 peaks were in agreement with the expression data, while H3K27me3 and
RNAPII peaks were not highly consistent with the hierarchical tree of gene expression.
Unsupervised clustering of ChIP-seq and RNA-seq profiles resulted in 31 distinct profiles,
which were subsequently narrowed down to nine major profile groups across cell types. TF
enrichment at individual clusters using ChIP sequencing data did not fully agree with the
classification of 8 major profile groups.
Considering all the above results, three major epigenetic profiles (active, bivalent and
latent) seem to be conserved across the species and cell types in our study. These states could
recapitulate only a fraction of the transcriptional information (adding other chromatin marks
could enrich it), since they are seemingly unaffected by their respective expression profiles.
The H3K27me3-only state has low CpG density and shows stronger signatures in differentiated cell
types. Transcriptional control is tighter in active than in bivalent promoters, and the different
occupancy levels of PcG subunits and RNAPII can be reflected in the expression variance of
bivalent genes, a fraction of which are involved in developmental functions while others
are more tissue-specific. Lastly, there is a striking similarity in the pausing patterns of RNAPII
in the progenitor cell types, which suggests that RNAPII pausing is correlated with the
developmental potential of the cell type.
Finally, this analysis will serve as a resource for future studies to further understand
transcriptional regulation during development.
CpG island erosion, polycomb occupancy and sequence motif enrichment at bivalent promoters in mammalian embryonic stem cells
In embryonic stem (ES) cells, developmental regulators have a characteristic bivalent chromatin signature marked by the simultaneous presence of both activation (H3K4me3) and repression (H3K27me3) signals and are thought to be in a 'poised' state for subsequent activation or silencing during differentiation. We collected eleven pairs (H3K4me3 and H3K27me3) of ChIP sequencing datasets in human ES cells and eight pairs in murine ES cells, and predicted high-confidence (HC) bivalent promoters. Over 85% of H3K27me3 marked promoters were bivalent in human and mouse ES cells. We found that (i) HC bivalent promoters were enriched for developmental factors and were highly likely to be differentially expressed upon transcription factor perturbation; (ii) murine HC bivalent promoters were occupied by both polycomb repressive component classes (PRC1 and PRC2) and grouped into four distinct clusters with different biological functions; (iii) HC bivalent and active promoters were CpG rich while H3K27me3-only promoters lacked CpG islands. Binding enrichment of distinct sets of regulators distinguished bivalent from active promoters. Moreover, a 'TCCCC' sequence motif was specifically enriched in bivalent promoters. Finally, this analysis will serve as a resource for future studies to further understand transcriptional regulation during embryonic development.
Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data
Better protocols and decreasing costs have made high-throughput sequencing experiments accessible even to small experimental laboratories. However, comparing one or a few experiments generated by an individual lab with the vast amount of relevant data freely available in the public domain may be limited by a lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at the single-gene level, they do not provide a genome-wide view. We developed Heat*seq, a web tool that allows genome-scale comparison of high-throughput experiments (chromatin immunoprecipitation followed by sequencing, RNA sequencing and Cap Analysis of Gene Expression) provided by a user with data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and Drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High-quality figures and tables are produced and can be downloaded in multiple formats.
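The idea of comparing a user experiment with public data by correlation can be sketched as follows. This is not Heat*seq's code: it assumes that each experiment has been reduced to a numeric profile over the same ordered set of genes or genomic regions, and it uses Pearson correlation as the similarity measure, which is an assumption here. All function and variable names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def rank_public_experiments(user_profile: Sequence[float],
                            public_profiles: Dict[str, Sequence[float]]
                            ) -> List[Tuple[str, float]]:
    """Correlate one user profile with each public profile, most similar first."""
    scores = [(name, pearson(user_profile, profile))
              for name, profile in public_profiles.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Computing `pearson` for every pair of experiments, rather than for one user profile only, would give the matrix that a correlation heatmap displays.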
Variable reproducibility in genome-scale public data: A case study using ENCODE ChIP sequencing resource
Genome-wide data is accumulating in an unprecedented way in the public domain. Re-mining this data shows great potential to generate novel hypotheses. However, this approach is dependent on the quality (technical and biological) of the underlying data. Here we performed a systematic analysis of chromatin immunoprecipitation (ChIP) sequencing data of transcription and epigenetic factors from the Encyclopedia of DNA Elements (ENCODE) resource to demonstrate that about one third of conditions with replicates show low concordance between replicate peak lists. This serves as a case study to demonstrate a caveat concerning genome-wide analyses and highlights a need to validate the quality of each sample before performing further associative analyses.
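One simple way to quantify concordance between two replicate peak lists is the fraction of peaks in one replicate that overlap a peak in the other. The sketch below illustrates that idea only. The abstract does not specify the concordance measure used in the study, so the metric, the `Peak` layout (chromosome, start, end) and the function names are all assumptions.

```python
from typing import List, Tuple

# Hypothetical peak layout: (chromosome, start, end), half-open coordinates.
Peak = Tuple[str, int, int]


def fraction_overlapping(peaks_a: List[Peak], peaks_b: List[Peak]) -> float:
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(
        1 for a in peaks_a
        if any(a[0] == b[0] and a[1] < b[2] and b[1] < a[2] for b in peaks_b)
    )
    return hits / len(peaks_a)


def replicate_concordance(rep1: List[Peak], rep2: List[Peak]) -> float:
    """Symmetric score: the lower of the two directional overlap fractions."""
    return min(fraction_overlapping(rep1, rep2), fraction_overlapping(rep2, rep1))
```

Taking the minimum of the two directions penalises the case where a small peak list is fully contained in a much larger one. The pairwise scan is quadratic, so real peak lists would call for a sorted or indexed overlap search.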
Gene expression variability in mammalian embryonic stem cells using single cell RNA-seq data
BACKGROUND: Gene expression heterogeneity contributes to development as well as disease progression. Due to technological limitations, most studies to date have focused on differences in mean expression across experimental conditions, rather than differences in gene expression variance. The advent of single cell RNA sequencing has now made it feasible to study gene expression heterogeneity and to characterise genes based on their coefficient of variation. METHODS: We collected single cell gene expression profiles for 32 human and 39 mouse embryonic stem cells and studied correlation between diverse characteristics such as network connectivity and coefficient of variation (CV) across single cells. We further systematically characterised properties unique to High CV genes. RESULTS: Highly expressed genes tended to have a low CV and were enriched for cell cycle genes. In contrast, High CV genes were co-expressed with other High CV genes, were enriched for bivalent (H3K4me3 and H3K27me3) marked promoters and showed enrichment for response to DNA damage and DNA repair. CONCLUSIONS: Taken together, this analysis demonstrates the divergent characteristics of genes based on their CV. High CV genes tend to form co-expression clusters and they explain bivalency at least in part.
Investigating resistance in clinical Mycobacterium tuberculosis complex isolates with genomic and phenotypic antimicrobial susceptibility testing: a multicentre observational study.
BACKGROUND: Whole-genome sequencing (WGS) of Mycobacterium tuberculosis complex has become an important tool in diagnosis and management of drug-resistant tuberculosis. However, data correlating resistance genotype with quantitative phenotypic antimicrobial susceptibility testing (AST) are scarce. METHODS: In a prospective multicentre observational study, 900 clinical M tuberculosis complex isolates were collected from adults with drug-resistant tuberculosis in five high-endemic tuberculosis settings around the world (Georgia, Moldova, Peru, South Africa, and Viet Nam) between Dec 5, 2014, and Dec 12, 2017. Minimum inhibitory concentrations (MICs) and resulting binary phenotypic AST results for up to nine antituberculosis drugs were determined and correlated with resistance-conferring mutations identified by WGS. FINDINGS: Considering WHO-endorsed critical concentrations as reference, WGS had high accuracy for prediction of resistance to isoniazid (sensitivity 98·8% [95% CI 98·5-99·0]; specificity 96·6% [95% CI 95·2-97·9]), levofloxacin (sensitivity 94·8% [93·3-97·6]; specificity 97·1% [96·7-97·6]), kanamycin (sensitivity 96·1% [95·4-96·8]; specificity 95·0% [94·4-95·7]), amikacin (sensitivity 97·2% [96·4-98·1]; specificity 98·6% [98·3-98·9]), and capreomycin (sensitivity 93·1% [90·0-96·3]; specificity 98·3% [98·0-98·7]). For rifampicin, pyrazinamide, and ethambutol, the specificity of resistance prediction was suboptimal (64·0% [61·0-67·1], 83·8% [81·0-86·5], and 40·1% [37·4-42·9], respectively). Specificity for rifampicin increased to 83·9% when borderline mutations with MICs overlapping with the critical concentration were excluded. Consequently, we highlighted mutations in M tuberculosis complex isolates that are often falsely identified as susceptible by phenotypic AST, and we identified potential novel resistance-conferring mutations. 
INTERPRETATION: The combined analysis of mutations and quantitative phenotypes shows the potential of WGS to produce a refined interpretation of resistance, which is needed for individualised therapy, and eventually could allow differential drug dosing. However, variability of MIC data for some M tuberculosis complex isolates carrying identical mutations also reveals limitations of our understanding of the genotype and phenotype relationships (eg, including epistasis and strain genetic background). FUNDING: Bill & Melinda Gates Foundation, German Centre for Infection Research, German Research Foundation, Excellence Cluster Precision Medicine of Inflammation (EXC 2167), and Leibniz ScienceCampus EvoLUNG.
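Sensitivity and specificity of a genotypic prediction against a phenotypic reference follow directly from the four confusion-matrix counts. The sketch below treats the phenotypic AST result as the reference and attaches a 95% Wilson score interval to each proportion. The abstract does not state how its confidence intervals were computed, so the Wilson method is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from typing import Tuple


def proportion_with_wilson_ci(successes: int, total: int,
                              z: float = 1.96) -> Tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


def sensitivity(true_pos: int, false_neg: int) -> Tuple[float, float, float]:
    """Share of phenotypically resistant isolates predicted resistant by WGS."""
    return proportion_with_wilson_ci(true_pos, true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> Tuple[float, float, float]:
    """Share of phenotypically susceptible isolates predicted susceptible by WGS."""
    return proportion_with_wilson_ci(true_neg, true_neg + false_pos)
```

Under this framing, an isolate carrying a borderline mutation but testing susceptible at the critical concentration counts as a false positive, which lowers specificity without necessarily indicating a wrong genotypic call.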
Dynamics of promoter bivalency and RNAP II pausing in mouse stem and differentiated cells
Mammalian embryonic stem cells display a unique epigenetic and transcriptional state to facilitate pluripotency by maintaining lineage-specification genes in a poised state. Two epigenetic and transcriptional processes involved in maintaining the poised state are bivalent chromatin, characterized by the simultaneous presence of activating and repressive histone methylation marks, and RNA polymerase II (RNAPII) promoter-proximal pausing. However, the dynamics of histone modifications and RNAPII at promoters in diverse cellular contexts remain underexplored.
We collected genome-wide data for the bivalent chromatin marks H3K4me3 and H3K27me3 and for RNAPII (8WG16) occupancy, together with expression profiling, in eight different mouse cell types, including ESCs. The epigenetic and transcription profiles at promoters grouped into over thirty clusters with distinct functional identities and transcription control. The clustering analysis identified distinct bivalent clusters: genes in one cluster retained bivalency across cell types, while genes in the other were mostly cell type specific, but neither showed high RNAPII pausing. We noted that RNAPII pausing is more associated with active genes than bivalent genes in a cell type, and was globally reduced in differentiated cell types compared to multipotent