
    Basal core promoters control the equilibrium between negative cofactor 2 and preinitiation complexes in human cells

    BACKGROUND: The general transcription factor TFIIB and its antagonist negative cofactor 2 (NC2) are hallmarks of RNA polymerase II (RNAPII) transcription. Both factors bind TATA box-binding protein (TBP) at promoters in a mutually exclusive manner. Dissociation of NC2 is thought to be followed by TFIIB association and subsequent preinitiation complex formation. TFIIB dissociates upon RNAPII promoter clearance, thereby providing a specific measure for steady-state preinitiation complex levels. As yet, genome-scale promoter mapping of human TFIIB has not been reported. It thus remains elusive how human core promoters contribute to preinitiation complex formation in vivo. RESULTS: We compare target genes of TFIIB and NC2 in human B cells and analyze associated core promoter architectures. TFIIB occupancy is positively correlated with gene expression, with the vast majority of promoters being GC-rich and lacking defined core promoter elements. TATA elements, but not the previously in vitro defined TFIIB recognition elements, are enriched in some 4 to 5% of the genes. NC2 binds to a highly related target gene set. Nonetheless, subpopulations show strong variations in factor ratios: whereas high TFIIB/NC2 ratios select for promoters with focused start sites and conserved core elements, high NC2/TFIIB ratios correlate with multiple start-site promoters lacking defined core elements. CONCLUSIONS: TFIIB and NC2 are global players that occupy active genes. Preinitiation complex formation is independent of core elements at the majority of genes. TATA and TATA-like elements dictate TFIIB occupancy at a subset of genes. Biochemical data support a model in which preinitiation complex but not TBP-NC2 complex formation is regulated
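The abstract does not say how the TFIIB/NC2 ratios were computed or where the subpopulations were cut. As a minimal sketch, assuming one occupancy score per promoter for each factor, promoters could be ranked by a log2 ratio and the two tails taken as the "high TFIIB/NC2" and "high NC2/TFIIB" groups. The function names, the pseudocount of 1 and the tail fraction are hypothetical choices, not the authors' procedure.

```python
import math


def log2_ratio(tfiib, nc2, pseudocount=1.0):
    """log2 of the TFIIB/NC2 occupancy ratio; the pseudocount guards against zeros."""
    return math.log2((tfiib + pseudocount) / (nc2 + pseudocount))


def split_by_ratio(occupancy, fraction=0.1):
    """Split promoters into high-TFIIB/NC2 and high-NC2/TFIIB tails (hypothetical helper).

    occupancy: dict mapping promoter ID -> (TFIIB score, NC2 score).
    Returns (top `fraction` by log2 ratio, bottom `fraction` by log2 ratio).
    """
    ranked = sorted(occupancy, key=lambda p: log2_ratio(*occupancy[p]), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Toy occupancy scores (invented for illustration)
occupancy = {"A": (100, 10), "B": (50, 50), "C": (5, 80), "D": (30, 20)}
high_tfiib, high_nc2 = split_by_ratio(occupancy, fraction=0.25)
print(high_tfiib, high_nc2)
```

With four promoters and a 25% tail, promoter A lands in the high-TFIIB/NC2 group and C in the high-NC2/TFIIB group; the groups could then be compared for start-site focus and core elements.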

    Predicting stimulation-dependent enhancer-promoter interactions from ChIP-Seq time course data

    We have developed a machine learning approach to predict stimulation-dependent enhancer-promoter interactions using evidence from changes in genomic protein occupancy over time. The occupancy of estrogen receptor alpha (ER), RNA polymerase II (Pol II) and histone marks H2AZ and H3K4me3 was measured over time using ChIP-Seq experiments in MCF7 cells stimulated with estrogen. A Bayesian classifier was developed that uses the correlation of temporal binding patterns at enhancers and promoters, together with genomic proximity, as features to predict interactions. This method was trained using experimentally determined interactions from the same system and was shown to achieve much higher precision than predictions based on genomic proximity to the nearest ER binding site. We use the method to identify a confident genome-wide set of ER target genes and their regulatory enhancers. Validation with publicly available GRO-Seq data demonstrates that our predicted targets are much more likely to show early nascent transcription than predictions based on genomic ER binding proximity alone.
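The classifier is described here only through its features. The sketch below uses a Gaussian naive Bayes model as a stand-in, assuming two features per enhancer-promoter pair: the Pearson correlation of their ChIP-Seq time courses and the log10 genomic distance. The class `GaussianNaiveBayes`, the helper names and the toy training data are hypothetical; the published model may use different likelihoods, more marks, or a different way of combining them.

```python
import math
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length time courses."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pair_features(enhancer_profile, promoter_profile, distance_bp):
    """Hypothetical features: temporal co-occupancy and log10 genomic distance."""
    return (pearson(enhancer_profile, promoter_profile), math.log10(distance_bp))


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""

    def fit(self, X, y):
        self.params = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            prior = len(rows) / len(X)
            mus = [mean(c) for c in cols]
            # small variance floor keeps the density finite for constant features
            variances = [pvariance(c) + 1e-6 for c in cols]
            self.params[label] = (prior, mus, variances)
        return self

    def posterior(self, x):
        """Return P(label | x) for each label seen in training."""
        log_joint = {}
        for label, (prior, mus, variances) in self.params.items():
            lj = math.log(prior)
            for xi, mu, var in zip(x, mus, variances):
                lj += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
            log_joint[label] = lj
        top = max(log_joint.values())  # log-sum-exp for numerical stability
        z = sum(math.exp(v - top) for v in log_joint.values())
        return {label: math.exp(v - top) / z for label, v in log_joint.items()}


# Toy training pairs (invented): label 1 = interacting, 0 = not interacting
X_train = [(0.9, 4.0), (0.8, 4.5), (0.85, 4.2), (0.1, 5.5), (-0.2, 6.0), (0.0, 5.8)]
y_train = [1, 1, 1, 0, 0, 0]
model = GaussianNaiveBayes().fit(X_train, y_train)
query = pair_features([1, 2, 3, 4], [2, 4, 6, 8.5], 20_000)
print(model.posterior(query))
```

A pair whose time courses rise together and that lies about 20 kb apart receives a high posterior for the "interacting" class in this toy setting; in practice the training labels would come from experimentally determined interactions, as the abstract describes.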

    Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays

    Genes with similar transcriptional activation kinetics can display very different temporal mRNA profiles because of differences in transcription time, degradation rate, and RNA-processing kinetics. Recent studies have shown that a splicing-associated RNA production delay can be significant. To investigate this issue more generally, it is useful to develop methods applicable to genome-wide datasets. We introduce a joint model of transcriptional activation and mRNA accumulation that can be used for inference of transcription rate, RNA production delay, and degradation rate given data from high-throughput sequencing time course experiments. We combine a mechanistic differential equation model with a nonparametric statistical modeling approach allowing us to capture a broad range of activation kinetics, and we use Bayesian parameter estimation to quantify the uncertainty in estimates of the kinetic parameters. We apply the model to data from estrogen receptor alpha activation in the MCF-7 breast cancer cell line. We use RNA polymerase II ChIP-Seq time course data to characterize transcriptional activation and mRNA-Seq time course data to quantify mature transcripts. We find that 11% of genes with a good signal in the data display a delay of more than 20 min between completing transcription and mature mRNA production. The genes displaying these long delays are significantly more likely to be short. We also find a statistical association between high delay and late intron retention in pre-mRNA data, indicating significant splicing-associated production delays in many genes.
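The abstract names the ingredients of the differential equation (activation, production delay, degradation) but not its form. A minimal sketch, assuming the common form dm/dt = β p(t − Δ) − α m(t), where p(t) is the transcriptional activity, Δ the RNA production delay, β a production scale and α the degradation rate, integrates it with forward Euler to show how Δ shifts the mRNA curve. The nonparametric activation model and the Bayesian estimation are omitted; `simulate_mrna` and its parameter values are hypothetical.

```python
def simulate_mrna(p, beta, alpha, delay, t_end, dt=0.01, m0=0.0):
    """Forward-Euler integration of dm/dt = beta * p(t - delay) - alpha * m(t).

    p: callable giving transcriptional activity at time t (minutes). Activity is
    treated as zero before time 0, so no mature mRNA is produced until t >= delay.
    Returns (times, mRNA levels) sampled every dt.
    """
    n_steps = int(round(t_end / dt))
    times, levels = [0.0], [m0]
    m = m0
    for i in range(n_steps):
        t = i * dt
        source = p(t - delay) if t >= delay else 0.0
        m += dt * (beta * source - alpha * m)
        times.append((i + 1) * dt)
        levels.append(m)
    return times, levels


# Step activation at t = 0 with a 20-minute production delay (illustrative values)
times, levels = simulate_mrna(lambda t: 1.0, beta=1.0, alpha=0.5, delay=20.0, t_end=60.0)
print(levels[1000], round(levels[-1], 3))  # mRNA at t = 10 min and t = 60 min
```

With a step input, mature mRNA stays at zero for the first 20 minutes and then relaxes towards β/α = 2; fitting Δ to mRNA-Seq time courses is what separates such a delay from slow transcription or slow degradation.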

    Inference of RNA Polymerase II Transcription Dynamics from Chromatin Immunoprecipitation Time Course Data

    Gene transcription mediated by RNA polymerase II (pol-II) is a key step in gene expression. The dynamics of pol-II moving along the transcribed region influence the rate and timing of gene expression. In this work, we present a probabilistic model of transcription dynamics which is fitted to pol-II occupancy time course data measured using ChIP-Seq. The model can be used to estimate transcription speed and to infer the temporal pol-II activity profile at the gene promoter. Model parameters are estimated using either maximum likelihood estimation or via Bayesian inference using Markov chain Monte Carlo sampling. The Bayesian approach provides confidence intervals for parameter estimates and allows the use of priors that capture domain knowledge, e.g. the expected range of transcription speeds, based on previous experiments. The model describes the movement of pol-II down the gene body and can be used to identify the time of induction for transcriptionally engaged genes. By clustering the inferred promoter activity time profiles, we are able to determine which genes respond quickly to stimuli and group genes that share activity profiles and may therefore be co-regulated. We apply our methodology to biological data obtained using ChIP-seq to measure pol-II occupancy genome-wide when MCF-7 human breast cancer cells are treated with estradiol (E2). The transcription speeds we obtain agree with those obtained previously for smaller numbers of genes, with the advantage that our approach can be applied genome-wide. We validate the biological significance of the pol-II promoter activity clusters by investigating cluster-specific transcription factor binding patterns and determining canonical pathway enrichment. We find that rapidly induced genes are enriched for both estrogen receptor alpha (ER) and FOXA1 binding in their proximal promoter regions.
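The published model is probabilistic and fits the whole occupancy time course. A much simpler stand-in shows how transcription speed and induction time are linked: assuming the pol-II front reaches position x at time t0 + x/v, a least-squares line through (position, arrival time) points has slope 1/v and intercept t0. The function `fit_speed` and the synthetic arrival times are hypothetical, and estimating arrival times from ChIP-Seq bins is not shown.

```python
def fit_speed(positions_kb, arrival_min):
    """Least-squares fit of arrival_time = t0 + position / v (hypothetical helper).

    positions_kb: distances from the TSS (kb) at which the pol-II front was observed.
    arrival_min: estimated arrival time at each position (minutes).
    Returns (v in kb/min, t0 in min), where t0 is the induction time at the promoter.
    """
    n = len(positions_kb)
    mean_x = sum(positions_kb) / n
    mean_t = sum(arrival_min) / n
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(positions_kb, arrival_min))
    sxx = sum((x - mean_x) ** 2 for x in positions_kb)
    slope = sxt / sxx  # minutes per kb, i.e. 1 / v
    t0 = mean_t - slope * mean_x
    return 1.0 / slope, t0


# Synthetic example: front leaves the promoter at 5 min and moves at 2.5 kb/min
v, t0 = fit_speed([0, 20, 40, 60], [5, 13, 21, 29])
print(f"v = {v:.2f} kb/min, induction at t0 = {t0:.2f} min")
```

Unlike this point estimate, the Bayesian version described in the abstract would also yield intervals for v and t0 and could use a prior on the plausible range of speeds.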

    Genome-Wide in Silico Mapping of Scaffold/Matrix Attachment Regions in Arabidopsis Suggests Correlation of Intragenic Scaffold/Matrix Attachment Regions with Gene Expression

    We carried out a genome-wide prediction of scaffold/matrix attachment regions (S/MARs) in Arabidopsis. Results indicate no uneven distribution at the chromosomal level but a clear underrepresentation of S/MARs inside genes. Where S/MARs were predicted within genes, these intragenic S/MARs were preferentially located within the 5′-half, most prominently within introns 1 and 2. Using Arabidopsis whole-genome expression data generated by the massively parallel signature sequencing methodology, we found a negative correlation between S/MAR-containing genes and transcript abundance. Expressed sequence tag data correlated the same way with S/MAR-containing genes. Thus, intragenic S/MARs show a negative correlation with transcription level. For various genes it has been shown experimentally that S/MARs can function as transcriptional regulators and that they are implicated in stabilizing expression levels within transgenic plants. On the basis of a genome-wide in silico S/MAR analysis, we found a significant correlation between the presence of intragenic S/MARs and transcriptional down-regulation.

    ERCB-Nephromine-EuRenOmics Database

    <p>The main goal of WP7 is to develop a multiscalar view of rare renal diseases by integrating diverse large-scale data sources. Data generated in WP2-6, as well as data from cooperating consortia and publicly available datasets, will be implemented in the EURenOmics platform. The European Renal cDNA Bank (ERCB) hosts over 2600 renal biopsies with anonymized clinical parameters. In this consortium more than 550 genome-wide Affymetrix-based datasets from glomerular and tubulointerstitial specimens from 289 patients have been generated. Out of these, a dataset comprising 369 samples (183 glomerular, 186 tubulointerstitial) and their basic clinical data has been selected for integration into the EURenOmics database. These include data from patients with minimal change disease (n=24), focal segmental glomerulosclerosis (n=47) and membranous nephropathy (n=39), as well as from patients with diabetic nephropathy (n=11), hypertensive nephropathy (n=35), IgA nephropathy (n=52) and lupus nephritis (n=64). Pretransplant biopsies from living donors were used as controls (n=84). Another resource for transcriptomic data is Nephromine (http://nephromine.org), a web-based, kidney-specific systems biology search engine that provides data from 20 publicly available gene expression datasets comprising 1757 samples (murine and human), incorporates clinical data and supports various analyses (differential gene expression, coexpression, outlier analysis and concept association). A basic database concept has been defined, and a web-based functional data repository allowing smooth upload and download as well as data sharing among partners has been set up. Uploaded data will be used to establish a multilevel analysis domain (EuRenOmics Database) that allows the integration of data from different -omics platforms as well as the inclusion of clinical information.
The interactive platform provides standardized and harmonized datasets and uses existing architectures from Genomatix’s GePS and ElDorado databases. This search engine will provide a central bioinformatics platform for addressing mechanistic, diagnostic and therapeutic challenges at the systems biology level.</p>