14 research outputs found

    Transcriptional Dynamics Reveal Critical Roles for Non-coding RNAs in the Immediate-Early Response

    <div><p>The immediate-early response mediates cell fate in response to a variety of extracellular stimuli and is dysregulated in many cancers. However, the specificity of the response across stimuli and cell types, and the roles of non-coding RNAs, are not well understood. Using a large collection of densely sampled time series expression data, we have examined the induction of the immediate-early response in unparalleled detail, across cell types and stimuli. We exploit cap analysis of gene expression (CAGE) time series datasets to directly measure promoter activities over time. Using a novel analysis method for time series data, we identify transcripts with expression patterns that closely resemble the dynamics of known immediate-early genes (IEGs), which enables a comprehensive comparative study of these genes and their chromatin state. Surprisingly, these data suggest that the earliest transcriptional responses often involve promoters generating non-coding RNAs, many of which are produced in advance of canonical protein-coding IEGs. IEGs are known to be capable of induction without de novo protein synthesis. Consistent with this, we find that the response of both protein-coding and non-coding RNA IEGs can be explained by their transcriptionally poised, permissive chromatin state prior to stimulation. We also explore the function of non-coding RNAs in the attenuation of the immediate-early response in a small RNA sequencing dataset matched to the CAGE data: we identify a novel set of microRNAs responsible for the attenuation of the IEG response in an estrogen receptor positive cancer cell line. Our computational statistical method is well suited to meta-analyses as there is no requirement for transcripts to pass thresholds for significant differential expression between time points, and it is agnostic to the number of time points per dataset.</p></div>

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

    Mature microRNA regulation and host gene activation.

    <p>(A) Expression of mature hsa-mir-6163 and transcriptional activation of two of its target IEGs FOSB and EGR3 in MCF7 cells in response to HRG. Data values are plotted as circles (median value is filled). (B) Median CAGE expression (black circles) of precursor miRNA and median mature miRNA expression (red triangles) for hsa-mir-320a, host lncRNA MIR155HG and mature hsa-mir-155, and for hsa-mir-21 in MCF7-HRG (three replicates, lines are a spline fitted to the data). For hsa-mir-320a the increase in CAGE expression is significant when comparing 0 min and 210 min and the decrease in mature transcript levels is significant when comparing 0 min and 240 min (p ≤ 0.05 by t test). For hsa-mir-155 the increase in CAGE expression of MIR155HG is significant when comparing 0 min and 180 min, and the increase in mature transcript levels is significant when comparing 0 min and 240 min (p ≤ 0.05 by t test). For hsa-mir-21 the increases in CAGE expression and in mature transcript levels are significant when comparing 0 min and 80 min (p ≤ 0.05 by t test).</p>

    Non-coding RNA gene activation.

    <p>(A) Histograms of <i>t</i><sub><i>s</i></sub> for lncRNA, snoRNA, snRNA and miRNA precursors show these genes are activated rapidly. (B) Density plot of early peak gene length against <i>t</i><sub><i>s</i></sub> for all RNA biotypes (grey symbols), lncRNA (red symbols) and miRNA precursors (blue symbols). LncRNA and miRNA form distinct clusters of RNAs activated with a wide range of kinetics.</p>

    Timing of early peak CAGE clusters.

    <p>Bar charts showing the percentage of early peak clusters associated with (A) IEGs and (B) nucleotide binding genes (<i>t</i><sub><i>s</i></sub> binned in 30 min intervals). The horizontal line indicates the average percentage. (C) The timing of known IEGs and transcription factors is shown for IEGs (red) and TFs (purple) assigned to the early peak signature in each MCF7 experiment. Symbols indicate the <i>t</i><sub><i>s</i></sub> (plotted on the x axis) and are labelled with the gene name associated with the CAGE cluster (symbols are positioned on the y axis for legibility only).</p>

    Pathway analysis.

    <p>P values for the over-representation of CAGE clusters in 73 Panther gene sets containing at least 20 genes. P values are calculated by hypergeometric test on the counts of clusters from all four data sets combined. Only pathways with P values ≤ 0.05 are listed (those with an FDR significant at 0.1 are indicated by *).</p>
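The caption above names a hypergeometric test for over-representation. As a minimal sketch of that kind of test (not the authors' code), the upper-tail probability can be computed with the standard library alone. The function name, its parameters and the example counts are all hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of clusters considered (hypothetical input)
    annotated:  clusters in the population that belong to the gene set
    sample:     clusters in the list being tested (e.g. one signature)
    overlap:    clusters in the tested list that belong to the gene set
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative counts only; these numbers are not from the source.
    p = hypergeom_enrichment_p(population=20000, annotated=150, sample=400, overlap=10)
    print(f"over-representation p value: {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids the rounding issues of multiplying many small probabilities. A multiple-testing correction such as the FDR mentioned in the caption would be applied to the resulting p values in a separate step.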

    Density plots of gene length against <i>t</i><sub><i>s</i></sub> for early peak clusters.

    <p>Grey contour lines indicate the projected <i>t</i><sub><i>s</i></sub> for the completion of transcription accounting for gene length (a transcription rate of 60 bases/s is assumed [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004217#pcbi.1004217.ref032" target="_blank">32</a>]). (A) Early peak known IEGs (red symbols represent the underlying IEG CAGE cluster data). (B) Early peak known nucleotide binding genes and underlying data (blue symbols). (C) Travelling ratios for known IEGs and for early peak genes in MCF7 cells demonstrate promoter proximal pausing as the travelling ratio shifts towards higher values. The intersection of IEGs and early peak genes (right-most plot) shows that the strong pausing effect seen for IEGs holds for those assigned the early peak signature.</p>
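The projected completion time in this caption follows from simple arithmetic: gene length divided by the assumed rate of 60 bases/s. The sketch below illustrates only that conversion. The function name and the `start_min` offset are hypothetical, and how the contour lines were actually constructed is not stated in the source.

```python
TRANSCRIPTION_RATE_BASES_PER_S = 60.0  # rate assumed in the caption


def projected_completion_min(gene_length_bases, start_min=0.0,
                             rate_bases_per_s=TRANSCRIPTION_RATE_BASES_PER_S):
    """Minutes until a transcript of the given length would be completed.

    start_min is a hypothetical offset for when elongation begins.
    """
    elongation_s = gene_length_bases / rate_bases_per_s
    return start_min + elongation_s / 60.0


if __name__ == "__main__":
    # Illustrative gene lengths in bases (not taken from the source).
    for length in (3_600, 36_000, 108_000):
        print(length, "bases ->", projected_completion_min(length), "min")
```

At this rate a 108,000-base gene takes 30 minutes to transcribe, which is why gene length matters when interpreting switch times on the scale of the 30 min bins used in these figures.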

    Kinetic signatures for IEGs.

    <p>(A) Kinetic signatures are defined as piece-wise exponential (peak and dip), simple exponential (decay) or linear functions. (B) CAGE clusters associated with known IEGs show significant expression at time 0 (left; median 14.7 TPM). The maximum log2 fold change at any point in the time course over expression at time 0 is typically less than 2 (right; median 1.64). Histograms show data from all four data sets for 194l known IEGs. (C) Kinetic signatures fitted to the CAGE time course of EGR1 in EGF treated MCF7 cells yield values for the fit (log Z) and estimates for parameter moments. Plots show the kinetic signature function using computed parameter means (blue) and confidence intervals (red) for peak (left) and linear (right) kinetic signatures. In this case, log Z for the peak signature (-27.2) is greater than that for the linear model (-35), indicating a significantly better explanation of the data. Data values are plotted as circles (median value is filled). (D) CAGE time course data and best-fitting kinetic signature for IEGs JUN, FOS, EGR1 and DUSP1 (colours as in (C)). The vertical green lines indicate the mean switch time <i>t</i><sub><i>S</i></sub> and one standard deviation above and below.</p>
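Panel (C) compares models by their log Z values. A minimal sketch of such a comparison, using the two values quoted in the caption, is given below. The helper names are hypothetical, and it is an assumption here that log Z is a natural-log model evidence, so that the difference can be read as a log Bayes factor. The source does not state the base or any decision threshold.

```python
import math


def log_bayes_factor(log_z_a, log_z_b):
    """Difference of log evidences; positive values favour model A."""
    return log_z_a - log_z_b


def preferred_model(log_evidences):
    """Name of the model with the largest log evidence."""
    return max(log_evidences, key=log_evidences.get)


if __name__ == "__main__":
    # log Z values quoted in the caption for EGR1 in EGF treated MCF7 cells.
    fits = {"peak": -27.2, "linear": -35.0}
    diff = log_bayes_factor(fits["peak"], fits["linear"])
    print("preferred signature:", preferred_model(fits))
    print(f"log evidence difference: {diff:.1f}")
    # Only meaningful if log Z is a natural logarithm (an assumption).
    print(f"evidence ratio under that assumption: {math.exp(diff):.0f}")
```

Comparing evidences rather than raw goodness of fit penalises the extra parameters of the piece-wise signatures, so a peak model is preferred only when the data support the added flexibility.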