A Novel Analytical Strategy to Identify Fusion Transcripts between Repetitive Elements and Protein Coding-Exons Using RNA-Seq
<div><p>Repetitive elements (REs) comprise 40–60% of the mammalian genome and have been shown to epigenetically influence the expression of genes through the formation of fusion transcripts (FTs). We previously showed that an intracisternal A particle forms an FT with the <i>agouti</i> gene in mice, causing obesity/type 2 diabetes. To determine the frequency of FTs genome-wide, we developed a TopHat-Fusion-based analytical pipeline to identify FTs with high specificity. We applied it to an RNA-seq dataset from the nucleus accumbens (NAc) of mice repeatedly exposed to cocaine. Cocaine was previously shown to increase the expression of certain REs in this brain region. Using this pipeline, which can be applied to single- or paired-end reads, we identified 438 genes expressing 813 different FTs in the NAc. Although all types of studied repeats were present in FTs, simple sequence repeats were underrepresented. Most importantly, reverse-transcription and quantitative PCR validated the expression of selected FTs in an independent cohort of animals, which also revealed that some FTs are the prominent isoforms expressed in the NAc by some genes. In other RNA-seq datasets, developmental expression as well as tissue specificity of some FTs differed from their corresponding non-fusion counterparts. Finally, <i>in silico</i> analysis predicted changes in the structure of proteins encoded by some FTs, potentially resulting in gain or loss of function. Collectively, these results indicate the robustness of our pipeline in detecting these new isoforms of genes, which we believe provides a valuable tool to aid in better understanding the broad role of REs in mammalian cellular biology.</p></div>
Fusion gene count and its fold expression relative to nearest control locus.
<p>The genes involved in the fusions are shown; for <i>Klc1</i> and <i>Arhgef10</i> the different fusion events are shown independently. Average read counts identified by RNA-seq in each of the 3 independent replicates from either saline- or cocaine-treated animals are presented. The expression level of the fusion relative to the nearest control locus (same exon) for saline- and cocaine-exposed animals, respectively, was estimated by qRT-PCR using primers that amplified the repeat-exon junction or the junction between the same exon and the nearest canonical adjacent locus (exon). For more details see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0159028#sec002" target="_blank">Methods</a>. Average fold-change between saline- and cocaine-treated samples, and total standard error are shown; NA = not analyzed. In italic and bold are the 3 genes that are expressed at similar or higher levels than the non-fusion counterpart. Last column represents the results as the ratio of non-fusion over fusion gene expression; i.e. for every 1 transcript of non-fusion <i>Atp2b1</i> there are 40 transcripts expressing the fusion isoform.</p>
Regulation of fusion transcript expression differs from the non-fusion counterpart during development.
<p>Read counts were compared between FTs and non-fusion isoforms of the genes expressing FTs studied. Black bars represent the fusion version of a transcript while white bars indicate the non-fusion counterpart. (A) Counts of the only 3 genes found to be expressed in the Macfarlan (2012) data set. Bars indicate the average read counts ± SD from 3 independent samples for oocyte or 2-cell (2C) stage, respectively. (B-D) Pattern of expression of fusion or non-fusion isoforms from 3 genes as a function of neocortex development; data for the other 5 genes can be found in Suppl. <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0159028#pone.0159028.g004" target="_blank">Fig 4</a>. Bars represent average read counts ± SE from 2 samples (one data set from male mice and the other from female mice); thus statistical significance could not be tested. Data from 3 different layers were analyzed: IgL (infragranular layer), L4 (granular layer) and SgL (supragranular layer). P4–14 represent post-natal days 4 to 14.</p>
FTs are expressed in the NAc of saline- or cocaine-treated mice.
<p>(A) Heatmap represents the read counts of each fusion event identified in either the saline- or cocaine-treated samples; non-supervised hierarchical clustering was applied, separating cocaine- from saline-exposed samples. Red indicates a higher number of reads while black indicates a lower number of reads. The table on the left shows examples of read counts, which must span the exact fusion junction, supporting the fusions. S1–S3 and S4–S6 are the 3 independent samples exposed to cocaine or saline, respectively. Pie chart indicates the frequency of fusion events with different classes of repeat elements.</p>
Summary of hard clipping analysis for broad estimation of transcriptome-wide fusion gene candidates.
<p>The sequential steps taken and number of reads complying with each filtering step, per sample, are presented. S1–3 are the independent cocaine-exposed samples, S4–6 represent saline-treated animals.</p>
qRT-PCR validated the differential expression of some FTs.
<p>Fold changes in FT levels, relative to GAPDH, in the NAc 24 hours after seven daily cocaine i.p. injections, with saline injections as control. Data are presented as mean ± SEM. N = 10–12 per group. *p < 0.05, unpaired Student’s t test.</p>
Some genes expressing FTs are expressed in several tissues while others are expressed in a tissue-specific manner.
<p>Read counts were compared between FTs and non-fusion isoforms of genes in different areas of the brain (left side of graph), including the amygdala (AMY), hippocampus (HIP), NAc, prefrontal cortex (PFC) and the ventral tegmental area (VTA) as well as in different tissues (right side of graph). Black bars represent average counts per million aligned reads ± SD from 4–6 independent RNA-seq samples of the fusion version of a transcript while white bars indicate the average ± SD for the non-fusion counterpart.</p>
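<p>The "counts per million aligned reads ± SD" summary used in the caption above can be illustrated with a minimal sketch. It assumes the standard definition of counts per million (raw count divided by the number of aligned reads in the sample, times one million) and a sample standard deviation; the function names and the example numbers are hypothetical and are not taken from the study.</p>

```python
from statistics import mean, stdev


def counts_per_million(count, total_aligned_reads):
    """Scale a raw read count by the sample's aligned-read total (standard CPM)."""
    if total_aligned_reads <= 0:
        raise ValueError("total_aligned_reads must be positive")
    return count * 1_000_000 / total_aligned_reads


def summarize_cpm(counts, library_sizes):
    """Return (mean, sample SD) of CPM values across independent samples.

    counts[i] is the raw read count for one isoform in sample i and
    library_sizes[i] is that sample's number of aligned reads.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have the same length")
    if len(counts) < 2:
        raise ValueError("at least two samples are needed for a standard deviation")
    cpms = [counts_per_million(c, n) for c, n in zip(counts, library_sizes)]
    return mean(cpms), stdev(cpms)


if __name__ == "__main__":
    # Hypothetical junction-spanning read counts for one fusion isoform in four samples.
    fusion_counts = [40, 50, 60, 50]
    aligned_reads = [10_000_000] * 4
    avg, sd = summarize_cpm(fusion_counts, aligned_reads)
    print(f"fusion isoform: {avg:.2f} ± {sd:.2f} CPM")
```

<p>Normalizing each sample by its own aligned-read total before averaging keeps differences in sequencing depth from being mistaken for differences in expression between tissues.</p>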
Downstream Antisense Transcription Predicts Genomic Features That Define the Specific Chromatin Environment at Mammalian Promoters
<div><p>Antisense transcription is a prevalent feature at mammalian promoters. Previous studies have primarily focused on antisense transcription initiating upstream of genes. Here, we characterize promoter-proximal antisense transcription downstream of gene transcription start sites in human breast cancer cells, investigating the genomic context of downstream antisense transcription. We find extensive correlations between antisense transcription and features associated with the chromatin environment at gene promoters. Antisense transcription downstream of promoters is widespread, with antisense transcription initiation observed within 2 kb of 28% of gene transcription start sites. Antisense transcription initiates between nucleosomes regularly positioned downstream of these promoters. The nucleosomes between gene and downstream antisense transcription start sites carry histone modifications associated with active promoters, such as H3K4me3 and H3K27ac. This region is bound by chromatin remodeling and histone modifying complexes including SWI/SNF subunits and HDACs, suggesting that antisense transcription or resulting RNA transcripts contribute to the creation and maintenance of a promoter-associated chromatin environment. Downstream antisense transcription overlays additional regulatory features, such as transcription factor binding, DNA accessibility, and the downstream edge of promoter-associated CpG islands. These features suggest an important role for antisense transcription in the regulation of gene expression and the maintenance of a promoter-associated chromatin environment.</p></div>
Sequence content is consistent across identified transcription start sites (TSSs).
<p>(A) Sequence composition at identified transcription start sites. Nucleotide composition at observed gene TSSs, upstream antisense TSSs (uaTSSs), and downstream antisense TSSs (daTSSs) is shown over +/- 1 kb and +/- 50 bp windows. Logo plots of sequence within a +/- 5 nt window about TSS positions show a pyrimidine-purine dinucleotide reminiscent of Initiator-binding motifs. (B) Heatmaps of CpG island occurrences about observed gene TSSs sorted by TSS-uaTSS or TSS-daTSS distances. (C) Plots of average occurrences of Pol II-associated sequence motifs in a +/- 1 kb window about TSSs. Motif position weight matrices were taken from the Pol II subset of the JASPAR database [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1006224#pgen.1006224.ref018" target="_blank">18</a>]. (D) Motifs identified by <i>de novo</i> discovery near each class of TSS. Motifs were found at (1) an AT-rich region found upstream of TSSs and (2) a small window centered on TSSs containing a distinct pyrimidine-purine dinucleotide. <i>De novo</i> motif discovery results in sequences resembling TATA and Inr-binding motifs. Distribution of identified motifs across TSS classes is shown within an inset table. (E) PhyloP conservation score across observed TSSs [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1006224#pgen.1006224.ref019" target="_blank">19</a>]. Positive values correspond to enhanced sequence conservation. (Top) Heatmaps of PhyloP scores centered on gene TSS positions and sorted by TSS-uaTSS or TSS-daTSS distance. (Bottom) Average PhyloP conservation scores over all gene TSS, uaTSS, and daTSS calls, shown over +/- 1 kb windows.</p>
Binding of trans-regulatory factors is coincident with an open DNA region at downstream antisense TSSs (daTSSs).
<p>(A) Heatmaps and plots of FAIRE-seq read density over identified TSS positions. Heatmaps of FAIRE-seq density are centered on gene TSSs and sorted by gene TSS-uaTSS or gene TSS-daTSS distances. Areas with enriched signal correspond to open DNA regions. (B) Occurrences of vertebrate sequence motifs over identified TSS positions. (Left) Heatmaps of motif occurrences centered on gene TSSs and sorted by TSS-uaTSS or TSS-daTSS distance. (Right) Plots of the average number of motif occurrences over each class of TSS. Motif position weight matrices were taken from the JASPAR database [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1006224#pgen.1006224.ref018" target="_blank">18</a>]. (C) Heatmaps of ChIP-seq read density for a variety of trans-regulatory factors across heterologous cell lines. Transcription factors and chromatin remodelers display distinct modes of binding relative to gene TSS and daTSS positions. ChIP-seq experiments were performed in the following cell lines: GM12878: TBP and NFIC; K562: c-Fos, c-Jun, CHD1-A, and Sap30; MCF7: GATA3 and p300; A549: SP1 and CTCF; HeLa: BRG1, INI1, BAF155, and BAF170. Displayed p-values were calculated by comparing signal density at daTSSs with equivalent positions at genes without daTSSs by Wilcoxon test (p-values are compiled in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1006224#pgen.1006224.s002" target="_blank">S2 Table</a>).</p>