16 research outputs found

    Characterization of Human Pseudogene-Derived Non-Coding RNAs for Functional Potential

    <div><p>Thousands of pseudogenes exist in the human genome and many are transcribed, but their functional potential remains elusive and understudied. To explore these issues systematically, we first developed a computational pipeline to identify transcribed pseudogenes from RNA-Seq data. Applying the pipeline to datasets from 16 distinct normal human tissues identified ∼3,000 pseudogenes that could produce non-coding RNAs at low abundance but with high tissue specificity under normal physiological conditions. Cross-tissue comparison revealed that the transcriptional profiles of pseudogenes and their parent genes showed mostly positive correlations, suggesting that pseudogene transcription could have a positive effect on the expression of their parent genes, perhaps by functioning as competing endogenous RNAs (ceRNAs), as previously suggested and demonstrated with the <i>PTEN</i> pseudogene, <i>PTENP1</i>. Our analysis of the ENCODE project data also found many transcriptionally active pseudogenes in the GM12878 and K562 cell lines; moreover, it showed that many human pseudogenes produced small RNAs (sRNAs) and some pseudogene-derived sRNAs, especially those from antisense strands, exhibited evidence of interfering with gene expression. Further integrated analysis of transcriptomics and epigenomics data, however, demonstrated that trimethylation of histone 3 at lysine 9 (H3K9me3), a posttranslational modification typically associated with gene repression and heterochromatin, was enriched at many transcribed pseudogenes in a transcription-level-dependent manner in the two cell lines. The H3K9me3 enrichment was more prominent in pseudogenes that produced sRNAs at pseudogene loci and their adjacent regions, an observation further supported by the co-enrichment of SETDB1 (an H3K9 methyltransferase), suggesting that pseudogene sRNAs may have a role in regional chromatin repression. Taken together, our comprehensive and systematic characterization of pseudogene transcription uncovers a complex picture of how pseudogene ncRNAs could influence gene and pseudogene expression, at both epigenetic and post-transcriptional levels.</p></div>

    Enrichment of H3K9me3 modification at transcribed pseudogene loci.

    <p>A) Heatmap of H3K36me3 near the transcription start sites (TSS) and transcription end sites (TES) of transcribed (bottom) and non-transcribed pseudogenes (top). The color scheme is based on column-normalized data in GM12878, and each row is a pseudogene. B) Transcription-level-dependent enrichment of H3K9me3 at transcribed pseudogenes. Y-axis shows the average number of H3K9me3 ChIP-Seq reads per 500 bp. C) & D) The level of H3K9me3 (red) but not H3K27me3 (green) was significantly higher at group II pseudogenes (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0093972#pone-0093972-g005" target="_blank">Fig. 5</a>) than at group I pseudogenes or at pseudogene loci producing no sRNAs (“C”, controls). The H3K9me3 level at a randomly selected set of LINEs (blue) was also plotted as a positive control. Y-axis plots ChIP-Seq reads at pseudogene bodies, normalized per 500 bp of sequence. E) The densities of H3K36me3, H3K27me3, and H3K9me3 ChIP-Seq reads and sRNA-Seq reads at a region with multiple pseudogenes derived from a gene encoding NADH dehydrogenase. F–H) The average ChIP-Seq profiles, anchored on pseudogene centers, of H3K9me3 in GM12878 (F) and in K562 (G) and of SETDB1 in K562 (H) for the three groups of pseudogenes. Y-axes show the average numbers of ChIP-Seq reads per 100 bp.</p>

    Top pseudogene candidates of three different types of predicted functional potentials (ND, not determined). The full lists can be found in Table S1.

    <p>Top pseudogene candidates of three different types of predicted functional potentials (ND, not determined). The full lists can be found in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0093972#pone.0093972.s008" target="_blank">Table S1</a>.</p>

    High tissue specificity of pseudogene transcription.

    <p>A) Heatmap for the transcription levels of 982 highly transcribed pseudogenes (maximal FPKM >10). B) Violin plots showing tissue-specificity JS scores of lincRNAs, transcribed pseudogenes, their parents, and the coding genes without pseudogenes. C) Comparison of JS scores at different transcription levels. The white dots mark the median and the thick boxes mark the first and third quartile values.</p>
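The caption above refers to tissue-specificity JS scores without defining them. A minimal sketch of one common formulation follows, assuming the score is one minus the Jensen-Shannon distance (square root of the base-2 divergence) between a gene's normalized expression profile and an idealized single-tissue profile, maximized over tissues. The function names are hypothetical, and the authors' exact definition may differ.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-(nonzero * np.log2(nonzero)).sum())


def js_specificity(expression):
    """Maximal JS specificity score of one gene across tissues.

    Assumed definition (not confirmed by the source): for each tissue t,
    score_t = 1 - sqrt(JSD(p, q_t)), where p is the expression profile
    normalized to sum to 1 and q_t puts all mass on tissue t.
    """
    e = np.asarray(expression, dtype=float)
    total = e.sum()
    if total <= 0:
        return 0.0  # gene not expressed anywhere: no specificity to report
    p = e / total
    scores = []
    for t in range(len(p)):
        q = np.zeros_like(p)
        q[t] = 1.0
        m = 0.5 * (p + q)
        jsd = shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))
        jsd = max(jsd, 0.0)  # guard against tiny negative rounding errors
        scores.append(1.0 - np.sqrt(jsd))
    return float(max(scores))
```

Under this definition, a transcript detected in a single tissue scores 1, and a transcript expressed evenly across tissues scores lowest.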

    Pseudogene-derived sRNAs and their relationship to parental gene repression.

    <p>A) Processed pseudogenes had higher sRNA read densities than any other annotated genomic element or randomly chosen genomic regions in both GM12878 and K562 cell lines. B) Pseudogenes with mapped sRNA reads (≥5 reads per kb) were separated into two groups based on the abundance of sRNA reads in the adjacent non-pseudogene regions (±1 kb, orange). Group I pseudogenes were considered to produce sRNAs interactively with their parents, whereas group II pseudogenes produced sRNAs independently. Venn diagrams show the data comparison between GM12878 (red) and K562 (green). C) The parental genes of group I pseudogenes showed significantly lower expression than either those of the pseudogenes without sRNA (control) or those of the group II pseudogenes, in both GM12878 (red) and K562 (green). The parents of antisense transcribed pseudogenes (>5 sRNA/kb) exhibited even lower expression. The same trends held when the analysis was carried out for pseudogenes with >10 sRNA/kb. Parents not expressed in the 16 normal tissues (i.e., FPKM = 0) were not included in these plots.</p>

    Selection constraints on transcribed pseudogenes.

    <p>Comparison of nucleotide diversities in human population (A) and cross-species conservation (B) between non-transcribed (‘n’) and transcribed pseudogenes (‘y’). AluY, a young repeat family that emerged recently in primates, was used as a control. For duplicated pseudogenes, the median diversities for transcribed and non-transcribed are 0.00051 and 0.00054 (p<0.02, Wilcoxon test); the values for processed pseudogenes are 0.00055 and 0.00064 (p<3e-06, Wilcoxon test).</p>

    Transcriptional correlations (ρ<sub>pg:g</sub>) between pseudogenes and their parents.

    <p>A) A heatmap for the distribution of ρ<sub>pg:g</sub>, including data from separation of processed and duplicated pseudogenes into two groups based on the presence of a coding gene within 20 kb. The coefficients between transcribed pseudogenes and randomly chosen coding genes (top) were used as a control for p-value estimation. Colors represent relative numbers of pseudogenes in each ρ<sub>pg:g</sub> range (in Z-score transformation). B) Pseudogenes transcribed in the sense direction (S) exhibited higher ρ<sub>pg:g</sub> than those in the antisense (A). C) The transcriptional correlation between pseudogenes and their parents (ρ<sub>pg:g</sub>) is inversely correlated to the transcriptional correlation between miRNAs and their putative targets (ρ<sub>miRNA:g</sub>). Genes were binned on their ρ<sub>miRNA:g</sub> values (x-axis) and then the mean and standard deviation of ρ<sub>pg:g</sub> (y-axis) for each group of genes were plotted. D) Expression of parental genes targeted by miRNAs was less affected by miRNA knockdown (KD) than that of targeted genes without pseudogenes. Only genes responding to KD (up >1.3 fold) were analyzed here. Y-axis shows the fold change of KD over control. The miRNA targets were experimentally determined by the CLASH analysis <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0093972#pone.0093972-Helwak1" target="_blank">[49]</a>. The middle line in the boxplots marks the median and the box edges mark the first and third quartile values (same for boxplots below).</p>
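To make the ρ<sub>pg:g</sub> quantity in this caption concrete, the sketch below correlates a pseudogene's expression with its parent's across tissues. It assumes a Pearson coefficient on FPKM vectors with one value per tissue; the listing does not say which correlation statistic the authors used, and the function name is hypothetical.

```python
import numpy as np


def expression_correlation(pseudogene_fpkm, parent_fpkm):
    """Pearson correlation between two per-tissue expression vectors.

    Assumption: both vectors list FPKM values for the same tissues in the
    same order. Returns NaN when either vector is constant, because the
    coefficient is undefined in that case.
    """
    x = np.asarray(pseudogene_fpkm, dtype=float)
    y = np.asarray(parent_fpkm, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("expected two 1-D vectors of equal length")
    if x.std() == 0 or y.std() == 0:
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])
```

A control distribution like the one described in panel A could be approximated by calling the same function on pseudogenes paired with randomly chosen coding genes.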

    Histone modification profiles at REST peaks.

    <p>Profiles of (A) H3K4me3 and (B) H3K27me3 ChIP-seq signal at REST peaks in GM12878 cells, H1 ESCs, neurons, A549, HeLa S3, Hep G2, and K562 cells. Y-axis shows the read density from Zhu <i>et al.</i> <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003671#pcbi.1003671-Zhu1" target="_blank">[68]</a> or ENCODE <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003671#pcbi.1003671-A1" target="_blank">[34]</a> per 150 bp averaged over REST-bound peaks in each cell type from −6 kb to 6 kb of the peak summits. Data were normalized to a read depth of 5 million mapped reads.</p>
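The read-depth normalization mentioned at the end of this caption can be illustrated with a short sketch. It assumes simple linear scaling of per-bin read counts to a common library size of 5 million mapped reads; the function and parameter names are hypothetical, and the authors' pipeline may include further steps.

```python
def normalize_to_depth(bin_counts, total_mapped_reads, target_depth=5_000_000):
    """Scale per-bin read counts so the library matches a target depth.

    Assumption: normalization is a plain linear rescaling by
    target_depth / total_mapped_reads.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = target_depth / total_mapped_reads
    return [count * scale for count in bin_counts]
```

For example, counts from a library of 10 million mapped reads would be halved, which makes profiles from different cell types comparable on the same y-axis.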

    Colocalization of REST and its cofactors and their relationship with gene expression.

    <p>(A) Venn diagram of colocalization of SIN3, COREST, and EZH2 at REST peaks. The square constitutes all REST peaks. (B) Context enrichment of REST peaks bound by different cofactors; fold enrichment is compared to all REST peaks. (C) Expression of REST targets with different cofactors, plotted as log2(FPKM+0.1); asterisks mark significant differences. Data shown here are for GM12878 and data for Hep G2 are in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003671#pcbi.1003671.s006" target="_blank">Figure S6</a>.</p>