
    Characterization of Human Pseudogene-Derived Non-Coding RNAs for Functional Potential

    No full text
    <div><p>Thousands of pseudogenes exist in the human genome and many are transcribed, but their functional potential remains elusive and understudied. To explore these issues systematically, we first developed a computational pipeline to identify transcribed pseudogenes from RNA-Seq data. Applying the pipeline to datasets from 16 distinct normal human tissues identified ∼3,000 pseudogenes that could produce non-coding RNAs at low abundance but with high tissue specificity under normal physiological conditions. Cross-tissue comparison revealed that the transcriptional profiles of pseudogenes and their parent genes were mostly positively correlated, suggesting that pseudogene transcription could have a positive effect on the expression of their parent genes, perhaps by functioning as competing endogenous RNAs (ceRNAs), as previously suggested and demonstrated with the <i>PTEN</i> pseudogene, <i>PTENP1</i>. Our analysis of ENCODE project data also found many transcriptionally active pseudogenes in the GM12878 and K562 cell lines; moreover, it showed that many human pseudogenes produced small RNAs (sRNAs) and that some pseudogene-derived sRNAs, especially those from antisense strands, showed evidence of interfering with gene expression. Further integrated analysis of transcriptomic and epigenomic data, however, demonstrated that trimethylation of histone H3 at lysine 9 (H3K9me3), a posttranslational modification typically associated with gene repression and heterochromatin, was enriched at many transcribed pseudogenes in a transcription-level-dependent manner in the two cell lines. The H3K9me3 enrichment was more prominent in pseudogenes that produced sRNAs at pseudogene loci and their adjacent regions, an observation further supported by the co-enrichment of SETDB1 (an H3K9 methyltransferase), suggesting that pseudogene sRNAs may have a role in regional chromatin repression.
Taken together, our comprehensive and systematic characterization of pseudogene transcription uncovers a complex picture of how pseudogene ncRNAs could influence gene and pseudogene expression at both the epigenetic and post-transcriptional levels.</p></div>

    Additional file 1: Figures S1–S15 of Characteristics of allelic gene expression in human brain cells from single-cell RNA-seq data analysis

    Figure S1. SNP calling results using mouse embryonic scRNA-seq data.
    Figure S2. A cartoon illustrating the steps and criteria in our allelic expression analysis.
    Figure S3. Numbers of hetSNPs called for the six human brains.
    Figure S4. The effect of cell numbers on hetSNP calling and the genomic distribution of hetSNPs.
    Figure S5. Boxplots showing the numbers of brain cells expressing reference (R) or alternative (A) alleles (allelic read depth ≥ 2).
    Figure S6. Boxplots showing the percentages of reference reads (vs total reads) at hetSNP sites in brain cells (read depth for each allele was ≥ 2 and the sum of read depths was ≥ 10).
    Figure S7. Allelic expression of hetSNPs within human imprinted genes in brain cells.
    Figure S8. Allelic expression of hetSNPs within mouse imprinted genes in embryonic cells.
    Figure S9. Numbers of hetSNP sites with different reference allele ratios.
    Figure S10. Numbers of hetSNP sites with different reference allele ratios, after scRNA-seq reads from cells of the same type in individual brains were pooled.
    Figure S11. Statistical summaries of allelic expression at the gene level.
    Figure S12. FPKM cutoff values for defining the top 30 percentile of genes in each cell.
    Figure S13. Monoallelic expression in subsampled neurons.
    Figure S14. Numbers of individual cells in which an MA gene was detected.
    Figure S15. Comparison of monoallelic expression between neurons and astrocytes in adult37, adult47 and adult50.
    (PDF 2190 kb)
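The depth criteria quoted for Figure S6 (each allele supported by at least 2 reads, both alleles together by at least 10) amount to a simple per-site filter. The sketch below is a minimal illustration, assuming per-site reference and alternative read counts are already available; the function name `reference_ratio` and its parameters are hypothetical and not taken from the study's code.

```python
from typing import Optional


def reference_ratio(ref_reads: int, alt_reads: int,
                    min_allele_depth: int = 2,
                    min_total_depth: int = 10) -> Optional[float]:
    """Fraction of reference reads at a hetSNP site, or None if the site
    fails the depth filters described for Figure S6 (hypothetical helper)."""
    # Each allele must be supported by at least `min_allele_depth` reads.
    if ref_reads < min_allele_depth or alt_reads < min_allele_depth:
        return None
    total = ref_reads + alt_reads
    # The two alleles together must reach `min_total_depth` reads.
    if total < min_total_depth:
        return None
    return ref_reads / total


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    sites = {"site_a": (8, 4), "site_b": (1, 20), "site_c": (3, 3)}
    for name, (ref, alt) in sites.items():
        print(name, reference_ratio(ref, alt))
```

With these counts, `site_b` is dropped for having too few alternative reads and `site_c` for having too few reads in total, while `site_a` yields a reference ratio of about 0.67.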

    Additional file 3: Table S2 of Characteristics of allelic gene expression in human brain cells from single-cell RNA-seq data analysis

    Gene bias status in each cell of individual brains. The three numbers of SNPs supporting allele bias (MA/BA/Unknown) and the letter indicating gene bias status (M: MA; B: BA; U: Unknown) are separated by slashes (/). A dot (.) means data not available. (TXT 5965 kb)
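Based on the field layout described above, a single cell entry could be decoded as follows. This is a minimal sketch: it assumes each entry has exactly the form `MA/BA/Unknown/letter` (for example `3/0/1/M`) and does not attempt to reproduce the file's overall row and column layout, which is not described here. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Letter codes as described for Table S2.
STATUS_CODES = {"M": "MA", "B": "BA", "U": "Unknown"}


@dataclass
class CellGeneStatus:
    ma_snps: int       # SNPs supporting monoallelic expression
    ba_snps: int       # SNPs supporting biallelic expression
    unknown_snps: int  # SNPs with undetermined allele bias
    status: str        # gene-level call: "MA", "BA" or "Unknown"


def parse_status_field(field: str) -> Optional[CellGeneStatus]:
    """Parse one 'MA/BA/Unknown/letter' field; '.' means no data."""
    field = field.strip()
    if field == ".":
        return None
    parts = field.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected four slash-separated parts, got {field!r}")
    ma, ba, unknown, letter = parts
    if letter not in STATUS_CODES:
        raise ValueError(f"unrecognized status letter {letter!r}")
    return CellGeneStatus(int(ma), int(ba), int(unknown), STATUS_CODES[letter])


if __name__ == "__main__":
    print(parse_status_field("3/0/1/M"))
    print(parse_status_field("."))
```

Returning `None` for the dot keeps "no data" distinct from a gene call of `Unknown`, which the table treats as a separate state.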

    Additional file 2: Tables S1, S4 and S5 of Characteristics of allelic gene expression in human brain cells from single-cell RNA-seq data analysis

    Table S1. Cell numbers used for scRNA-seq of the brains. This table is based on the cell classification in the original study (Darmanis et al., 2015). The “Experiment_sample_name” column lists the sample labels used in the original research. Only the first six adult samples were used in our analysis.
    Table S4. List of disease-related genes showing monoallelic expression in human brains at the cell-type level.
    Table S5. List of module genes from WGCNA. Gene symbols of three significant modules (salmon2, salmon4 and magenta) are listed.
    (DOC 68 kb)

    Top pseudogene candidates of three different types of predicted functional potentials (ND, not determined). The full lists can be found in Table S1.

    <p>Top pseudogene candidates of three different types of predicted functional potentials (ND, not determined). The full lists can be found in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0093972#pone.0093972.s008" target="_blank">Table S1</a>.</p>

    High tissue specificity of pseudogene transcription.

    <p>A) Heatmap of the transcription levels of 982 highly transcribed pseudogenes (maximal FPKM > 10). B) Violin plots showing tissue-specificity JS scores of lincRNAs, transcribed pseudogenes, their parents, and coding genes without pseudogenes. C) Comparison of JS scores at different transcription levels. The white dots mark the median and the thick boxes mark the first and third quartile values.</p>
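The caption does not define the JS score. Assuming it is the Jensen-Shannon-based tissue-specificity score often used for lincRNAs (1 minus the square root of the Jensen-Shannon divergence between a gene's normalized expression profile and an idealized profile expressed in only one tissue, maximized over tissues), it could be computed as sketched below. Whether the original analysis transformed FPKM values first (for example with a log) is not stated here; this sketch uses raw values. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def entropy_bits(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_specificity(fpkm: Sequence[float]) -> Tuple[float, int]:
    """Return (max specificity score, index of the best-matching tissue).

    Score for tissue t = 1 - sqrt(JSD(p, e_t)), where p is the profile
    normalized to sum to 1 and e_t is expression in tissue t only.
    """
    total = sum(fpkm)
    if total <= 0:
        raise ValueError("profile has no expression in any tissue")
    p = [x / total for x in fpkm]
    h_p = entropy_bits(p)
    best_score, best_tissue = -1.0, -1
    for t in range(len(p)):
        e_t = [1.0 if i == t else 0.0 for i in range(len(p))]
        m = [(a + b) / 2 for a, b in zip(p, e_t)]
        # JSD = H(m) - (H(p) + H(e_t)) / 2, and H(e_t) = 0 for a unit vector.
        jsd = entropy_bits(m) - h_p / 2
        score = 1.0 - math.sqrt(max(jsd, 0.0))  # clamp rounding below zero
        if score > best_score:
            best_score, best_tissue = score, t
    return best_score, best_tissue


def is_highly_transcribed(fpkm: Sequence[float], cutoff: float = 10.0) -> bool:
    """Mirror of the 'maximal FPKM > 10' selection used for panel A."""
    return max(fpkm) > cutoff


if __name__ == "__main__":
    # Illustrative profiles across four tissues (not from the study).
    print(js_specificity([0.0, 0.0, 7.0, 0.0]))  # expressed in one tissue only
    print(js_specificity([5.0, 5.0, 5.0, 5.0]))  # uniform expression
```

A profile confined to one tissue scores 1, and a perfectly uniform profile over four tissues scores about 0.26, so higher values indicate stronger tissue specificity.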

    Selection constraints on transcribed pseudogenes.

    <p>Comparison of nucleotide diversity in the human population (A) and cross-species conservation (B) between non-transcribed (‘n’) and transcribed (‘y’) pseudogenes. AluY, a young repeat family that emerged recently in primates, was used as a control. For duplicated pseudogenes, the median diversities for transcribed and non-transcribed pseudogenes are 0.00051 and 0.00054 (p<0.02, Wilcoxon test); for processed pseudogenes, the values are 0.00055 and 0.00064 (p<3e-06, Wilcoxon test).</p>
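Because transcribed and non-transcribed pseudogenes are independent groups, the Wilcoxon test cited here is presumably the two-sample rank-sum (Mann-Whitney U) form; that is an assumption. A minimal sketch with a normal approximation follows. It assigns average ranks to ties but omits the tie correction to the variance, so p-values may differ slightly from those of statistical packages when ties are common. The example values are illustrative, not the study's data.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, p-value). Ties get average ranks; the
    variance is not tie-corrected.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p


if __name__ == "__main__":
    # Illustrative per-pseudogene diversity values (not from the study).
    transcribed = [0.00040, 0.00048, 0.00051, 0.00055, 0.00060]
    non_transcribed = [0.00052, 0.00058, 0.00064, 0.00070, 0.00075]
    u, p = rank_sum_test(transcribed, non_transcribed)
    print(median(transcribed), median(non_transcribed), u, p)
```

Reporting medians alongside the test, as the caption does, suits a rank-based comparison, since the test concerns the ordering of values rather than their means.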

    Transcriptional correlations (ρ<sub>pg:g</sub>) between pseudogenes and their parents.

    <p>A) Heatmap of the distribution of ρ<sub>pg:g</sub>, including data from separating processed and duplicated pseudogenes into two groups based on the presence of a coding gene within 20 kb. The coefficients between transcribed pseudogenes and randomly chosen coding genes (top) were used as a control for p-value estimation. Colors represent the relative numbers of pseudogenes in each ρ<sub>pg:g</sub> range (Z-score transformed). B) Pseudogenes transcribed in the sense direction (S) exhibited higher ρ<sub>pg:g</sub> than those transcribed in the antisense direction (A). C) The transcriptional correlation between pseudogenes and their parents (ρ<sub>pg:g</sub>) is inversely correlated with the transcriptional correlation between miRNAs and their putative targets (ρ<sub>miRNA:g</sub>). Genes were binned by their ρ<sub>miRNA:g</sub> values (x-axis), and the mean and standard deviation of ρ<sub>pg:g</sub> (y-axis) for each bin were plotted. D) Expression of miRNA-targeted parental genes was less affected by miRNA KD than that of targeted genes without pseudogenes. Only genes responding to KD (upregulated >1.3-fold) were analyzed. The y-axis shows the fold change of KD over control. The miRNA targets were experimentally determined by CLASH analysis <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0093972#pone.0093972-Helwak1" target="_blank">[49]</a>. In the boxplots, the middle line marks the median and the box edges mark the first and third quartiles (same for the boxplots below).</p>
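The caption writes the coefficient as ρ but does not name it. Assuming Spearman rank correlation across tissues, ρ<sub>pg:g</sub> and an empirical p-value against a background of coefficients computed with randomly chosen coding genes could be sketched as follows. The one-sided, add-one empirical p-value is an assumption here, not necessarily the study's estimator, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined for a constant profile
    return cov / math.sqrt(var_a * var_b)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def empirical_p(observed: float, background: Sequence[float]) -> float:
    """One-sided: share of background coefficients >= observed (add-one)."""
    hits = sum(1 for r in background if r >= observed)
    return (hits + 1) / (len(background) + 1)


if __name__ == "__main__":
    # Illustrative FPKM profiles across five tissues (not from the study).
    pseudogene = [0.5, 2.0, 0.0, 4.1, 1.2]
    parent = [10.0, 35.0, 3.0, 60.0, 20.0]
    random_genes = [[5.0, 1.0, 8.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0]]
    rho = spearman(pseudogene, parent)
    background = [spearman(pseudogene, g) for g in random_genes]
    print(rho, empirical_p(rho, background))
```

A rank-based coefficient is a reasonable choice for FPKM profiles, whose tissue-to-tissue values are typically skewed; in practice the background would contain many more random coding genes than this toy example.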

    Pseudogene-derived sRNAs and their relationship to parental gene repression.

    <p>A) Processed pseudogenes had higher sRNA read densities than any other annotated genomic element and than randomly chosen genomic regions in both the GM12878 and K562 cell lines. B) Pseudogenes with mapped sRNA reads (≥5 reads per kb) were separated into two groups based on the abundance of sRNA reads in the adjacent non-pseudogene regions (±1 kb, orange). Group I was considered to produce sRNAs interactively with their parents, whereas group II produced sRNAs independently. Venn diagrams compare the data between GM12878 (red) and K562 (green). C) The parental genes of group I pseudogenes showed significantly lower expression than those of either the pseudogenes without sRNAs (control) or the group II pseudogenes, in both GM12878 (red) and K562 (green). The parents of antisense-transcribed pseudogenes (>5 sRNA reads/kb) exhibited even lower expression. The same trends held when the analysis was carried out for pseudogenes with >10 sRNA reads/kb. Parents not expressed in the 16 normal tissues (i.e., FPKM = 0) were not included in these plots.</p>
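The grouping in panel B can be sketched as two read-density thresholds. The cutoff of 5 reads per kb for pseudogene bodies comes from the caption; the cutoff applied to the ±1 kb flanks is not given, so `flank_min_per_kb` below is a hypothetical parameter. Assigning loci with sRNAs in the flanks to group II is an interpretation of these captions (group II is described as producing sRNAs independently, and the H3K9me3 enrichment is described for pseudogenes producing sRNAs at their loci and adjacent regions), not a documented rule. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PseudogeneLocus:
    name: str
    length_bp: int    # pseudogene body length
    body_reads: int   # sRNA reads mapped to the pseudogene body
    flank_reads: int  # sRNA reads in the two 1-kb flanks combined


def reads_per_kb(reads: int, length_bp: int) -> float:
    return reads / (length_bp / 1000)


def classify_locus(locus: PseudogeneLocus,
                   body_min_per_kb: float = 5.0,
                   flank_min_per_kb: float = 5.0,  # hypothetical cutoff
                   flank_length_bp: int = 2000) -> str:
    """Assign a pseudogene to 'group I', 'group II' or 'below threshold'."""
    if reads_per_kb(locus.body_reads, locus.length_bp) < body_min_per_kb:
        return "below threshold"
    if reads_per_kb(locus.flank_reads, flank_length_bp) >= flank_min_per_kb:
        return "group II"  # sRNAs also cover the adjacent regions
    return "group I"       # sRNAs confined to the pseudogene body


if __name__ == "__main__":
    # Illustrative loci (not from the study).
    loci = [
        PseudogeneLocus("pg_a", 1000, 10, 0),
        PseudogeneLocus("pg_b", 2000, 20, 12),
        PseudogeneLocus("pg_c", 1000, 3, 100),
    ]
    for locus in loci:
        print(locus.name, classify_locus(locus))
```

Normalizing both the body and the flanks to reads per kb keeps the two thresholds comparable even though pseudogene bodies vary in length while the flanks are fixed at 2 kb in total.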

    Enrichment of H3K9me3 modification at transcribed pseudogene loci.

    <p>A) Heatmap of H3K36me3 near the transcription start sites (TSS) and transcription end sites (TES) of transcribed (bottom) and non-transcribed (top) pseudogenes. Colors are based on column-normalized data from GM12878; each row is a pseudogene. B) Transcription-level-dependent enrichment of H3K9me3 at transcribed pseudogenes. The y-axis shows the average number of H3K9me3 ChIP-Seq reads per 500 bp. C) & D) The level of H3K9me3 (red), but not H3K27me3 (green), was significantly higher at group II pseudogenes (<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0093972#pone-0093972-g005" target="_blank">Fig. 5</a>) than at group I pseudogenes or at pseudogene loci producing no sRNAs (“C”, controls). The H3K9me3 level at a randomly selected set of LINEs (blue) is also plotted as a positive control. The y-axis shows ChIP-Seq reads at pseudogene bodies, normalized per 500 bp. E) The densities of H3K36me3, H3K27me3, and H3K9me3 ChIP-Seq reads and sRNA-Seq reads at a region with multiple pseudogenes derived from a gene encoding NADH dehydrogenase. F–H) Average ChIP-Seq profiles, anchored on pseudogene centers, of H3K9me3 in GM12878 (F) and K562 (G) and of SETDB1 in K562 (H) for the three groups of pseudogenes. The y-axes show the average numbers of ChIP-Seq reads per 100 bp.</p>
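Panels F–H average ChIP-Seq read counts in 100-bp bins around pseudogene centers. A minimal sketch of such a center-anchored profile follows, assuming each read has been reduced to a single coordinate on the same chromosome as the centers and ignoring pseudogene strand; the ±2 kb default window and all names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def center_profile(read_positions: Sequence[int], centers: Sequence[int],
                   window_bp: int = 2000, bin_bp: int = 100) -> List[float]:
    """Average reads per `bin_bp` bin in [center - window_bp, center + window_bp).

    Assumes read positions and centers are coordinates on one chromosome,
    each read reduced to a single position; strand is ignored.
    """
    if not centers:
        raise ValueError("need at least one center")
    if (2 * window_bp) % bin_bp != 0:
        raise ValueError("window must be a whole number of bins")
    positions = sorted(read_positions)
    totals = [0] * (2 * window_bp // bin_bp)
    for c in centers:
        start = c - window_bp
        # Binary search for the reads that fall inside this window.
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, c + window_bp)
        for pos in positions[lo:hi]:
            totals[(pos - start) // bin_bp] += 1
    # Divide by the number of loci to get the average count per bin.
    return [t / len(centers) for t in totals]


if __name__ == "__main__":
    # Illustrative read positions and pseudogene centers (not from the study).
    reads = [900, 1000, 1050, 1150, 4950, 5000, 5020]
    print(center_profile(reads, [1000, 5000], window_bp=200, bin_bp=100))
```

Sorting the reads once and locating each window by binary search keeps the cost proportional to the reads inside the windows rather than to all reads for every locus.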