55 research outputs found

    Computational Prediction of Polycomb-Associated Long Non-Coding RNAs

    Among the thousands of long non-coding RNAs (lncRNAs), only a small subset has been functionally characterized, and the functional annotation of lncRNAs on a genomic scale remains inadequate. In this study we computationally characterized two functionally distinct classes within the human lncRNA transcriptome, defined by their ability to bind Polycomb repressive complex 2 (PRC2). This classification is possible because, although lncRNAs as a whole form a diverse set of sequences, PRC2-binding and PRC2 non-binding lncRNAs each possess characteristic combinations of sequence-structure patterns and can therefore be separated in feature space. Using this specific combination of features, we built several machine-learning classifiers and identified the support vector machine (SVM) classifier as the best performing. We further showed that the SVM-based classifier generalizes to independent data sets. We also observed that this classifier, trained on human lncRNAs, can predict up to 59.4% of PRC2-binding lncRNAs in mouse. This suggests that, despite their low degree of sequence conservation, many lncRNAs play functionally conserved biological roles.
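
    The abstract does not specify the feature set or the SVM training procedure. The sketch below only illustrates the general idea under stated assumptions: normalized dinucleotide (k = 2) frequencies stand in for the paper's sequence-structure features, and a linear SVM is trained with the Pegasos stochastic sub-gradient method, which is not necessarily the solver the authors used. The function names and toy sequences are hypothetical.

```python
# Hypothetical sketch: k-mer features + linear SVM (Pegasos). Not the authors' pipeline.
import itertools
import random
from typing import List, Sequence

ALPHABET = "ACGU"  # RNA alphabet; DNA input is converted T -> U


def kmer_features(seq: str, k: int = 2) -> List[float]:
    """Normalized k-mer frequencies (4**k values) of one sequence."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other non-ACGU symbols
            counts[j] += 1.0
            total += 1
    return [c / total for c in counts] if total else counts


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 (e.g. PRC2-binding) or -1 (PRC2 non-binding).
    No explicit bias term: the k-mer frequencies of a sequence sum to 1,
    so a bias b can be absorbed into w (w.x + b == (w + b).x).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization (shrink) step
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 or -1 according to the sign of the decision function w . x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


if __name__ == "__main__":
    # Purely illustrative toy sequences, not data from the study.
    pos = ["GAGAGGAAGAGG", "GGAGAAGGAGAG", "AGGAGAGAAGGA"]
    neg = ["CUCUUCCUCUUC", "UCCUCUUCUCCU", "CUUCUCCUUCUC"]
    X = [kmer_features(s) for s in pos + neg]
    y = [1] * len(pos) + [-1] * len(neg)
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])
```

    A linear kernel keeps the sketch short; a real analysis would compare kernels and evaluate on held-out data, as the generalization claim above implies.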

    Visualization of the classification performance of four classifiers using the set of features selected at the 0.05 significance level.

    Observations along the X axis are ordered by their true class labels. For each observation, red and green dots represent the estimated probabilities of belonging to class 0 and class 1, respectively. The dotted line separates observations of class 0 from those of class 1. As the plot shows, the estimated probability that an observation belongs to a given class agrees with its true class label.
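
    The caption does not state how the class probabilities were obtained (for an SVM, a calibration step such as Platt scaling is a common choice, but that is an assumption here). Given per-observation probabilities, the reordering and the agreement check that the plot conveys can be sketched as follows; the function name and the 0.5 cut-off are hypothetical.

```python
# Hypothetical sketch of the ordering/agreement check shown in the figure.
from typing import List, Sequence, Tuple


def ordered_probabilities(p_class1: Sequence[float], labels: Sequence[int],
                          threshold: float = 0.5) -> List[Tuple[int, float, float, bool]]:
    """Reorder observations by true label (class 0 first), as on the plot's X axis.

    Returns (original index, P(class 0), P(class 1), agrees) per observation, where
    `agrees` is True when the predicted class (P(class 1) >= threshold) equals the label.
    Assumes a binary classifier, so P(class 0) = 1 - P(class 1).
    """
    # sorted() is stable, so observations keep their input order within each class.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    rows = []
    for i in order:
        p1 = p_class1[i]
        predicted = 1 if p1 >= threshold else 0
        rows.append((i, 1.0 - p1, p1, predicted == labels[i]))
    return rows
```

    The position of the dotted line corresponds to the number of class-0 observations, since they all come first after reordering.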

    Classifier performance (0.01 significance level).

    Consensus motifs enriched in PRC2-binding and PRC2 non-binding lncRNAs.

    a) PRC⁺: PRC2-binding lncRNAs.
    b) PRC⁻: PRC2 non-binding lncRNAs.
    c) IUPAC nucleotide code: http://www.bioinformatics.org/sms/iupac.html.
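
    The enriched motifs are written in IUPAC degenerate nucleotide code. As an illustration only, the sketch below translates such a motif into a regular expression and reports every (possibly overlapping) match position in a sequence. It does not reproduce the motif discovery or enrichment statistics behind the table, and the function names and example motifs are hypothetical.

```python
# Hypothetical sketch: locating IUPAC-coded motifs in RNA/DNA sequences.
import re
from typing import List

# IUPAC nucleotide codes over the RNA alphabet (T is treated as U).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regex made of character classes."""
    try:
        return "".join("[" + IUPAC_RNA[c] + "]" for c in motif.upper())
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]!r}") from None


def find_motif(motif: str, seq: str) -> List[int]:
    """0-based start positions of all overlapping matches of `motif` in `seq`."""
    seq = seq.upper().replace("T", "U")
    # A zero-width lookahead lets consecutive matches overlap.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]
```

    For example, the motif GRG (R = A or G) occurs at positions 0 and 2 of GAGGGCG.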

    Figure 2

    ROC curves for four different classifiers using the set of features selected at the 0.05 significance level.
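
    A ROC curve plots the true-positive rate against the false-positive rate as the decision threshold on a classifier's score is lowered. A minimal sketch, assuming each classifier outputs a real-valued score where higher means PRC2-binding (label 1); the function names are hypothetical. The area under the curve is computed with the trapezoidal rule and cross-checked against its rank interpretation: the probability that a random positive outscores a random negative, with ties counted as one half.

```python
# Hypothetical sketch: ROC points and AUC from classifier scores.
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points for thresholds at each distinct score, highest first.

    labels: 1 = positive class, anything else = negative class.
    """
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of random positive > score of random negative), ties = 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Grouping tied scores into a single step is what makes the two AUC computations agree when ties occur.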

    The Vast, Conserved Mammalian lincRNome

    We compare the sets of experimentally validated long intergenic non-coding RNAs (lincRNAs) from human and mouse and apply a maximum likelihood approach to estimate the total number of lincRNA genes as well as the size of the conserved part of the lincRNome. Assuming that the sets of experimentally validated lincRNAs are random samples of the lincRNomes of the corresponding species, we estimate the total lincRNome size at approximately 40,000 to 50,000 distinct lincRNAs, at least twice the number of protein-coding genes. We further estimate that the fraction of the human and mouse euchromatic genomes encoding lincRNAs is more than twofold greater than the fraction encoding proteins. Although most lincRNA sequences are much less conserved than protein sequences, the extent of orthology between the lincRNomes is unexpectedly high, with 60 to 70% of lincRNA genes shared between human and mouse. The orthologous mammalian lincRNAs can be expected to perform equivalent functions; accordingly, thousands of evolutionarily conserved functional roles of lincRNAs likely remain to be characterized.
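
    The abstract does not give the likelihood itself. The sketch below is a simplified, moment-based version of the same sampling idea under explicitly hypothetical assumptions: each species' validated set is an independent random sample of its lincRNome, and one can tell whether a validated gene belongs to the conserved (shared) part. Equating observed counts with their expectations gives closed-form estimates, and the conserved-part estimate then has the form of the Lincoln-Petersen capture-recapture estimator. This is not the authors' estimator, and all names are hypothetical.

```python
# Hypothetical sketch: moment estimates of lincRNome sizes from two random samples.
# NOT the maximum-likelihood model of the study; a simplified illustration only.
from dataclasses import dataclass


@dataclass
class LincRNomeEstimate:
    total_human: float  # estimated N_h, size of the human lincRNome
    total_mouse: float  # estimated N_m, size of the mouse lincRNome
    conserved: float    # estimated C, number of genes shared by both lincRNomes


def estimate_lincrnome(n_h: int, n_m: int, k_h: int, k_m: int, k_b: int) -> LincRNomeEstimate:
    """Moment estimates under a hypothetical two-sample model.

    n_h, n_m: sizes of the validated (sampled) sets in human and mouse
    k_h: validated human genes that lie in the conserved part
    k_m: validated mouse genes that lie in the conserved part
    k_b: conserved genes validated in both species

    Assumed expectations: E[k_h] = n_h*C/N_h, E[k_m] = n_m*C/N_m,
    E[k_b] = C*(n_h/N_h)*(n_m/N_m). Solving with the observed counts gives
    C = k_h*k_m/k_b (Lincoln-Petersen form), N_h = n_h*k_m/k_b, N_m = n_m*k_h/k_b.
    """
    if k_b <= 0:
        raise ValueError("k_b must be positive for the estimates to exist")
    return LincRNomeEstimate(
        total_human=n_h * k_m / k_b,
        total_mouse=n_m * k_h / k_b,
        conserved=k_h * k_m / k_b,
    )
```

    In this toy model the estimated conserved fraction of each lincRNome equals the conserved fraction of its validated sample (k_h/n_h for human, k_m/n_m for mouse).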

    The fractions of the human and mouse genomes allotted to protein-coding and lincRNA-coding sequences (a).

    a) The data on protein-coding genes and the total size of the euchromatic genomes are from [31], [63].
    b) The total length of the human lincRNome is likely underestimated because RNA-seq data were used to calculate the lengths of lincRNAs in the human validated set [27].

    Estimated numbers of all and orthologous lincRNAs at varying expression thresholds (a).

    a) Indel threshold: 95%; ORF threshold: 120 nt (see Methods). Expression thresholds were applied to lincRNA genes (Lh, Lm, and Kb) and to putative orthologous lincRNA genes (Kh and Km).
    %: conservation percentage, as in Table 1.

    RPKM-based estimates of the numbers of all and orthologous lincRNAs.

    Two expression thresholds and four indel thresholds were applied to putative orthologous lincRNA genes (Kh and Km).
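
    These estimates filter genes by expression measured in RPKM (reads per kilobase of transcript per million mapped reads), conventionally defined as RPKM = 10^9 × (reads mapped to the gene) / (gene length in bp × total mapped reads). The sketch below computes it and keeps the genes at or above a cut-off. The gene names, counts, and threshold are hypothetical, and the indel thresholds mentioned in the caption are not modeled.

```python
# Hypothetical sketch: RPKM-based expression filtering (indel thresholds not modeled).
from typing import Dict, List


def rpkm(read_count: float, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def genes_above_threshold(counts: Dict[str, float], lengths_bp: Dict[str, int],
                          total_mapped_reads: int, min_rpkm: float) -> List[str]:
    """Sorted IDs of genes whose RPKM is at least `min_rpkm`."""
    return sorted(
        gene for gene, n in counts.items()
        if rpkm(n, lengths_bp[gene], total_mapped_reads) >= min_rpkm
    )
```

    For instance, 500 reads on a 2,000 bp gene in a library of 10 million mapped reads gives an RPKM of 25.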