17 research outputs found

    RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules

    Background: In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure.
    Results: We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space.
    Conclusion: The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.
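The layered-graph idea in this abstract can be illustrated with a minimal sketch. It is not the RNAspa implementation: it assumes each molecule contributes a list of candidate structures in dot-bracket notation, uses plain Levenshtein distance as the string edit distance, and scores a path by the summed distance between consecutive layers. The function names `edit_distance` and `shortest_layered_path` are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two dot-bracket strings (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def shortest_layered_path(layers: List[List[str]]) -> Tuple[int, List[str]]:
    """Choose one candidate per layer so that the summed distance between
    consecutive choices is minimal. One pass over the layers, so the work
    grows linearly with the number of molecules (layers)."""
    # best[k] = (cost, path) of the cheapest path ending at candidate k
    best = [(0, [s]) for s in layers[0]]
    for layer in layers[1:]:
        nxt = []
        for s in layer:
            cost, path = min(
                ((c + edit_distance(p[-1], s), p) for c, p in best),
                key=lambda t: t[0],
            )
            nxt.append((cost, path + [s]))
        best = nxt
    return min(best, key=lambda t: t[0])


# Toy input: three molecules, two suboptimal candidates each (made-up data).
layers = [
    ["((..))", "(....)"],
    ["((..))", "......"],
    ["(....)", "((..))"],
]
cost, path = shortest_layered_path(layers)
```

Because each layer only looks back at the previous one, adding a molecule adds one more pass of pairwise comparisons rather than multiplying the search space.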

    RNAslider: a faster engine for consecutive windows folding and its application to the analysis of genomic folding asymmetry

    Background: Scanning large genomes with a sliding window in search of locally stable RNA structures is a well-motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of size N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) for the folding of each of the L-sized substrings of S. The consecutive windows folding problem can be naively solved in O(NL^3) by applying any of the classical cubic-time RNA folding algorithms to each of the N-L windows of size L. Recently an O(NL^2) solution for this problem has been described.
    Results: Here, we describe and implement an O(NLψ(L)) engine for the consecutive windows folding problem, where ψ(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup which is experimentally confirmed. Using this tool, we note an intriguing directionality (5'-3' vs. 3'-5') folding bias, i.e. that the minimal free energy (MFE) of folding is higher in the native direction of the DNA than in the reverse direction, in various genomic regions of several organisms, including regions of the genomes that do not encode proteins or ncRNA. This bias largely emerges from the genomic dinucleotide bias, which affects the MFE; however, we see some variations in the folding bias in the different genomic regions when normalized to the dinucleotide bias. We also present results from calculating the MFE landscape of mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside in this chromosome.
    Conclusion: The efficient consecutive windows folding engine described in this paper allows for genome-wide scans for ncRNA molecules as well as large-scale statistics. This is implemented here as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.
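For orientation, the naive O(NL^3) baseline mentioned in this abstract can be sketched as below. This is not RNAslider and not a free-energy model: as a stand-in for MFE it uses Nussinov-style base-pair maximization (more pairs standing in for lower energy), and the names `max_pairs`, `scan_windows` and the `min_loop` parameter are assumptions made for the illustration.

```python
from typing import List

# Watson-Crick and G-U wobble pairs (assumed pairing rules for the sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Cubic-time dynamic program: maximum number of nested base pairs,
    with at least `min_loop` unpaired bases inside every hairpin."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def scan_windows(seq: str, window: int) -> List[int]:
    """Fold every window independently: O(L^3) per window, O(N L^3) overall.
    Sliding by one position yields len(seq) - window + 1 windows."""
    if window > len(seq):
        raise ValueError("window is longer than the sequence")
    return [max_pairs(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


scores = scan_windows("GGGAAACCCA", 9)
```

Each window here is folded from scratch, which is exactly the redundancy a faster engine avoids: consecutive windows share all but one position, so most of the table could be reused.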

    Psiscan: a computational approach to identify H/ACA-like and AGA-like non-coding RNA in trypanosomatid genomes

    Background: Detection of non-coding RNA (ncRNA) molecules is a major bioinformatics challenge. This challenge is particularly difficult when attempting to detect H/ACA molecules, which are involved in converting uridine to pseudouridine on rRNA, in trypanosomes, because these organisms have unique H/ACA molecules (termed H/ACA-like) that lack several of the features that characterize H/ACA molecules in most other organisms.
    Results: We present here a computational tool called Psiscan, which was designed to detect H/ACA-like molecules in trypanosomes. We started by analyzing known H/ACA-like molecules and characterized their crucial elements both computationally and experimentally. Next, we set up constraints based on this analysis and additional phylogenetic and functional data to rapidly scan three trypanosome genomes (T. brucei, T. cruzi and L. major) for sequences that observe these constraints and are conserved among the species. In the next step, we used minimal energy calculation to select the molecules that are predicted to fold into a lowest energy structure that is consistent with the constraints. In the final computational step, we used a Support Vector Machine that was trained on known H/ACA-like molecules as positive examples and on negative examples of molecules that were identified by the computational analyses but were shown experimentally not to be H/ACA-like molecules. The leading candidate molecules predicted by the SVM model were then subjected to experimental validation.
    Conclusion: The experimental validation showed 11 molecules to be expressed (4 out of 25 in the intermediate stage and 7 out of 19 in the final validation after the machine learning stage). Five of these 11 molecules were further shown to be bona fide H/ACA-like molecules. As snoRNAs in trypanosomes are organized in clusters, the new H/ACA-like molecules could be used as starting points to manually search for additional molecules in their neighbourhood. Altogether, this study increased our repertoire by fourteen H/ACA-like and six C/D snoRNA molecules from T. brucei and L. major. In addition, the experimental analysis revealed that six ncRNA molecules that are expressed are not downregulated in CBF5 silenced cells, suggesting that they have structural features of H/ACA-like molecules but do not have their standard function. We termed this novel class of molecules AGA-like, and we are exploring their function. This study demonstrates the power of tight collaboration between computational and experimental approaches in a combined effort to reveal the repertoire of ncRNA molecules.

    Designing an A* Algorithm for Calculating Edit Distance between Rooted-Unordered Trees

    Tree structures are useful for describing and analyzing biological objects and processes. Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance," the number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between the siblings is significant, can be compared in polynomial time. Rooted-unordered trees are used to describe processes or objects where the topology, rather than the order or the identity of each node, is important. For example, in immunology, rooted-unordered trees describe the process of immunoglobulin (antibody) gene diversification in the germinal center over time. Comparing such trees has been proven to be a difficult computational problem that belongs to the set of NP-Complete problems. Comparing two trees can be viewed as a search problem in graphs. A* is a search algorithm that explores the search space in an efficient order. Using a good lower bound estimation of the degree of difference between the two trees, A* can reduce search time dramatically. We have designed and implemented a variant of the A* search algorithm suitable for calculating tree edit distance. We show here that A* is able to perform an edit distance measurement in reasonable time for trees with dozens of nodes.
    Key words: A*, tree edit distance, lineage trees, rooted-unordered trees
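Two ingredients that such a search needs can be sketched briefly; this is an illustration under stated assumptions, not the variant designed in the paper. It assumes unlabeled trees written as nested tuples (a node is the tuple of its child subtrees, a leaf is `()`), and edits limited to single-node insertions and deletions. Under those assumptions every edit changes the node count by exactly one, so the size difference never overestimates the distance and can serve as an admissible lower bound, and a sorted canonical encoding detects when two unordered trees are already identical. The names `size`, `canonical` and `lower_bound` are hypothetical.

```python
from typing import Tuple

Tree = Tuple  # a node is the tuple of its child subtrees; a leaf is ()


def size(tree: Tree) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree)


def canonical(tree: Tree) -> str:
    """Order-independent encoding: children are encoded recursively and
    sorted, so two rooted-unordered trees get the same string exactly
    when they have the same shape."""
    return "(" + "".join(sorted(canonical(child) for child in tree)) + ")"


def lower_bound(a: Tree, b: Tree) -> int:
    """Admissible estimate of the remaining edit distance: each insert or
    delete changes the node count by one, so at least |size(a) - size(b)|
    operations are required."""
    return abs(size(a) - size(b))


# The same shape with siblings listed in a different order.
t1 = ((), ((), ()))
t2 = (((), ()), ())
# A much smaller tree: a root with a single leaf.
t3 = ((),)
```

In an A* search, a state reached after g edits would be ranked by g plus `lower_bound` of the current tree against the target, and the search could stop as soon as the canonical encodings match.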

    RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules-5

    Copyright information: Taken from "RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules". http://www.biomedcentral.com/1471-2105/8/366. BMC Bioinformatics 2007;8:366. Published online 1 Oct 2007. PMCID: PMC2147038.
    hat was used in Table 1. The MCC score is ranked from worst (left) to best (right). The lower (blue) line shows the value achieved by each one of the 120 permutations. The green line shows the increase in accuracy when the best of five different orders was reported. The upper (red) line shows the results where the path was ranked by using the Sum-of-Pairs approach, i.e. summing the comparisons between all the pairs that comprise the path. These results clearly show that using the Sum-of-Pairs measure yields better predictions.

    RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules-9

    Note that RNAspa was run in Boltzmann sampling mode across all sequence lengths. All but RNAspa failed to run on sequences greater than 450 bps due to memory constraints. StemLoc does not appear in the graph because it failed to process sequences of 100 bps or more. As expected, a cubic trendline (not shown) fits RNAspa's curve with an R value of 0.9965. RNAspa gave an MCC score of 0.34 for the complete ~1,800 bps long SSU family.

    RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules-1

    down traversal, each node is assigned the shortest path from the top layer to itself. Next, the node with the lowest score on the lowest level is found, and the shortest path is retrieved. Bottom: The process of finding the shortest path reiterates several times. Each time, a different order permutation of the sequences is used. For each shortest path, a Sum-of-Pairs score is calculated. The shortest path with the best Sum-of-Pairs score is returned. In the illustration above, the third shortest path, which is not the shortest of the four paths, is returned because it has the best Sum-of-Pairs.
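The selection rule in this caption, where a path that is not the shortest can still win on Sum-of-Pairs, can be reproduced with a small sketch. It is an illustration only: Hamming distance on equal-length dot-bracket strings is swapped in for the string edit distance, the two candidate paths are made-up data, and `path_length`, `sum_of_pairs` and `best_by_sum_of_pairs` are hypothetical names.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Distance = Callable[[str, str], int]


def hamming(a: str, b: str) -> int:
    """Stand-in distance: number of differing positions (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("equal-length strings required")
    return sum(x != y for x, y in zip(a, b))


def path_length(path: Sequence[str], dist: Distance) -> int:
    """Cost of a path as a shortest-path search sees it:
    only consecutive structures are compared."""
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def sum_of_pairs(path: Sequence[str], dist: Distance) -> int:
    """Sum of the distances between all pairs of structures on the path."""
    return sum(dist(a, b) for a, b in combinations(path, 2))


def best_by_sum_of_pairs(paths: List[List[str]], dist: Distance) -> List[str]:
    """Among candidate paths (e.g. one per sequence order), return the one
    with the lowest Sum-of-Pairs score."""
    return min(paths, key=lambda p: sum_of_pairs(p, dist))


# Two candidate paths through the same three layers (made-up structures).
path_a = ["((((....))))", "((........))", "............"]
path_b = ["((((....))))", "(..........)", "(((......)))"]
chosen = best_by_sum_of_pairs([path_a, path_b], hamming)
```

Here `path_a` is the shorter path (length 8 versus 10) because it drifts in small steps, but its first and last structures are far apart; `path_b` has the lower Sum-of-Pairs score (12 versus 16) and is the one returned.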

    RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules-4

    C scores are sorted from left to right over the 120 permutations for each family. One can see that the scores increase gradually and that most permutations yield similar results.