An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

A Harmanci; A Taneda; A Uzilov; A Wilm; Akito Taneda; B Knudsen; C Lu; C Notredame; C Selig; CC Chang; CMA Davis Jr; D Dalli; D Rose; D Sankoff; DE Goldberg; E Rivas; E Torarinsson; F Miura; G Gonsalvez; H Kiryu; I Hofacker; I Holmes; J Cherry; J Gorodkin; J Havgaard; J Pedersen; J Schultz; J Thompson; K Katoh; K Missal; L David; M Bauer; M Gerstein; M Samanta; P Carninci; R Dowell; R Klein; R Nussinov; S Needleman; S Washietl; S Will; W Gish; X Xu; Y Tabei

Abstract

Background

Aligning RNA sequences with low sequence identity has been a challenging problem, because taking structural conservation into account requires algorithms of high computational complexity. Although many sophisticated algorithms for this purpose have been proposed, further improvement in efficiency is needed to accelerate large-scale applications such as non-coding RNA (ncRNA) discovery.

Results

We developed a new genetic algorithm, Cofolga2, that simultaneously computes a pairwise RNA sequence alignment and its consensus folding, and benchmarked it on BRAliBase 2.1. The benchmark results showed that the new algorithm is accurate and efficient in both time and memory usage. Combining it with an originally trained SVM, we then applied the algorithm to novel ncRNA discovery, comparing the S. cerevisiae genome with six related genomes in a pairwise manner. By restricting the search to relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in pairwise alignments with a stable consensus secondary structure and low sequence identity (≤ 50%). Comparison with previous predictions showed that > 92% of the candidates are novel. The estimated false-positive rate among the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e. their genomic positions overlap those of experimentally determined transcripts reported in the literature. Moreover, manual inspection of the results yielded four multiple alignments with low sequence identity that reveal consensus structures shared by three species/sequences.

Conclusion

The present method provides an efficient tool complementary to sequence-alignment-based ncRNA finders.
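The candidate-selection criteria stated above can be sketched as a simple filter. The criteria are regions of 50 bp to 2,000 bp between conserved sequences, alignments with sequence identity ≤ 50%, and a positive call from the trained SVM. This is a minimal illustration under stated assumptions, not Cofolga2 or the authors' pipeline, and it does not reproduce the genetic algorithm itself. The `Anchor` type, how conserved blocks are obtained, the identity definition (identical columns over gap-free columns), and the boolean `svm_positive` flag standing in for the SVM are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator

# Length window and identity cutoff taken from the abstract.
MIN_LEN, MAX_LEN = 50, 2000
MAX_IDENTITY = 0.50


@dataclass
class Anchor:
    """Hypothetical conserved block on one genome: half-open interval [start, end)."""
    start: int
    end: int


def sandwiched_regions(anchors: list[Anchor]) -> Iterator[tuple[int, int]]:
    """Yield (start, end) gaps between consecutive conserved anchors whose
    length lies within [MIN_LEN, MAX_LEN]."""
    ordered = sorted(anchors, key=lambda a: a.start)
    for left, right in zip(ordered, ordered[1:]):
        length = right.start - left.end
        if MIN_LEN <= length <= MAX_LEN:
            yield left.end, right.start


def pairwise_identity(row1: str, row2: str) -> float:
    """Fraction of identical columns among columns where neither row has a gap.
    This is one of several identity definitions and is an assumption here."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    compared = identical = 0
    for a, b in zip(row1.upper(), row2.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        identical += a == b  # bool counts as 0 or 1
    return identical / compared if compared else 0.0


def keep_candidate(row1: str, row2: str, svm_positive: bool) -> bool:
    """Keep an aligned pair only if it has low identity and the
    (hypothetical) structure classifier called it positive."""
    return svm_positive and pairwise_identity(row1, row2) <= MAX_IDENTITY
```

For anchors at [0, 100), [130, 300), [500, 700) and [3000, 3100), only the 200 bp gap (300, 500) passes the length window. The 30 bp and 2,300 bp gaps fall outside it. Filtering on identity only after an SVM call mirrors the abstract's emphasis on structure-supported, low-identity alignments, which sequence-only finders tend to miss.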
