3 research outputs found

    Covariance Searches for ncRNA Gene Finding

    The use of covariance models for non-coding RNA gene finding is extremely powerful and also extremely computationally demanding. A major reason for the high computational burden of this algorithm is that the search proceeds through every possible start position in the database and, at each of these start positions, every possible sequence length between zero and a user-defined maximum length. Furthermore, for every start position and sequence length, all possible combinations of insertions and deletions leading to the given sequence length are searched. It has been previously shown that a large portion of this search space is nowhere near any database match observed in practice and that the search space can be limited significantly with little change in expected search results. In this work, a different approach is taken in which the space of starting positions, sequence lengths, and insertion/deletion patterns is searched using a genetic algorithm.
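
To make the size of that search space concrete, here is a minimal sketch that counts the (start, length) pairs an exhaustive scan would visit and then searches the same space with a simple genetic algorithm. It is an illustration under stated assumptions, not the method from the paper: `toy_score` is a hypothetical stand-in for a covariance-model score (it compares a window against a fixed motif), insertion/deletion patterns are not modelled, and `genetic_search`, `pop_size`, `generations`, and the mutation ranges are invented for this example.

```python
import random


def toy_score(db, start, length, motif="GGAUCC"):
    """Hypothetical stand-in for a covariance-model score (assumption):
    matching positions against a fixed motif, minus a length-mismatch penalty."""
    window = db[start:start + length]
    matches = sum(1 for a, b in zip(window, motif) if a == b)
    return matches - abs(len(window) - len(motif))


def exhaustive_candidates(db_len, max_len):
    """Count the (start, length) pairs, 0 <= length <= max_len, that fit in the database."""
    return sum(min(max_len, db_len - s) + 1 for s in range(db_len))


def genetic_search(db, max_len, pop_size=40, generations=60, seed=0):
    """Search (start, length) space with a simple elitist genetic algorithm."""
    rng = random.Random(seed)

    def clamp(start, length):
        # Keep the individual inside the database and the length limit.
        start = max(0, min(len(db) - 1, start))
        length = max(0, min(max_len, len(db) - start, length))
        return start, length

    pop = [clamp(rng.randrange(len(db)), rng.randint(0, max_len))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better-scoring half unchanged (elitism).
        pop.sort(key=lambda ind: toy_score(db, *ind), reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: start from one parent, length from the other,
            # followed by a small random mutation of both fields.
            children.append(clamp(a[0] + rng.randint(-2, 2),
                                  b[1] + rng.randint(-1, 1)))
        pop = parents + children
    return max(pop, key=lambda ind: toy_score(db, *ind))


if __name__ == "__main__":
    db = "AACGUAGGAUCCUUAGCAUCGA"
    print("exhaustive candidates:", exhaustive_candidates(len(db), 8))
    print("best (start, length):", genetic_search(db, max_len=8))
```

The genetic algorithm evaluates at most `pop_size` candidates per generation, so its cost is set by the population and generation counts rather than by the database length; the trade-off is that it is not guaranteed to find the best-scoring window.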

    Covariance Searches for ncRNA Gene Finding


    Acceleration of Covariance Models for Non-coding RNA Search

    Stochastic context-free grammar (SCFG) based models for non-coding RNA (ncRNA) gene searches are much more powerful than regular grammar based models because they can model intramolecular base pairing. The SCFG models (also known as covariance models) can be scored exactly using dynamic programming techniques. However, the computational resources needed to compute optimal scores using dynamic programming are too great for most applications. Pre-filtering of the database using regular grammar based models can lead to significant improvements in performance at little or no cost in terms of specificity or sensitivity. While pre-filtering is a major improvement, the algorithm is still far too slow. The use of an alternative search strategy for high scoring subsequences in the sequence database is explored in this paper. Rather than sequentially computing the best score at each database position and subsequence length, as is done in the dynamic programming method, good suboptimal scores are found throughout the position and length search space and the search is expanded about these trial solutions. I
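
The idea of finding good suboptimal scores and then expanding the search around them can be sketched as a sample-then-refine local search. This is a rough illustration only, since the abstract does not specify the actual strategy: `toy_score` is a hypothetical placeholder for a real covariance-model score, and `seed_and_expand`, `n_seeds`, `keep`, and the one-step neighbourhood are assumptions made for this example.

```python
import random


def toy_score(db, start, length, motif="GGAUCC"):
    """Hypothetical placeholder for a covariance-model score (assumption)."""
    window = db[start:start + length]
    matches = sum(1 for a, b in zip(window, motif) if a == b)
    return matches - abs(len(window) - len(motif))


def neighbors(db_len, max_len, start, length):
    """Yield in-bounds candidates one step away in position or in length."""
    for ds, dl in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        s, l = start + ds, length + dl
        if 0 <= s < db_len and 0 <= l <= min(max_len, db_len - s):
            yield s, l


def seed_and_expand(db, max_len, n_seeds=50, keep=5, seed=0):
    """Sample trial (start, length) pairs, then hill-climb from the best few."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(n_seeds):
        s = rng.randrange(len(db))
        seeds.append((s, rng.randint(0, min(max_len, len(db) - s))))
    seeds.sort(key=lambda c: toy_score(db, *c), reverse=True)

    results = []
    for cand in seeds[:keep]:
        improved = True
        while improved:
            # Move to the first strictly better neighbour, if there is one.
            improved = False
            for nb in neighbors(len(db), max_len, *cand):
                if toy_score(db, *nb) > toy_score(db, *cand):
                    cand, improved = nb, True
                    break
        results.append(cand)
    return max(results, key=lambda c: toy_score(db, *c))


if __name__ == "__main__":
    db = "AACGUAGGAUCCUUAGCAUCGA"
    print("best (start, length):", seed_and_expand(db, max_len=8))
```

Each refinement step strictly increases the score, so the loop always terminates at a local optimum of the chosen neighbourhood; as with any local search, that optimum need not be the global best.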