2 research outputs found

    Fast search of sequences with complex symbol correlations using profile context-sensitive HMMS and pre-screening filters

    Get PDF
    Recently, profile context-sensitive HMMs (profile-csHMMs) have been proposed which are very effective in modeling the common patterns and motifs in related symbol sequences. Profile-csHMMs are capable of representing long-range correlations between distant symbols, even when these correlations are entangled in a complicated manner. This makes profile-csHMMs an useful tool in computational biology, especially in modeling noncoding RNAs (ncRNAs) and finding new ncRNA genes. However, a profile-csHMM based search is quite slow, hence not practical for searching a large database. In this paper, we propose a practical scheme for making the search speed significantly faster without any degradation in the prediction accuracy. The proposed method utilizes a pre-screening filter based on a profile-HMM, which filters out most sequences that will not be predicted as a match by the original profile-csHMM. Experimental results show that the proposed approach can make the search speed eighty times faster

    Fast Structural Similarity Search of Noncoding RNAs Based on Matched Filtering of Stem Patterns

    Get PDF
    Many noncoding RNAs (ncRNAs) have characteristic secondary structures that give rise to complicated base correlations in their primary sequences. Therefore, when performing an RNA similarity search to find new members of a ncRNA family, we need a statistical model - such as the profile- csHMM or the covariance model (CM) - that can effectively describe the correlations between distant bases. However, these models are computationally expensive, making the resulting RNA search very slow. To overcome this problem, various prescreening methods have been proposed that first use a simpler model to scan the database and filter out the dissimilar regions. Only the remaining regions that bear some similarity are passed to a more complex model for closer inspection. It has been shown that the prescreening approach can make the search speed significantly faster at no (or a slight) loss of prediction accuracy. In this paper, we propose a novel prescreening method based on matched filtering of stem patterns. Unlike many existing methods, the proposed method can prescreen the database solely based on structural similarity. The proposed method can handle RNAs with arbitrary secondary structures, and it can be easily incorporated into various search methods that use different statistical models. Furthermore, the proposed approach has a low computational cost, yet very effective for prescreening, as will be demonstrated in the paper
    corecore