Algorithms for Motif Discovery based on Edit Distance

Abstract

In this paper, we study the problem of identifying sequence patterns of length l in a database DB of n biological sequences, each of average length m. A pattern qualifies if it has occurrences in at least t distinct sequences of DB, where an occurrence is a substring at edit distance (also called the Levenshtein distance) of at most d from the pattern. We survey algorithms for this problem from the literature and present two improved algorithms. We also present an implementation of one of our algorithms, together with its performance results.
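To make the problem statement concrete, here is a minimal Python sketch of the motif condition. It assumes a DNA alphabet and reads "occurrence" as any substring of a sequence. It uses Sellers' dynamic-programming variant of edit distance to find the best-matching substring, counts the sequences that contain an occurrence within distance d, and enumerates every candidate of length l by brute force. This is a naive baseline for illustration only, not one of the algorithms surveyed or proposed in the paper, and all function names are hypothetical.

```python
from itertools import product


def min_substring_edit_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between `pattern` and any substring of `text`.

    Sellers' variant of the edit-distance DP: row 0 is all zeros, so a match may
    start anywhere in `text`; taking the minimum of the last row lets it end anywhere.
    """
    prev = [0] * (len(text) + 1)  # empty pattern prefix matches anywhere at cost 0
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)  # i pattern chars against empty text: i deletions
        for j, c in enumerate(text, start=1):
            curr[j] = min(
                prev[j] + 1,             # pattern char left unmatched (deletion)
                curr[j - 1] + 1,         # extra text char (insertion)
                prev[j - 1] + (p != c),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return min(prev)


def support(pattern: str, db: list[str], d: int) -> int:
    """Number of sequences in db containing an occurrence within edit distance d."""
    return sum(min_substring_edit_distance(pattern, s) <= d for s in db)


def brute_force_motifs(db: list[str], l: int, d: int, t: int,
                       alphabet: str = "ACGT") -> list[str]:
    """Enumerate all |alphabet|^l candidates and keep those with support >= t.

    The DNA alphabet default is an assumption for illustration.
    """
    motifs = []
    for letters in product(alphabet, repeat=l):
        cand = "".join(letters)
        if support(cand, db, d) >= t:
            motifs.append(cand)
    return motifs


if __name__ == "__main__":
    db = ["TTACGTAA", "GACTTG", "CCCCCC"]
    print(support("ACGT", db, 1))  # 2: exact match in db[0], "ACT" in db[1]
    print("ACGT" in brute_force_motifs(db, l=4, d=1, t=2))  # True
```

The enumeration performs roughly |Σ|^l · n · l · m DP cell updates, which is exponential in l; this cost is what makes faster motif-discovery algorithms worthwhile.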
