unknown

Linear time algorithms for finding and representing all the tandem repeats in a string

Abstract

Gusfield D, Stoye J. Linear time algorithms for finding and representing all the tandem repeats in a string. Journal of computer and system sciences. 2004;69(4):525-546.A tandem repeat (or square) is a string [alpha][alpha], where [alpha] is a non-empty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats

    Similar works