16 research outputs found
A Minimal Periods Algorithm with Applications
Kosaraju in ``Computation of squares in a string'' briefly described a
linear-time algorithm for computing the minimal squares starting at each
position in a word. Using the same construction of suffix trees, we generalize
his result and describe in detail how to compute in O(k|w|)-time the minimal
k-th power, with period of length larger than s, starting at each position in a
word w for arbitrary exponent k and integer s. We provide the
complete proof of correctness of the algorithm, which is somewhat unclear
in Kosaraju's original paper. The algorithm can be used as a subroutine
to detect certain types of pseudo-patterns in words, which was our original
motivation for studying the generalization.
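To make the quantity in this abstract concrete, here is a brute-force sketch of "the minimal k-th power, with period longer than s, starting at each position". It is not the suffix-tree algorithm the paper describes and does not run in O(k|w|) time; the function name `minimal_kth_powers` and the convention of returning 0 when no power exists are assumptions made for illustration.

```python
def minimal_kth_powers(w: str, k: int = 2, s: int = 0) -> list[int]:
    """For each start position i, return the smallest period p > s such that
    w[i:i + k*p] is a block of length p repeated k times, or 0 if none exists.

    Brute force for illustration only; far slower than a suffix-tree approach.
    """
    n = len(w)
    result = [0] * n
    for i in range(n):
        # The period p must satisfy p > s and k * p <= n - i.
        for p in range(s + 1, (n - i) // k + 1):
            block = w[i:i + p]
            if w[i:i + k * p] == block * k:
                result[i] = p  # first hit is the minimal period
                break
    return result


if __name__ == "__main__":
    print(minimal_kth_powers("aabaab", k=2, s=0))
    print(minimal_kth_powers("aabaab", k=2, s=1))
```

With `s=0` the square `aa` at position 0 is reported (period 1); raising `s` to 1 skips it and reports the longer square `aabaab` (period 3) instead.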
NTRFinder: a software tool to find nested tandem repeats
We introduce the software tool NTRFinder to search for a complex repetitive structure in DNA we call a nested tandem repeat (NTR). An NTR is a recurrence of two or more distinct tandem motifs interspersed with each other. We propose that NTRs can be used as phylogenetic and population markers. We have tested our algorithm on both real and simulated data, and present some real NTRs of interest. NTRFinder can be downloaded from http://www.maths.otago.ac.nz/~aamatroud/
String matching problems over free partially commutative monoids
This paper studies two string matching problems over free partially commutative monoids. We analyze these two problems in detail, and present two efficient polynomial time algorithms for solving them.
Linear time algorithms for finding and representing all the tandem repeats in a string
Gusfield D, Stoye J. Linear time algorithms for finding and representing all the tandem repeats in a string. Journal of computer and system sciences. 2004;69(4):525-546. A tandem repeat (or square) is a string αα, where α is a non-empty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.
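As a concrete illustration of the definition above (a tandem repeat is a substring of the form αα with α non-empty), the following brute-force sketch lists every occurrence in a string. It is a quadratic-or-worse enumeration for small inputs, not the O(|S|) suffix-tree algorithm of the paper; the function name `tandem_repeats` and the `(start, period)` output format are assumptions.

```python
def tandem_repeats(s: str) -> list[tuple[int, int]]:
    """Return (start, period) for every occurrence of a square in s,
    i.e. every i and p >= 1 with s[i:i+p] == s[i+p:i+2p].

    Naive enumeration for illustration; not a linear-time method.
    """
    n = len(s)
    found = []
    for i in range(n):
        for p in range(1, (n - i) // 2 + 1):
            if s[i:i + p] == s[i + p:i + 2 * p]:
                found.append((i, p))
    return found


if __name__ == "__main__":
    for start, period in tandem_repeats("abaabaab"):
        print(start, period, "abaabaab"[start:start + 2 * period])
```

The number of occurrences can be quadratic in |S| (for example in a string of identical letters), which is why a linear-time method has to represent them implicitly, as the decorated suffix tree does.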
Analysis and Application of Suffix Structures in Solving String Matching
String matching is the problem of answering the following question: "Can a given pattern be found within a text?" It is widely studied in computer science and also in computational biology, owing to its many variants in search tools and in the processing of DNA sequences. Algorithms already exist that achieve the optimal solution to the basic question, but those solutions are not equally efficient on extensions and variations of the problem. Consequently, many studies have examined data structures built on the suffixes of the text in order to obtain solutions capable of handling complex variations of string matching. This work presents an in-depth study and analysis of the efficiency of the following such structures: the suffix tree and the suffix automaton. Classical algorithms are also covered and compared with these structures throughout the work. The analyses rely on statistical criteria, execution times, and algorithmic complexity to obtain greater confidence in the results.
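A minimal sketch of one of the suffix structures named above, the suffix automaton, used to answer the basic string matching question ("does the pattern occur in the text?"). This is the standard online construction, not code from the cited work; the class name `SuffixAutomaton` and the method `contains` are assumptions chosen for illustration.

```python
class SuffixAutomaton:
    """Suffix automaton of a text, built online one character at a time."""

    def __init__(self, text: str) -> None:
        self.next = [{}]      # transitions of each state
        self.link = [-1]      # suffix links
        self.length = [0]     # length of the longest string in each state
        last = 0
        for ch in text:
            cur = self._new_state(self.length[last] + 1)
            p = last
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split q: the clone keeps the shorter strings.
                    clone = self._new_state(self.length[p] + 1)
                    self.next[clone] = dict(self.next[q])
                    self.link[clone] = self.link[q]
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def _new_state(self, length: int) -> int:
        self.next.append({})
        self.link.append(-1)
        self.length.append(length)
        return len(self.length) - 1

    def contains(self, pattern: str) -> bool:
        """Return True if pattern is a substring of the text."""
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True


if __name__ == "__main__":
    sa = SuffixAutomaton("abcbc")
    print(sa.contains("bcb"), sa.contains("cc"))
```

Once the automaton is built, each query takes time proportional to the pattern length, independent of the text length.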
Frequent Patterns Algorithm of Biological Sequences based on Pattern Prefix-tree
In bioinformatics applications, existing algorithms cannot directly and efficiently implement sequence pattern mining. This paper proposes two fast and efficient biological sequence pattern mining algorithms, one for a single biological sequence and one for multiple sequences. The concept of the basic pattern is introduced, and after the frequent basic patterns are mined, the frequent patterns are obtained by constructing prefix trees over those frequent basic patterns. The proposed algorithms thus mine frequent patterns of biological sequences rapidly using pattern prefix trees. In the experiments, family sequence data from the Pfam protein database are used to verify the performance of the proposed algorithms. The results confirm that the proposed algorithms can not only obtain mining results of real biological significance, but also improve the running time of biological sequence pattern mining.