
    A Minimal Periods Algorithm with Applications

    In "Computation of squares in a string", Kosaraju briefly described a linear-time algorithm for computing the minimal square starting at each position in a word. Using the same suffix-tree construction, we generalize his result and describe in detail how to compute, in O(k|w|) time, the minimal k-th power with period length greater than s starting at each position in a word w, for arbitrary exponent k ≄ 2 and integer s ≄ 0. We give a complete proof of correctness of the algorithm, which is not entirely clear in Kosaraju's original paper. The algorithm can be used as a subroutine to detect certain types of pseudo-patterns in words, which was our original motivation for studying the generalization. Comment: 14 pages
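To make the problem statement concrete, here is a minimal brute-force Python sketch, assuming an integer exponent k. It is not Kosaraju's suffix-tree method or the O(k|w|) algorithm described above: it simply tries each candidate period p > s at each position and can take time cubic in |w|, so it is only useful as a reference for checking a faster implementation on small inputs. The function name `minimal_kth_powers` is hypothetical.

```python
from typing import List, Optional


def minimal_kth_powers(w: str, k: int, s: int = 0) -> List[Optional[int]]:
    """For each position i, return the smallest period p > s such that
    w[i : i + k*p] is a k-th power (a block of length p repeated k times),
    or None if no such power starts at i.

    Hypothetical brute-force reference (integer k assumed); worst case is
    cubic in |w|, unlike the O(k|w|) suffix-tree algorithm.
    """
    if k < 2 or s < 0:
        raise ValueError("requires k >= 2 and s >= 0")
    n = len(w)
    result: List[Optional[int]] = []
    for i in range(n):
        best = None
        p = s + 1
        # Only periods whose k-th power fits inside w are candidates.
        while i + k * p <= n:
            # w[i : i + k*p] is a k-th power iff it has period p,
            # i.e. w[j] == w[j + p] for every j in [i, i + (k-1)*p).
            if all(w[j] == w[j + p] for j in range(i, i + (k - 1) * p)):
                best = p  # first success is the minimal period
                break
            p += 1
        result.append(best)
    return result
```

For example, on w = "aabaab" with k = 2 the minimal squares start at positions 0 and 3 (both with period 1); raising s to 1 leaves only the square "aabaab" of period 3 at position 0.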

    NTRFinder: a software tool to find nested tandem repeats

    We introduce the software tool NTRFinder to search for a complex repetitive structure in DNA we call a nested tandem repeat (NTR). An NTR is a recurrence of two or more distinct tandem motifs interspersed with each other. We propose that NTRs can be used as phylogenetic and population markers. We have tested our algorithm on both real and simulated data, and present some real NTRs of interest. NTRFinder can be downloaded from http://www.maths.otago.ac.nz/~aamatroud/

    String matching problems over free partially commutative monoids

    This paper studies two string matching problems over free partially commutative monoids. We analyze these two problems in detail and present two efficient polynomial-time algorithms for solving them.

    Linear time algorithms for finding and representing all the tandem repeats in a string

    Gusfield D, Stoye J. Linear time algorithms for finding and representing all the tandem repeats in a string. Journal of Computer and System Sciences. 2004;69(4):525-546. A tandem repeat (or square) is a string αα, where α is a non-empty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.
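The definition of a tandem repeat can be illustrated with a naive Python enumeration of every occurrence. This is a cubic-time sketch for small inputs, not the suffix-tree algorithm of the paper; the names `tandem_repeats` and `distinct_tandem_repeats` are hypothetical. The set of distinct strings αα is roughly the information the decorated suffix tree represents, computed here the slow way.

```python
from typing import List, Set, Tuple


def tandem_repeats(s: str) -> List[Tuple[int, int]]:
    """Return every occurrence (i, L) such that s[i : i + 2*L] is a tandem
    repeat alpha+alpha with |alpha| = L >= 1 (naive, O(|s|^3) time)."""
    n = len(s)
    occurrences = []
    for i in range(n):
        # alpha must fit twice inside s[i:], so L <= (n - i) // 2.
        for L in range(1, (n - i) // 2 + 1):
            if s[i:i + L] == s[i + L:i + 2 * L]:
                occurrences.append((i, L))
    return occurrences


def distinct_tandem_repeats(s: str) -> Set[str]:
    """Distinct tandem repeat strings (types) occurring in s."""
    return {s[i:i + 2 * L] for i, L in tandem_repeats(s)}
```

For example, "aaaa" has the four occurrences (0, 1), (0, 2), (1, 1), (2, 1) but only two distinct tandem repeats, "aa" and "aaaa".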

    ANALYSIS AND APPLICATION OF SUFFIX STRUCTURES IN SOLVING STRING MATCHING

    String matching is the problem of answering the question: "Can a given pattern be found within a text?" It is widely studied in computer science and in computational biology, because its many variants appear in search tools and in the processing of DNA sequences. Algorithms that solve the basic question optimally already exist, but these solutions are not equally efficient on the problem's extensions and variants. Consequently, much research has studied data structures built over the suffixes of the text in order to obtain solutions that can handle complex variants of string matching. This work presents an in-depth study and analysis of the efficiency of two such structures: the suffix tree and the suffix automaton. Classical algorithms are also covered and compared with these structures throughout the work. The analyses rely on statistical criteria, execution times, and algorithmic complexity to obtain greater confidence in the results.
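One of the two structures compared, the suffix automaton, can be sketched in Python with the standard online construction (one new state per character plus occasional cloned states, amortized linear time for a fixed alphabet). This is a textbook sketch written for illustration, not the implementation evaluated in the work; the class name `SuffixAutomaton` and method `contains` are hypothetical. Testing whether a pattern occurs then amounts to following transitions from the initial state.

```python
from typing import Dict, List


class SuffixAutomaton:
    """Suffix automaton of a text: every substring of the text, and nothing
    else, spells a path of transitions starting at the initial state 0."""

    def __init__(self, text: str) -> None:
        self.trans: List[Dict[str, int]] = [{}]  # outgoing transitions per state
        self.link: List[int] = [-1]              # suffix link of each state
        self.length: List[int] = [0]             # length of longest string in state
        last = 0                                 # state for the whole prefix read so far
        for ch in text:
            cur = len(self.trans)
            self.trans.append({})
            self.length.append(self.length[last] + 1)
            self.link.append(0)
            # Add ch-transitions from suffixes of the old prefix that lack one.
            p = last
            while p != -1 and ch not in self.trans[p]:
                self.trans[p][ch] = cur
                p = self.link[p]
            if p != -1:
                q = self.trans[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split q: the clone keeps only the shorter strings.
                    clone = len(self.trans)
                    self.trans.append(dict(self.trans[q]))
                    self.length.append(self.length[p] + 1)
                    self.link.append(self.link[q])
                    while p != -1 and self.trans[p].get(ch) == q:
                        self.trans[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def contains(self, pattern: str) -> bool:
        """True iff pattern is a substring of the text (O(|pattern|) steps)."""
        state = 0
        for ch in pattern:
            if ch not in self.trans[state]:
                return False
            state = self.trans[state][ch]
        return True
```

For example, `SuffixAutomaton("abb").contains("bb")` is True while `contains("ba")` is False.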

    Frequent Patterns Algorithm of Biological Sequences based on Pattern Prefix-tree

    In bioinformatics applications, existing algorithms cannot perform sequence pattern mining directly and efficiently. This paper proposes two fast and efficient pattern mining algorithms, one for a single biological sequence and one for multiple sequences. The concept of a basic pattern is introduced: frequent basic patterns are mined first, and frequent patterns are then mined by constructing prefix trees over the frequent basic patterns. The proposed algorithms thus mine frequent patterns of biological sequences rapidly using pattern prefix trees. In experiments, family sequence data from the Pfam protein database are used to evaluate the performance of the proposed algorithms. The results confirm that the proposed algorithms not only obtain mining results of genuine biological significance but also improve the running-time efficiency of biological sequence pattern mining.
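The abstract does not define basic patterns, so the following Python sketch only illustrates the general idea of growing frequent patterns along a prefix tree for a single sequence. It uses a plain level-wise (Apriori-style) expansion with pruning, assuming that support means the number of (possibly overlapping) occurrences of a contiguous pattern; it is not the authors' algorithm, and the name `frequent_patterns` and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


def frequent_patterns(seq: str, min_support: int, max_len: int) -> Dict[str, int]:
    """Return every contiguous pattern of length <= max_len occurring at least
    min_support times (overlaps counted) in seq, mapped to its count.

    Patterns are grown one symbol at a time, i.e. a prefix tree explored level
    by level. Every occurrence of p + c starts an occurrence of p, so an
    infrequent pattern has no frequent extension and its subtree is pruned.
    """
    result: Dict[str, int] = {}
    # Level 1: each single symbol maps to the list of its start positions.
    frontier: DefaultDict[str, List[int]] = defaultdict(list)
    for i, ch in enumerate(seq):
        frontier[ch].append(i)
    length = 1
    while frontier and length <= max_len:
        next_frontier: DefaultDict[str, List[int]] = defaultdict(list)
        for pattern, starts in frontier.items():
            if len(starts) < min_support:
                continue  # prune this node of the prefix tree
            result[pattern] = len(starts)
            # Extend each occurrence by the symbol that follows it, if any.
            for i in starts:
                if i + length < len(seq):
                    next_frontier[pattern + seq[i + length]].append(i)
        frontier = next_frontier
        length += 1
    return result
```

For instance, with seq = "ABABA", min_support = 2 and max_len = 3 the result is {"A": 3, "B": 2, "AB": 2, "BA": 2, "ABA": 2}; "BAB" occurs only once and is pruned.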

    An Optimal O(log log n) Time Parallel Algorithm for Detecting all Squares in a String