171 research outputs found

    On the maximal number of cubic subwords in a string

    Full text link
    We investigate the problem of the maximum number of cubic subwords (of the form wwwwww) in a given word. We also consider square subwords (of the form wwww). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares or which are highly repetitive; then we provide a nontrivial estimation for the number of cubes. We show that the maximum number of squares xxxx such that xx is not a primitive word (nonprimitive squares) in a word of length nn is exactly n21\lfloor \frac{n}{2}\rfloor - 1, and the maximum number of subwords of the form xkx^k, for k3k\ge 3, is exactly n2n-2. In particular, the maximum number of cubes in a word is not greater than n2n-2 either. Using very technical properties of occurrences of cubes, we improve this bound significantly. We show that the maximum number of cubes in a word of length nn is between (1/2)n(1/2)n and (4/5)n(4/5)n. (In particular, we improve the lower bound from the conference version of the paper.)Comment: 14 page

    Fast Label Extraction in the CDAWG

    Full text link
    The compact directed acyclic word graph (CDAWG) of a string TT of length nn takes space proportional just to the number ee of right extensions of the maximal repeats of TT, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which ee grows significantly more slowly than nn. We reduce from O(mloglogn)O(m\log{\log{n}}) to O(m)O(m) the time needed to count the number of occurrences of a pattern of length mm, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from O(mloglogn+occ)O(m\log{\log{n}}+\mathtt{occ}) to O(m+occ)O(m+\mathtt{occ}) in the time needed to locate all the occ\mathtt{occ} occurrences of the pattern. We also reduce from O(kloglogn)O(k\log{\log{n}}) to O(k)O(k) the time needed to read the kk characters of the label of an edge of the suffix tree of TT, and we reduce from O(mloglogn)O(m\log{\log{n}}) to O(m)O(m) the time needed to compute the matching statistics between a query of length mm and TT, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.Comment: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.0864

    A simple algorithm for computing the Lempel-Ziv factorization

    Get PDF
    We give a space-efficient simple algorithm for computing the Lempel?Ziv factorization ofa string. For a string of length n over an integer alphabet, it runs in O(n) time independentlyof alphabet size and uses o(n) additional space

    A Minimal Periods Algorithm with Applications

    Full text link
    Kosaraju in ``Computation of squares in a string'' briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same construction of suffix trees, we generalize his result and describe in detail how to compute in O(k|w|)-time the minimal k-th power, with period of length larger than s, starting at each position in a word w for arbitrary exponent k2k\geq2 and integer s0s\geq0. We provide the complete proof of correctness of the algorithm, which is somehow not completely clear in Kosaraju's original paper. The algorithm can be used as a sub-routine to detect certain types of pseudo-patterns in words, which is our original intention to study the generalization.Comment: 14 page

    Efficient Enumeration of Non-Equivalent Squares in Partial Words with Few Holes

    Get PDF
    International audienceA partial word is a word with holes (also called don't cares: special symbols which match any symbol). A p-square is a partial word matching at least one standard square without holes (called a full square). Two p-squares are called equivalent if they match the same sets of full squares. Denote by psquares(T) the number of non-equivalent p-squares which are subwords of a partial word T. Let PSQUARES k (n) be the maximum value of psquares(T) over all partial words of length n with k holes. We show asympthotically tight bounds: c1 · min(nk 2 , n 2) ≤ PSQUARES k (n) ≤ c2 · min(nk 2 , n 2) for some constants c1, c2 > 0. We also present an algorithm that computes psquares(T) in O(nk 3) time for a partial word T of length n with k holes. In particular, our algorithm runs in linear time for k = O(1) and its time complexity near-matches the maximum number of non-equivalent p-squares

    Palindromic complexity of trees

    Full text link
    We consider finite trees with edges labeled by letters on a finite alphabet Σ\varSigma. Each pair of nodes defines a unique labeled path whose trace is a word of the free monoid Σ\varSigma^*. The set of all such words defines the language of the tree. In this paper, we investigate the palindromic complexity of trees and provide hints for an upper bound on the number of distinct palindromes in the language of a tree.Comment: Submitted to the conference DLT201
    corecore