7,355 research outputs found

    Lempel-Ziv Factorization May Be Harder Than Computing All Runs

    Get PDF
    The complexity of computing the Lempel-Ziv factorization and the set of all runs (= maximal repetitions) is studied in the decision tree model of computation over ordered alphabet. It is known that both these problems can be solved by RAM algorithms in O(nlogσ)O(n\log\sigma) time, where nn is the length of the input string and σ\sigma is the number of distinct letters in it. We prove an Ω(nlogσ)\Omega(n\log\sigma) lower bound on the number of comparisons required to construct the Lempel-Ziv factorization and thereby conclude that a popular technique of computation of runs using the Lempel-Ziv factorization cannot achieve an o(nlogσ)o(n\log\sigma) time bound. In contrast with this, we exhibit an O(n)O(n) decision tree algorithm finding all runs in a string. Therefore, in the decision tree model the runs problem is easier than the Lempel-Ziv factorization. Thus we support the conjecture that there is a linear RAM algorithm finding all runs.Comment: 12 pages, 3 figures, submitte

    Computing Runs on a General Alphabet

    Full text link
    We describe a RAM algorithm computing all runs (maximal repetitions) of a given string of length nn over a general ordered alphabet in O(nlog23n)O(n\log^{\frac{2}3} n) time and linear space. Our algorithm outperforms all known solutions working in Θ(nlogσ)\Theta(n\log\sigma) time provided σ=nΩ(1)\sigma = n^{\Omega(1)}, where σ\sigma is the alphabet size. We conjecture that there exists a linear time RAM algorithm finding all runs.Comment: 4 pages, 2 figure

    Searching of gapped repeats and subrepetitions in a word

    Full text link
    A gapped repeat is a factor of the form uvuuvu where uu and vv are nonempty words. The period of the gapped repeat is defined as u+v|u|+|v|. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter with preserving its period. The gapped repeat is called α\alpha-gapped if its period is not greater than αv\alpha |v|. A δ\delta-subrepetition is a factor which exponent is less than 2 but is not less than 1+δ1+\delta (the exponent of the factor is the quotient of the length and the minimal period of the factor). The δ\delta-subrepetition is maximal if it cannot be extended to the left or to the right by at least one letter with preserving its minimal period. We reveal a close relation between maximal gapped repeats and maximal subrepetitions. Moreover, we show that in a word of length nn the number of maximal α\alpha-gapped repeats is bounded by O(α2n)O(\alpha^2n) and the number of maximal δ\delta-subrepetitions is bounded by O(n/δ2)O(n/\delta^2). Using the obtained upper bounds, we propose algorithms for finding all maximal α\alpha-gapped repeats and all maximal δ\delta-subrepetitions in a word of length nn. The algorithm for finding all maximal α\alpha-gapped repeats has O(α2n)O(\alpha^2n) time complexity for the case of constant alphabet size and O(nlogn+α2n)O(n\log n + \alpha^2n) time complexity for the general case. For finding all maximal δ\delta-subrepetitions we propose two algorithms. The first algorithm has O(nloglognδ2)O(\frac{n\log\log n}{\delta^2}) time complexity for the case of constant alphabet size and O(nlogn+nloglognδ2)O(n\log n +\frac{n\log\log n}{\delta^2}) time complexity for the general case. The second algorithm has O(nlogn+nδ2log1δ)O(n\log n+\frac{n}{\delta^2}\log \frac{1}{\delta}) expected time complexity

    A Minimal Periods Algorithm with Applications

    Full text link
    Kosaraju in ``Computation of squares in a string'' briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same construction of suffix trees, we generalize his result and describe in detail how to compute in O(k|w|)-time the minimal k-th power, with period of length larger than s, starting at each position in a word w for arbitrary exponent k2k\geq2 and integer s0s\geq0. We provide the complete proof of correctness of the algorithm, which is somehow not completely clear in Kosaraju's original paper. The algorithm can be used as a sub-routine to detect certain types of pseudo-patterns in words, which is our original intention to study the generalization.Comment: 14 page

    Detecting One-variable Patterns

    Full text link
    Given a pattern p=s1x1s2x2sr1xr1srp = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r such that x1,x2,,xr1{x,x}x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}, where xx is a variable and x\overset{{}_{\leftarrow}}{x} its reversal, and s1,s2,,srs_1,s_2,\ldots,s_r are strings that contain no variables, we describe an algorithm that constructs in O(rn)O(rn) time a compact representation of all PP instances of pp in an input string of length nn over a polynomially bounded integer alphabet, so that one can report those instances in O(P)O(P) time.Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201

    Finding the Leftmost Critical Factorization on Unordered Alphabet

    Full text link
    We present a linear time and space algorithm computing the leftmost critical factorization of a given string on an unordered alphabet.Comment: 13 pages, 13 figures (accepted to Theor. Comp. Sci.
    corecore