15 research outputs found

    Computing Runs on a General Alphabet

    We describe a RAM algorithm computing all runs (maximal repetitions) of a given string of length $n$ over a general ordered alphabet in $O(n\log^{2/3} n)$ time and linear space. Our algorithm outperforms all known solutions working in $\Theta(n\log\sigma)$ time provided $\sigma = n^{\Omega(1)}$, where $\sigma$ is the alphabet size. We conjecture that there exists a linear time RAM algorithm finding all runs. Comment: 4 pages, 2 figures

    Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

    Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et al. (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (\emph{Inf. Process. Lett.}, 2016), who presented an $O(n (\log n)^{2/3})$-time algorithm for answering $O(n)$ LCE queries. This result was improved by Gawrychowski et al. (accepted to CPM 2016) to $O(n \log \log n)$ time. In this work we note a special \emph{non-crossing} property of LCE queries asked in the runs computation. We show that any $n$ such non-crossing queries can be answered on-line in $O(n \alpha(n))$ time, which yields an $O(n \alpha(n))$-time algorithm for computing runs.

    Faster Longest Common Extension Queries in Strings over General Alphabets

    Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of $q$ LCE queries for a string of size $n$ over a general ordered alphabet can be realized in $O(q \log \log n + n\log^* n)$ time making only $O(q+n)$ symbol comparisons. Consequently, all runs in a string over a general ordered alphabet can be computed in $O(n \log \log n)$ time making $O(n)$ symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who gave an algorithm with $O(n \log^{2/3} n)$ running time and conjectured that $O(n)$ time is possible. We make significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to $O(q\log n + n\log^* n)$. The main tools are difference covers and the disjoint-sets data structure. Comment: Accepted to CPM 201
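To make the query in the abstract above concrete, here is a minimal sketch of a single LCE query answered by direct character comparison. It is only a baseline for illustration, not the technique of the paper: it uses equality tests alone (so it also works on an unordered alphabet) but spends time proportional to the answer on every query. The function name `lce` and the 0-based indexing are assumptions made for this example.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes s[i:] and s[j:].

    Naive baseline: compares characters one by one, so a single query can
    cost O(n) time. Only equality comparisons on symbols are used.
    """
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


# Suffixes "ananatree" and "anatree" share the prefix "ana".
print(lce("bananatree", 1, 3))  # → 3
```

The results listed here are about answering many such queries faster in total than this per-query scan allows.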

    Finding the Leftmost Critical Factorization on Unordered Alphabet

    We present a linear time and space algorithm computing the leftmost critical factorization of a given string on an unordered alphabet. Comment: 13 pages, 13 figures (accepted to Theor. Comp. Sci.)

    On the Size of Overlapping Lempel-Ziv and Lyndon Factorizations


    Computing Runs on a Trie

    A maximal repetition, or run, in a string is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, or in other words, on a rooted edge-labeled tree where one endpoint of the path must be a descendant or ancestor of the other. For a trie with $n$ edges, we show that the number of runs is less than $n$. We also show an $O(n \sqrt{\log n} \log \log n)$ time and $O(n)$ space algorithm for counting and finding the shallower endpoint of all runs. We further show an $O(n \log n)$ time and $O(n)$ space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time even more.

    Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

    We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most $m+k-1$, where $m$ is the size of the run-length encoding, and $k$ is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in $O(m \alpha(m))$ time and $O(m)$ space, where $\alpha$ denotes the inverse Ackermann function.
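As a small illustration of the quantities $m$ and $k$ in the bound above, the following sketch run-length encodes a string and evaluates $m+k-1$. It only computes the bound stated in the abstract; it does not find the repetitions themselves. The helper names `run_length_encode` and `repetition_bound` are hypothetical, and a non-empty input is assumed.

```python
from itertools import groupby


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Return the run-length encoding of s as (symbol, exponent) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def repetition_bound(s: str) -> int:
    """Evaluate m + k - 1 for a non-empty string s.

    m is the number of run-length factors, and k is the number of factors
    whose exponent is at least 2.
    """
    rle = run_length_encode(s)
    m = len(rle)
    k = sum(1 for _, exponent in rle if exponent >= 2)
    return m + k - 1


print(run_length_encode("aabbbc"))  # → [('a', 2), ('b', 3), ('c', 1)]
print(repetition_bound("aabbbc"))   # → 4
```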

    Linear Time Runs Over General Ordered Alphabets

    A run in a string is a maximal periodic substring. For example, the string $\texttt{bananatree}$ contains the runs $\texttt{anana} = (\texttt{an})^{5/2}$ and $\texttt{ee} = \texttt{e}^2$. There are fewer than $n$ runs in any length-$n$ string, and computing all runs for a string over a linearly-sortable alphabet takes $\mathcal{O}(n)$ time (Bannai et al., SODA 2015). Kosolobov conjectured that there also exists a linear time runs algorithm for general ordered alphabets (Inf. Process. Lett. 2016). The conjecture was almost proven by Crochemore et al., who presented an $\mathcal{O}(n\alpha(n))$ time algorithm (where $\alpha(n)$ is the extremely slowly growing inverse Ackermann function). We show how to achieve $\mathcal{O}(n)$ time by exploiting combinatorial properties of the Lyndon array, thus proving Kosolobov's conjecture. Comment: This work has been submitted to ICALP 202
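The definition of a run can be checked on the example string with a brute-force enumeration. The sketch below is a quadratic-time illustration only and has nothing to do with the linear-time algorithm of the abstract. It assumes the usual convention that a run is at least twice as long as its smallest period and cannot be extended in either direction; the function name `naive_runs` and the half-open `(start, end, period)` output format are choices made for this example.

```python
def naive_runs(s: str) -> list[tuple[int, int, int]]:
    """Return all runs of s as (start, end, period), with s[start:end] the run.

    For every candidate period p, scan for maximal stretches of positions k
    with s[k] == s[k + p]. A stretch covering positions a..k-1 yields the
    substring s[a:k + p], which has period p and cannot be extended.
    Periods are tried in increasing order, so the first period recorded for
    an interval is its smallest one.
    """
    n = len(s)
    found: dict[tuple[int, int], int] = {}
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            a = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            if k - a >= p:  # substring length (k - a + p) is at least 2p
                found.setdefault((a, k + p), p)
    return sorted((a, b, p) for (a, b), p in found.items())


print(naive_runs("bananatree"))  # → [(1, 6, 2), (8, 10, 1)]
```

The two results are `anana` (length 5, period 2, so exponent 5/2) and `ee` (length 2, period 1, so exponent 2).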

    Lyndon Arrays in Sublinear Time

    […] $\sigma$} with $\sigma$ ? $n$. In this case, the string can be stored in $O(n \log \sigma)$ bits (or $O(n / \log_\sigma n)$ words) of memory, and reading it takes only $O(n / \log_\sigma n)$ time. We show that $O(n / \log_\sigma n)$ time and words of space suffice to compute the succinct $2n$-bit version of the Lyndon array. The time is optimal for $w = O(\log n)$. The algorithm uses precomputed lookup tables to perform significant parts of the computation in constant time. This is possible due to properties of periodic substrings, which we carefully analyze to achieve the desired result. We envision that the algorithm has applications in the computation of runs (maximal periodic substrings), where the Lyndon array plays a central role in both theoretically and practically fast algorithms.
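The visible part of this abstract does not define the Lyndon array, so the sketch below relies on the standard definition as an assumption: entry $i$ is the length of the longest Lyndon word starting at position $i$, where a Lyndon word is a non-empty string strictly smaller than all of its proper suffixes. The code is a brute-force illustration with hypothetical helper names; it produces the plain integer array, not the succinct $2n$-bit encoding, and is far from the sublinear bound discussed above.

```python
def is_lyndon(w: str) -> bool:
    """True if w is non-empty and strictly smaller than every proper suffix."""
    return len(w) > 0 and all(w < w[k:] for k in range(1, len(w)))


def lyndon_array(s: str) -> list[int]:
    """Entry i is the length of the longest Lyndon word starting at s[i].

    Brute force: tries every length at every position. A single character
    is always a Lyndon word, so each entry is at least 1.
    """
    n = len(s)
    return [
        max(length for length in range(1, n - i + 1) if is_lyndon(s[i:i + length]))
        for i in range(n)
    ]


print(lyndon_array("banana"))  # → [1, 2, 1, 2, 1, 1]
```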