    Algorithms to Compute the Lyndon Array

    We first describe three algorithms for computing the Lyndon array that have been suggested in the literature, but for which no structured exposition has been given. Two of these algorithms execute in quadratic time in the worst case; the third achieves linear time, but at the expense of prior computation of both the suffix array and the inverse suffix array of x. We then go on to describe two variants of a new algorithm that avoids prior computation of global data structures and executes in worst-case n log n time. Experimental evidence suggests that all but one of these five algorithms require only linear execution time in practice, with the two new algorithms faster by a small factor. We conjecture that there exists a fast and worst-case linear-time algorithm to compute the Lyndon array that is also elementary (making no use of global data structures such as the suffix array).
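
    The linear-time approach mentioned above can be illustrated with a short sketch. It assumes the commonly cited characterization that the Lyndon array entry at position i equals the distance to the next position whose inverse-suffix-array rank is smaller (or to the end of the string if there is none). The function names are hypothetical, and the suffix array here is built by naive sorting purely for brevity, so this sketch as a whole is not linear time.

```python
def suffix_array_naive(x):
    # Illustration only: sorts suffixes directly, far from linear time.
    return sorted(range(len(x)), key=lambda i: x[i:])


def lyndon_array_from_isa(x):
    n = len(x)
    sa = suffix_array_naive(x)
    isa = [0] * n
    for rank, start in enumerate(sa):
        isa[start] = rank

    lam = [0] * n
    stack = []  # candidate positions for "next smaller rank", scanned right to left
    for i in range(n - 1, -1, -1):
        # Discard positions whose suffix ranks above suffix i.
        while stack and isa[stack[-1]] > isa[i]:
            stack.pop()
        nxt = stack[-1] if stack else n
        lam[i] = nxt - i
        stack.append(i)
    return lam


print(lyndon_array_from_isa("banana"))  # [1, 2, 1, 2, 1, 1]
```

    Given the inverse suffix array, the stack scan itself touches each position a constant number of times, which is where the linear bound for this step comes from.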

    Lyndon Array Construction during Burrows-Wheeler Inversion

    In this paper we present an algorithm to compute the Lyndon array of a string T of length n as a byproduct of the inversion of the Burrows-Wheeler transform of T. Our algorithm runs in linear time using only a stack in addition to the data structures used for Burrows-Wheeler inversion. We compare our algorithm with two other linear-time algorithms for Lyndon array construction and show that computing the Burrows-Wheeler transform and then constructing the Lyndon array is competitive compared to the known approaches. We also propose a new balanced parenthesis representation for the Lyndon array that uses 2n + o(n) bits of space and supports constant time access. This representation can be built in linear time using O(n) words of space, or in O(n log n / log log n) time using asymptotically the same space as T.

    Longest Lyndon Substring After Edit

    The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log sigma + log n) time for any block edit, where sigma is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.
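
    As a static baseline for LLS(T), the following sketch recomputes the value from scratch. It is not the edit-query algorithm of the abstract. It relies on the known property that a longest Lyndon substring appears as a longest factor of the Lyndon factorization, and it uses Duval's algorithm for the factorization. The function names are hypothetical.

```python
def lyndon_factorization(t):
    # Duval's algorithm: splits t into a non-increasing sequence of Lyndon words.
    n = len(t)
    factors = []
    i = 0
    while i < n:
        j, k = i + 1, i
        while j < n and t[k] <= t[j]:
            # Strictly larger character restarts the comparison; equal advances it.
            k = i if t[k] < t[j] else k + 1
            j += 1
        # Emit every full repetition of the Lyndon word of length j - k.
        while i <= k:
            factors.append(t[i:i + j - k])
            i += j - k
    return factors


def lls(t):
    # Length of the longest Lyndon substring, taken as the longest factor.
    return max((len(f) for f in lyndon_factorization(t)), default=0)


print(lyndon_factorization("banana"))  # ['b', 'an', 'an', 'a']
print(lls("banana"))                   # 2
```

    Rerunning this after every edit costs linear time per query, which is the cost the preprocessing described above is meant to avoid.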

    Inducing the Lyndon Array

    In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in O(n) time using sigma + O(1) words of working space, where n is the length of the text and sigma is the alphabet size. Our result improves the previous best space requirement for linear time computation of the Lyndon array. In fact, all the known linear algorithms for Lyndon array computation use suffix sorting as a preprocessing step and use O(n) words of working space in addition to the Lyndon array and suffix array. Experimental results with real and synthetic datasets show that our algorithm is not only space-efficient but also fast in practice.

    Lyndon Arrays Simplified

    A Lyndon word is a string that is lexicographically smaller than all of its proper suffixes (e.g., "airbus" is a Lyndon word; "amtrak" is not a Lyndon word because its suffix "ak" is lexicographically smaller than "amtrak"). The Lyndon array (sometimes called Lyndon table) identifies the longest Lyndon prefix of each suffix of a string. It is well known that the Lyndon array of a length-n string can be computed in O(n) time. However, most of the existing algorithms require the suffix array, which has theoretical and practical disadvantages. The only known algorithms that compute the Lyndon array in O(n) time without the suffix array (or similar data structures) do so in a particularly space-efficient way (Bille et al., ICALP 2020), or in an online manner (Badkobeh et al., CPM 2022). Due to the additional goals of space efficiency and online computation, these algorithms are complicated in technical detail. Using the main ideas of the aforementioned algorithms, we provide a simpler and easier-to-understand algorithm that computes the Lyndon array in O(n) time.
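
    The two definitions above translate directly into a brute-force check. The sketch below is only meant to make the definitions concrete; it runs in polynomial rather than linear time and is not the algorithm of the abstract. The function names are hypothetical.

```python
def is_lyndon(w):
    # A non-empty string is Lyndon if it is smaller than every proper suffix.
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lyndon_array_naive(x):
    # Entry i is the length of the longest Lyndon prefix of the suffix x[i:].
    n = len(x)
    return [
        max(l for l in range(1, n - i + 1) if is_lyndon(x[i:i + l]))
        for i in range(n)
    ]


print(is_lyndon("airbus"))           # True
print(is_lyndon("amtrak"))           # False
print(lyndon_array_naive("banana"))  # [1, 2, 1, 2, 1, 1]
```

    Every single character is a Lyndon word, so each entry of the array is at least 1.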

    Linear Time Runs Over General Ordered Alphabets

    A run in a string is a maximal periodic substring. For example, the string bananatree contains the runs anana = (an)^{3/2} and ee = e^2. There are fewer than n runs in any length-n string, and computing all runs for a string over a linearly-sortable alphabet takes O(n) time (Bannai et al., SODA 2015). Kosolobov conjectured that there also exists a linear time runs algorithm for general ordered alphabets (Inf. Process. Lett. 2016). The conjecture was almost proven by Crochemore et al., who presented an O(n α(n)) time algorithm (where α(n) is the extremely slowly growing inverse Ackermann function). We show how to achieve O(n) time by exploiting combinatorial properties of the Lyndon array, thus proving Kosolobov's conjecture. Comment: This work has been submitted to ICALP 202
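
    To make the definition of a run concrete, here is a brute-force sketch that enumerates runs by trying every period. It is an illustration of the definition only, not the linear-time method of the abstract, and takes quadratic time. The function name and the (start, end, period) output convention, with end exclusive, are assumptions made for this example.

```python
def runs_naive(t):
    n = len(t)
    found = {}  # (start, end) -> smallest period seen for that interval
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if t[i] != t[i + p]:
                i += 1
                continue
            s = i
            # Extend the stretch on which t[i] == t[i + p] as far as possible.
            while i + p < n and t[i] == t[i + p]:
                i += 1
            end = i + p
            # Keep it only if the period repeats at least twice.
            if end - s >= 2 * p:
                # Periods are tried in increasing order, so the first hit wins.
                found.setdefault((s, end), p)
    return sorted((s, e, p) for (s, e), p in found.items())


print(runs_naive("bananatree"))  # [(1, 6, 2), (8, 10, 1)]
```

    The two reported intervals correspond to anana with period 2 and ee with period 1, matching the example in the text.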

    Space Efficient Construction of Lyndon Arrays in Linear Time

