157 research outputs found

    Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

    Get PDF
    Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et.\ al.\ (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (\emph{Inf.\ Process.\ Lett.}, 2016), who presented an O(n(logn)2/3)O(n (\log n)^{2/3})-time algorithm for answering O(n)O(n) LCE queries. This result was improved by Gawrychowski et.\ al.\ (accepted to CPM 2016) to O(nloglogn)O(n \log \log n) time. In this work we note a special \emph{non-crossing} property of LCE queries asked in the runs computation. We show that any nn such non-crossing queries can be answered on-line in O(nα(n))O(n \alpha(n)) time, which yields an O(nα(n))O(n \alpha(n))-time algorithm for computing runs

    Minimal Suffix and Rotation of a Substring in Optimal Time

    Get PDF
    For a text given in advance, the substring minimal suffix queries ask to determine the lexicographically minimal non-empty suffix of a substring specified by the location of its occurrence in the text. We develop a data structure answering such queries optimally: in constant time after linear-time preprocessing. This improves upon the results of Babenko et al. (CPM 2014), whose trade-off solution is characterized by Θ(nlogn)\Theta(n\log n) product of these time complexities. Next, we extend our queries to support concatenations of O(1)O(1) substrings, for which the construction and query time is preserved. We apply these generalized queries to compute lexicographically minimal and maximal rotations of a given substring in constant time after linear-time preprocessing. Our data structures mainly rely on properties of Lyndon words and Lyndon factorizations. We combine them with further algorithmic and combinatorial tools, such as fusion trees and the notion of order isomorphism of strings

    Lyndon Array Construction during Burrows-Wheeler Inversion

    Get PDF
    In this paper we present an algorithm to compute the Lyndon array of a string TT of length nn as a byproduct of the inversion of the Burrows-Wheeler transform of TT. Our algorithm runs in linear time using only a stack in addition to the data structures used for Burrows-Wheeler inversion. We compare our algorithm with two other linear-time algorithms for Lyndon array construction and show that computing the Burrows-Wheeler transform and then constructing the Lyndon array is competitive compared to the known approaches. We also propose a new balanced parenthesis representation for the Lyndon array that uses 2n+o(n)2n+o(n) bits of space and supports constant time access. This representation can be built in linear time using O(n)O(n) words of space, or in O(nlogn/loglogn)O(n\log n/\log\log n) time using asymptotically the same space as TT

    Internal Pattern Matching Queries in a Text and Applications

    Full text link
    We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword xx in another subword yy of a given text, assuming that y=O(x)|y|=\mathcal{O}(|x|), which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows to improve the algorithm for finding δ\delta-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed δ\delta we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding.Comment: 31 pages, 9 figures; accepted to SODA 201

    Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

    Get PDF
    We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2. We also show an algorithm for computing all maximal repetitions in O(m alpha(m)) time and O(m) space, where alpha denotes the inverse Ackermann function

    Fast Computation of Abelian Runs

    Full text link
    Given a word ww and a Parikh vector P\mathcal{P}, an abelian run of period P\mathcal{P} in ww is a maximal occurrence of a substring of ww having abelian period P\mathcal{P}. Our main result is an online algorithm that, given a word ww of length nn over an alphabet of cardinality σ\sigma and a Parikh vector P\mathcal{P}, returns all the abelian runs of period P\mathcal{P} in ww in time O(n)O(n) and space O(σ+p)O(\sigma+p), where pp is the norm of P\mathcal{P}, i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm pp in ww in time O(np)O(np), for any given norm pp. Finally, we give an O(n2)O(n^2)-time offline randomized algorithm for computing all the abelian runs of ww. Its deterministic counterpart runs in O(n2logσ)O(n^2\log\sigma) time.Comment: To appear in Theoretical Computer Scienc
    corecore