Sorting suffixes of a text via its Lyndon Factorization
The process of sorting the suffixes of a text plays a fundamental role in
Text Algorithms. It is used, for instance, in the construction of the
Burrows-Wheeler transform and the suffix array, both widely used in several fields
of Computer Science. For this reason, several recent studies have been
devoted to finding new strategies that perform this sorting effectively.
In this paper we introduce a new methodology in which an important
role is played by the Lyndon factorization: the local suffixes inside the
factors detected by this factorization keep their mutual order when extended to
the suffixes of the whole word. This property suggests a versatile technique
that can easily be adapted to different implementation scenarios. Comment: Submitted to the Prague Stringology Conference 2013 (PSC 2013)
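The key property can be checked empirically with a short Python sketch. It computes the Lyndon factorization with Duval's classical algorithm (a standard algorithm, not necessarily the one used in the paper) and then verifies, by brute-force sorting, that inside each factor the order of the local suffixes agrees with the order of the corresponding suffixes of the whole word. The names `lyndon_factorization` and `local_order_preserved` are illustrative; this only checks the property and is not the paper's sorting method.

```python
from itertools import product


def lyndon_factorization(s: str) -> list[tuple[int, int]]:
    """Duval's algorithm: half-open (start, end) ranges of the Lyndon factors of s."""
    n, i = len(s), 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Extend while s[i:j] is a power of a Lyndon word followed by a prefix of it.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit the copies of the Lyndon word of length j - k.
        while i <= k:
            factors.append((i, i + j - k))
            i += j - k
    return factors


def local_order_preserved(s: str) -> bool:
    """Brute-force check: inside every Lyndon factor s[start:end], sorting the
    local suffixes s[p:end] gives the same order as sorting the global suffixes s[p:]."""
    for start, end in lyndon_factorization(s):
        positions = range(start, end)
        local = sorted(positions, key=lambda p: s[p:end])
        global_ = sorted(positions, key=lambda p: s[p:])
        if local != global_:
            return False
    return True


if __name__ == "__main__":
    text = "banana"
    print([text[a:b] for a, b in lyndon_factorization(text)])  # ['b', 'an', 'an', 'a']
    # Exhaustive check over all binary words of length 1..10.
    print(all(local_order_preserved("".join(w))
              for n in range(1, 11) for w in product("ab", repeat=n)))
```

Because the local order already agrees with the global order inside each factor, the factors can be sorted independently and the partial results merged afterwards; how the merge is done efficiently is the subject of the paper and is not shown here.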
Minimal Suffix and Rotation of a Substring in Optimal Time
For a text given in advance, the substring minimal suffix queries ask to
determine the lexicographically minimal non-empty suffix of a substring
specified by the location of its occurrence in the text. We develop a data
structure answering such queries optimally: in constant time after linear-time
preprocessing. This improves upon the results of Babenko et al. (CPM 2014),
whose trade-off solution is characterized by the product of these
time complexities being $\Theta(n \log n)$. Next, we extend our queries to support concatenations of
substrings, for which the construction and query times are preserved. We
apply these generalized queries to compute lexicographically minimal and
maximal rotations of a given substring in constant time after linear-time
preprocessing.
Our data structures mainly rely on properties of Lyndon words and Lyndon
factorizations. We combine them with further algorithmic and combinatorial
tools, such as fusion trees and the notion of order isomorphism of strings.
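As a point of reference, the queries themselves can be answered naively, in time linear in the substring length, using two standard facts about Lyndon words: the minimal non-empty suffix of a word is its last Lyndon factor, and running Duval's algorithm on the doubled word locates a minimal rotation. The Python sketch below is such a naive baseline, assuming 0-based half-open ranges `text[i:j]` with `j > i`. It is not the constant-time data structure described above, and all function names are hypothetical.

```python
def last_lyndon_factor_start(seq) -> int:
    """Start index of the last Lyndon factor of seq (Duval's algorithm).
    That factor is the lexicographically minimal non-empty suffix of seq."""
    n, i, last = len(seq), 0, 0
    while i < n:
        j, k = i + 1, i
        while j < n and seq[k] <= seq[j]:
            k = i if seq[k] < seq[j] else k + 1
            j += 1
        while i <= k:
            last = i          # remember where the most recent factor starts
            i += j - k
    return last


def min_rotation_start(seq) -> int:
    """Start of a lexicographically minimal rotation, via Duval on seq + seq."""
    n = len(seq)
    doubled = list(seq) + list(seq)
    i, ans = 0, 0
    while i < n:
        ans = i
        j, k = i + 1, i
        while j < 2 * n and doubled[k] <= doubled[j]:
            k = i if doubled[k] < doubled[j] else k + 1
            j += 1
        while i <= k:
            i += j - k
    return ans


def min_suffix(text: str, i: int, j: int) -> str:
    """Minimal non-empty suffix of text[i:j] (naive: O(j - i) per query)."""
    sub = text[i:j]
    return sub[last_lyndon_factor_start(sub):]


def min_rotation(text: str, i: int, j: int) -> str:
    """Lexicographically minimal rotation of text[i:j]."""
    sub = text[i:j]
    r = min_rotation_start(sub)
    return sub[r:] + sub[:r]


def max_rotation(text: str, i: int, j: int) -> str:
    """Maximal rotation: a minimal rotation under the reversed alphabet order,
    obtained here by negating code points."""
    sub = text[i:j]
    r = min_rotation_start([-ord(c) for c in sub])
    return sub[r:] + sub[:r]
```

For example, on `text = "abracadabra"` the calls `min_rotation(text, 0, 4)` and `max_rotation(text, 0, 4)` give `"aabr"` and `"raab"`. Reversing the order works for rotations because all rotations have the same length; it would not give maximal suffixes, where one suffix can be a prefix of another.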
Evaluation of a Permutation-Based Evolutionary Framework for Lyndon Factorizations
String factorization is an important tool for partitioning data for parallel processing and for other algorithmic techniques often found in big data applications such as bioinformatics and compression. Duval’s well-known algorithm computes the unique factorization of a string over an ordered alphabet into Lyndon words, i.e., patterned strings that are strictly smaller than all of their other cyclic rotations. While Duval’s algorithm produces a pre-determined factorization, modern applications motivate the demand for factorizations with specific properties, e.g., those that minimize the number of factors or consist of factors of similar lengths. In this paper, we consider the problem of finding an alphabet ordering that yields a Lyndon factorization with such properties. We introduce a flexible evolutionary framework and evaluate it on biological sequence data. For the minimization case, we also propose a new problem-specific heuristic, Flexi-Duval, and a problem-specific mutation operator for Lyndon factorization. Our results show that our framework is competitive with Flexi-Duval for minimization and yields high-quality and robust solutions for balancing, where no problem-specific algorithm is available.
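To make the optimization problem concrete, here is a small Python sketch that runs Duval's algorithm under an arbitrary alphabet ordering and searches all σ! orderings exhaustively. Exhaustive search stands in for the evolutionary framework here and is only practical for small alphabets such as DNA. The `imbalance` objective (longest minus shortest factor length) is a hypothetical stand-in for the balancing criterion, which the abstract does not define, and the input string is assumed to be non-empty.

```python
from itertools import permutations


def lyndon_factors(s: str, order: str) -> list[str]:
    """Duval's algorithm under the alphabet order given by `order` (smallest letter first)."""
    rank = {c: r for r, c in enumerate(order)}
    seq = [rank[c] for c in s]          # compare letters by their rank, not by code point
    n, i, out = len(seq), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and seq[k] <= seq[j]:
            k = i if seq[k] < seq[j] else k + 1
            j += 1
        while i <= k:
            out.append(s[i:i + j - k])
            i += j - k
    return out


def best_order(s: str, objective):
    """Exhaustive search over every alphabet ordering; returns the
    (score, order, factors) triple that minimizes objective(factors)."""
    best = None
    for perm in permutations(sorted(set(s))):
        order = "".join(perm)
        factors = lyndon_factors(s, order)
        score = objective(factors)
        if best is None or score < best[0]:
            best = (score, order, factors)
    return best


def num_factors(factors: list[str]) -> int:
    """Minimization objective: the number of Lyndon factors."""
    return len(factors)


def imbalance(factors: list[str]) -> int:
    """Hypothetical balancing objective: longest minus shortest factor length."""
    lengths = [len(f) for f in factors]
    return max(lengths) - min(lengths)
```

For example, `best_order("GATTACAGATTACA", num_factors)` returns the smallest factor count found, the ordering that achieves it (smallest letter first), and the factors themselves. The σ! candidates make this a baseline for checking heuristics on short inputs, not a practical method for large alphabets.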
Fast Computation of Abelian Runs
Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$
in $w$ is a maximal occurrence of a substring of $w$ having
abelian period $\mathcal{P}$. Our main result is an online algorithm that,
given a word $w$ of length $n$ over an alphabet of cardinality $\sigma$ and a
Parikh vector $\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$
in $w$ in time $O(n)$ and space $O(\sigma + p)$, where $p$ is the
norm of $\mathcal{P}$, i.e., the sum of its components. We also present an
online algorithm that computes all the abelian runs with periods of norm $p$ in
$w$ in time $O(np)$, for any given norm $p$. Finally, we give an $O(n^2)$-time
offline randomized algorithm for computing all the abelian runs of $w$. Its
deterministic counterpart runs in $O(n^2 \log \sigma)$ time. Comment: To appear in Theoretical Computer Science
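A common building block for online abelian computations is a sliding window that tracks how far the letter counts of the current length-p window are from the Parikh vector P. The Python sketch below finds every window whose Parikh vector equals P in O(n) time, then chains consecutive matching windows. It deliberately simplifies matters: the usual definition of abelian period also allows an incomplete head and tail block whose Parikh vectors are contained in P, which this sketch ignores, so `block_chains` (a hypothetical helper) only approximates abelian runs and is not the paper's algorithm. P is assumed to be given as a dict from letters to counts.

```python
def parikh_windows(w: str, P: dict[str, int]) -> list[int]:
    """Start positions i such that w[i:i+p] has Parikh vector exactly P,
    where p is the norm of P (the sum of its components)."""
    p = sum(P.values())
    if p == 0 or p > len(w):
        return []
    # diff[c] = (count of c in the current window) - P[c]
    diff = {c: -k for c, k in P.items() if k}
    bad = len(diff)  # number of letters whose count currently differs from P

    def bump(c: str, delta: int) -> None:
        nonlocal bad
        before = diff.get(c, 0)
        after = before + delta
        diff[c] = after
        if before == 0 and after != 0:
            bad += 1
        elif before != 0 and after == 0:
            bad -= 1

    hits = []
    for i, c in enumerate(w):
        bump(c, +1)                 # letter entering the window
        if i >= p:
            bump(w[i - p], -1)      # letter leaving the window
        if i >= p - 1 and bad == 0:
            hits.append(i - p + 1)
    return hits


def block_chains(w: str, P: dict[str, int], min_blocks: int = 2) -> list[tuple[int, int]]:
    """Maximal chains of adjacent full blocks with Parikh vector P, as half-open
    (start, end) ranges. Simplified: partial head/tail blocks are ignored."""
    p = sum(P.values())
    hits = set(parikh_windows(w, P))
    chains = []
    for start in sorted(hits):
        if start - p in hits:
            continue  # not the leftmost block of its chain
        end = start
        while end + p in hits:
            end += p
        if (end - start) // p + 1 >= min_blocks:
            chains.append((start, end + p))
    return chains
```

For instance, with `w = "abbaab"` and `P = {"a": 1, "b": 1}` the matching windows start at 0, 2 and 4, and they form the single chain `(0, 6)`. Each window is updated in constant time, so the scan uses O(n) time and space proportional to the number of distinct letters seen.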
Longest Lyndon Substring After Edit
The longest Lyndon substring of a string T is the longest substring of T that is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single-character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log σ + log n) time for any block edit, where σ is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.
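For intuition, LLS(T) can be computed naively from the fact that the first factor of the Lyndon factorization of a suffix T[i..] is its longest Lyndon prefix, so LLS(T) is the maximum of these lengths over all start positions i. The Python sketch below does this and handles a single-character substitution by recomputing from scratch, which costs O(n^2) time in the worst case instead of the O(log n) query time stated above. The function names are hypothetical, and insertions, deletions and block edits are not covered.

```python
def longest_lyndon_prefix(s: str, start: int) -> int:
    """Length of the longest Lyndon prefix of s[start:], i.e. the length of the
    first factor that Duval's algorithm emits when started at `start`."""
    n = len(s)
    j, k = start + 1, start
    while j < n and s[k] <= s[j]:
        k = start if s[k] < s[j] else k + 1
        j += 1
    return j - k


def lls(s: str) -> int:
    """LLS(s): length of the longest Lyndon substring (naive, O(n^2) worst case)."""
    return max((longest_lyndon_prefix(s, i) for i in range(len(s))), default=0)


def lls_after_substitution(t: str, pos: int, c: str) -> int:
    """LLS(T') where T' is T with T[pos] replaced by c, recomputed from scratch."""
    return lls(t[:pos] + c + t[pos + 1:])
```

For example, `lls("abaab")` is 3 (the substring `"aab"`), and substituting `"a"` at position 1 gives `"aaaab"`, which is itself a Lyndon word, so `lls_after_substitution("abaab", 1, "a")` is 5. A naive baseline like this is mainly useful for testing a faster implementation.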