
239 research outputs found

On Bijective Variants of the Burrows-Wheeler Transform

Author: Kufleitner Manfred
Publication date: 01/01/2009

The sort transform (ST) is a modification of the Burrows-Wheeler transform (BWT). Both transformations map an arbitrary word of length n to a pair consisting of a word of length n and an index between 1 and n. The BWT sorts all rotation conjugates of the input word, whereas the ST of order k only uses the first k letters for sorting all such conjugates. If two conjugates start with the same prefix of length k, then the indices of the rotations are used for tie-breaking. Both transforms output the sequence of the last letters of the sorted list and the index of the input within the sorted list. In this paper, we discuss a bijective variant of the BWT (due to Scott), proving its correctness and relations to other results due to Gessel and Reutenauer (1993) and Crochemore, Desarmenien, and Perrin (2005). Further, we present a novel bijective variant of the ST. Comment: 15 pages, presented at the Prague Stringology Conference 2009 (PSC 2009)
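The plain (non-bijective) BWT and ST described in this abstract can be sketched as follows. This is a minimal quadratic-time illustration, not the paper's construction: the function names `bwt` and `sort_transform` and the helper `_transform` are hypothetical, and breaking ties between equal rotations by position in the BWT is an assumption made here so that the index is well defined for periodic words.

```python
def _transform(word, key_length):
    """Sort rotation start positions by their first key_length letters.

    Ties are broken by the start position, as the abstract describes for the ST.
    Returns (last letters of the sorted rotations, 1-based index of the input).
    """
    if not word:
        raise ValueError("word must be non-empty")
    n = len(word)
    doubled = word + word  # doubled[i:i + n] is the rotation starting at i
    order = sorted(range(n), key=lambda i: (doubled[i:i + key_length], i))
    # The last letter of the rotation starting at i is the letter before i
    # (word[-1] when i == 0).
    last_letters = "".join(word[i - 1] for i in order)
    index = order.index(0) + 1  # rank of the unrotated input word
    return last_letters, index


def bwt(word):
    """BWT: sort by the whole rotation (key length n)."""
    return _transform(word, len(word))


def sort_transform(word, k):
    """ST of order k: sort by the first k letters only."""
    return _transform(word, k)


if __name__ == "__main__":
    print(bwt("banana"))                # ('nnbaaa', 4)
    print(sort_transform("banana", 2))  # ('nbnaaa', 4)
```

With k equal to the word length, the sort key is the full rotation, so the ST sketch coincides with the BWT sketch.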

arXiv.org e-Print Archive

CiteSeerX

Space Efficient Construction of Lyndon Arrays in Linear Time

Author: Bille Philip
Ellert Jonas
Fischer Johannes
Kurpicz Florian
Munro J. Ian
Rotenberg Eva
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)
Publication date: 01/01/2020

Given a string S of length n, its Lyndon array identifies for each suffix S[i..n] the next lexicographically smaller suffix S[j..n], i.e. the minimal index j > i with S[i..n] ≻ S[j..n]. Apart from its plain (n log₂ n)-bit array representation, the Lyndon array can also be encoded as a succinct parentheses sequence that requires only 2n bits of space. While linear time construction algorithms for both representations exist, it has previously been unknown if the same time bound can be achieved with less than Ω(n lg n) bits of additional working space. We show that, in fact, o(n) additional bits are sufficient to compute the succinct 2n-bit version of the Lyndon array in linear time. For the plain (n log₂ n)-bit version, we only need O(1) additional words to achieve linear time. Our space efficient construction algorithm makes the Lyndon array more accessible as a fundamental data structure in applications like full-text indexing.
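The definition above (next lexicographically smaller suffix) can be illustrated with a naive sketch. It is not the linear-time, space-efficient algorithm of the paper: it ranks suffixes by plain sorting and then runs a standard stack-based next-smaller-value scan. The function name `next_smaller_suffix`, the 0-based indexing, and the sentinel value n for "no smaller suffix" are assumptions made for this example.

```python
def next_smaller_suffix(s):
    """For each i, return the smallest j > i with s[i:] > s[j:], or len(s) if none.

    Naive illustration: suffixes are ranked by sorting them directly, which is
    far from the linear time and small working space discussed in the abstract.
    """
    n = len(s)
    # rank[i] = position of suffix s[i:] in lexicographic order
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: s[i:])):
        rank[i] = r

    nss = [n] * n
    stack = []  # start positions whose next smaller suffix is still unknown
    for j in range(n):
        # Suffix j is smaller than every pending suffix with a larger rank.
        while stack and rank[stack[-1]] > rank[j]:
            nss[stack.pop()] = j
        stack.append(j)
    return nss


if __name__ == "__main__":
    print(next_smaller_suffix("banana"))  # [1, 3, 3, 5, 5, 6]
```

In this convention, `nss[i] - i` is the length of the longest Lyndon word starting at position i.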

KITopen

Dagstuhl Research Online Publication Server

Online Research Database In Technology

Space Efficient Construction of Lyndon Arrays in Linear Time

Author: Bille Philip
Ellert Jonas
Fischer Johannes
Gørtz Inge Li
Kurpicz Florian
Munro J. Ian
Rotenberg Eva
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH
Publication date: 20/08/2021

KITopen

Longest Lyndon Substring After Edit

Author: Bannai Hideo
Inenaga Shunsuke
Nakashima Yuto
Takeda Masayuki
Urabe Yuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Annual Symposium on Combinatorial Pattern Matching (CPM 2018)
Publication date: 01/01/2018

The longest Lyndon substring of a string T is the longest substring of T which is a Lyndon word. LLS(T) denotes the length of the longest Lyndon substring of a string T. In this paper, we consider computing LLS(T') where T' is an edited string formed from T. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(log n) time for any single character edit. We also consider a version of the problem with block edits, i.e., a substring of T is replaced by a given string of length l. After O(n) time and space preprocessing, our algorithm returns LLS(T') in O(l log sigma + log n) time for any block edit where sigma is the number of distinct characters in T. We can modify our algorithm so as to output all the longest Lyndon substrings of T' for both problems.
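To make the quantity LLS(T') concrete, here is a brute-force sketch that recomputes it from scratch after a single character edit. It does not implement the paper's preprocessing or its O(log n) query; the names `is_lyndon`, `lls`, and `lls_after_edit` are hypothetical, and a Lyndon word is taken to be a non-empty string that is strictly smaller than all of its proper suffixes.

```python
def is_lyndon(w):
    """A non-empty word is Lyndon if it is strictly smaller than every proper suffix."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lls(t):
    """Length of the longest Lyndon substring of t (brute force over all substrings)."""
    n = len(t)
    return max(
        (j - i for i in range(n) for j in range(i + 1, n + 1) if is_lyndon(t[i:j])),
        default=0,
    )


def lls_after_edit(t, pos, ch):
    """LLS of the string obtained by replacing t[pos] (0-based) with ch."""
    edited = t[:pos] + ch + t[pos + 1:]
    return lls(edited)


if __name__ == "__main__":
    print(lls("banana"))                     # 2, e.g. "an"
    print(lls_after_edit("banana", 0, "a"))  # 5, "aanan" in "aanana"
```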

Dagstuhl Research Online Publication Server

Computation of the suffix array, Burrows-Wheeler transform and FM-index in V-order

Author: Daykin Jacqueline
Mhaskar Neerja
Smyth W. F.
Publication date: 01/01/2021

V-order is a total order on strings that determines an instance of Unique Maximal Factorization Families (UMFFs), a generalization of Lyndon words. The fundamental V-comparison of strings can be done in linear time and constant space. V-order has been proposed as an alternative to lexicographic order (lexorder) in the computation of suffix arrays and in the suffix-sorting induced by the Burrows-Wheeler transform (BWT). In line with the recent interest in the connection between suffix arrays and Lyndon factorization, in this paper we obtain similar results for the V-order factorization. Indeed, we show that the results describing the connection between suffix arrays and Lyndon factorization are matched by analogous V-order processing. We also describe a methodology for efficiently computing the FM-Index in V-order, as well as V-order substring pattern matching using backward search.

Aberystwyth Research Portal

Research Repository

String Inference from Longest-Common-Prefix Array

Author: Kärkkäinen Juha
Piątkowski Marcin
Puglisi Simon J.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2017

Peer reviewed

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

Repeat-Free Codes

Author: Elishco Ohad
Gabrys Ryan
Médard Muriel
Yaakobi Eitan
Publication date: 12/09/2019

In this paper we consider the problem of encoding data into repeat-free sequences, in which sequences are required to contain any $k$-tuple at most once (for predefined $k$). First, the capacity and redundancy of the repeat-free constraint are calculated. Then, an efficient algorithm, which uses a single bit of redundancy, is presented to encode length-$n$ sequences for $k=2+2\log (n)$. This algorithm is then improved to support any value of $k$ of the form $k=a\log (n)$, for $1<a$, while its redundancy is $o(n)$. We also calculate the capacity of repeat-free sequences when combined with local constraints which are given by a constrained system, and the capacity of multi-dimensional repeat-free codes. Comment: 18 pages
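The repeat-free constraint itself (every k-tuple occurs at most once) is easy to check, and a small sketch may help fix the definition. This is only a membership test, not the encoding algorithm from the paper; the function name `is_repeat_free` is hypothetical, and a k-tuple is assumed to mean a window of k consecutive symbols.

```python
def is_repeat_free(seq, k):
    """Return True if every length-k window of seq occurs at most once."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seen = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in seen:
            return False  # this k-tuple already appeared at an earlier position
        seen.add(window)
    return True


if __name__ == "__main__":
    print(is_repeat_free("0011", 2))  # True: windows 00, 01, 11
    print(is_repeat_free("0101", 2))  # False: window 01 appears twice
    print(is_repeat_free("0101", 3))  # True: windows 010, 101
```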

arXiv.org e-Print Archive

DSpace@MIT

Compressibility-Aware Quantum Algorithms on Strings

Author: Gibney Daniel
Thankachan Sharma V.
Publication date: 14/02/2023

Sublinear time quantum algorithms have been established for many fundamental problems on strings. This work demonstrates that new, faster quantum algorithms can be designed when the string is highly compressible. We focus on two popular and theoretically significant compression algorithms -- the Lempel-Ziv77 algorithm (LZ77) and the Run-length-encoded Burrows-Wheeler Transform (RL-BWT), and obtain the results below. We first provide a quantum algorithm running in $\tilde{O}(\sqrt{zn})$ time for finding the LZ77 factorization of an input string $T[1..n]$ with $z$ factors. Combined with multiple existing results, this yields an $\tilde{O}(\sqrt{rn})$ time quantum algorithm for finding the RL-BWT encoding with $r$ BWT runs. Note that $r = \tilde{\Theta}(z)$. We complement these results with lower bounds proving that our algorithms are optimal (up to polylog factors). Next, we study the problem of compressed indexing, where we provide a $\tilde{O}(\sqrt{rn})$ time quantum algorithm for constructing a recently designed $\tilde{O}(r)$ space structure with equivalent capabilities as the suffix tree. This data structure is then applied to numerous problems to obtain sublinear time quantum algorithms when the input is highly compressible. For example, we show that the longest common substring of two strings of total length $n$ can be computed in $\tilde{O}(\sqrt{zn})$ time, where $z$ is the number of factors in the LZ77 factorization of their concatenation. This beats the best known $\tilde{O}(n^\frac{2}{3})$ time quantum algorithm when $z$ is sufficiently small.
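The two compressibility parameters in this abstract, the number z of LZ77 factors and the number r of BWT runs, can be computed classically with a naive sketch. Nothing here is quantum or sublinear. The function names are hypothetical, and two conventions are assumptions of this example: the LZ77 variant is the greedy one where each factor is either the longest previously starting match (overlaps allowed) or a single fresh character, and the BWT is taken of the text with a `$` terminator assumed not to occur in the input.

```python
from itertools import groupby


def lz77_factors(t):
    """Greedy LZ77 factorization (assumed variant: longest earlier-starting match,
    self-overlap allowed, or one fresh character when there is no match)."""
    factors = []
    i, n = 0, len(t)
    while i < n:
        best = 0
        for j in range(i):  # candidate earlier start positions
            length = 0
            while i + length < n and t[j + length] == t[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)
        factors.append(t[i:i + step])
        i += step
    return factors


def bwt_runs(t, terminator="$"):
    """Number of maximal equal-letter runs in the BWT of t + terminator."""
    if terminator in t:
        raise ValueError("terminator must not occur in the text")
    s = t + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last_column = "".join(rot[-1] for rot in rotations)
    return sum(1 for _ in groupby(last_column))


if __name__ == "__main__":
    text = "abababab"
    print(lz77_factors(text))  # ['a', 'b', 'ababab'], so z = 3
    print(bwt_runs(text))      # 3, the BWT of "abababab$" is "bbbb$aaaa"
```

On this highly repetitive input both z and r stay at 3 while n is 8, which is the regime where the abstract's bounds in terms of z and r are smaller than bounds in terms of n alone.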

arXiv.org e-Print Archive

On Baier's sort of maximal Lyndon substrings

Author: Franěk F.
Liut M.
Smyth W.F.
Publication date: 01/01/2018

We describe and analyze in terms of Lyndon words an elementary sort of maximal Lyndon factors of a string and prove formally its correctness. Since the sort is based on the first phase of Baier’s algorithm for sorting of the suffixes of a string, we refer to it as Baier’s sort.

Research Repository