String attractors and combinatorics on words

Authors: Mantaci S., Restivo A., Romana G., Rosone G., Sciortino M.
Publication venue: CEUR-WS
Publication date: 01/01/2019

The notion of string attractor has recently been introduced in [Prezza, 2017] and studied in [Kempa and Prezza, 2018] to provide a unifying framework for known dictionary-based compressors. A string attractor for a word w = w[1]w[2]...w[n] is a subset Γ of the positions 1, ..., n such that every distinct factor of w has an occurrence crossing at least one of the elements of Γ. While finding the smallest string attractor for a word is an NP-complete problem, it has been proved in [Kempa and Prezza, 2018] that dictionary compressors can be interpreted as algorithms approximating the smallest string attractor for a given word. In this paper we explore the notion of string attractor from a combinatorial point of view, focusing on several families of finite words. The results presented in the paper suggest that the notion of string attractor can be used to define new tools to investigate combinatorial properties of words.

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Palermo
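
The definition of a string attractor given above can be checked directly by brute force. The sketch below is an illustration under our own assumptions, not the authors' method: it uses 1-based positions as in the definition, enumerates every occurrence of every distinct factor, and records whether at least one occurrence crosses a position of Γ. Because finding a smallest attractor is NP-complete, the hypothetical helper `smallest_attractor` simply tries all subsets in order of size and is only usable on very short words.

```python
from itertools import combinations


def is_string_attractor(w: str, gamma: set[int]) -> bool:
    """True if every distinct factor of w has an occurrence w[i..j]
    (1-based, inclusive) containing at least one position of gamma."""
    n = len(w)
    covered: dict[str, bool] = {}
    for i in range(1, n + 1):        # start of the occurrence (1-based)
        for j in range(i, n + 1):    # end of the occurrence (1-based)
            factor = w[i - 1:j]
            crosses = any(i <= p <= j for p in gamma)
            covered[factor] = covered.get(factor, False) or crosses
    return all(covered.values())


def smallest_attractor(w: str) -> set[int]:
    """Exhaustive search over subsets of {1..n}, smallest size first.
    Exponential time: only meant for tiny examples."""
    n = len(w)
    for k in range(n + 1):
        for gamma in combinations(range(1, n + 1), k):
            if is_string_attractor(w, set(gamma)):
                return set(gamma)
    return set(range(1, n + 1))  # not reached: {1..n} is always an attractor


print(is_string_attractor("abab", {2, 3}))  # True
print(smallest_attractor("aab"))            # {1, 3}
```

Any attractor of a word must contain at least one position per distinct letter, which is why the search above never returns fewer than two positions for a word over {a, b}.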

Clustering words

Authors: Ferenczi Sébastien, Zamboni Luca Q.
Publication date: 06/04/2012

We characterize words which cluster under the Burrows-Wheeler transform as those words w such that ww occurs in a trajectory of an interval exchange transformation, and build examples of clustering words.

arXiv.org e-Print Archive

HAL-UJM

Hal-Diderot
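
To make the notion of a clustering word concrete, here is a small sketch. It assumes the usual definition in which w clusters under the BWT when, in the BWT of w, all occurrences of each letter are consecutive, and it uses the rotation-based BWT (last letters of the lexicographically sorted rotations, with no end marker). It tests only the clustering property itself; it does not implement the interval-exchange characterization stated in the abstract.

```python
def bwt(w: str) -> str:
    """Rotation-based BWT: last letter of each rotation, rotations sorted."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal blocks of equal letters in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def is_clustering(w: str) -> bool:
    """Each distinct letter of w forms a single block in bwt(w)."""
    return count_runs(bwt(w)) == len(set(w))


print(bwt("banana"), is_clustering("banana"))  # nnbaaa True
print(bwt("aabb"), is_clustering("aabb"))      # baba False
```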

Burrows-Wheeler transform of words defined by morphisms

Authors: A Restivo, B Tan, D Adjeroh, E Barcucci, G Manzini, GA Hedlund, H Kaplan, I Gessel, J Simpson, M Crochemore, M Lothaire, P Ferragina, S Mantaci
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Crossref

Florence Research

On the Impact of Morphisms on BWT-Runs

Authors: Fici Gabriele, Romana Giuseppe, Sciortino Marinella, Urbina Cristian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

Morphisms are widely studied combinatorial objects that can be used to generate infinite families of words. In information theory, injective morphisms are called (variable-length) codes. In data compression, morphisms combined with parsing techniques have recently been used to define new mechanisms for generating repetitive words. Here, we show that the repetitiveness induced by applying a morphism to a word can be captured by a compression scheme based on the Burrows-Wheeler Transform (BWT). In fact, we prove that, unlike other compression-based repetitiveness measures, the measure r_bwt (which counts the number of equal-letter runs produced by applying the BWT to a word) strongly depends on the applied morphism. More precisely, we characterize the binary morphisms that preserve the value of r_bwt(w) when applied to any binary word w containing both letters: they are precisely the Sturmian morphisms, which are well-known objects in combinatorics on words. Moreover, we prove that it is always possible to find a binary morphism that, when applied to any binary word containing both letters, increases the number of equal-letter runs of the BWT by a given (even) number. In addition, we derive a method for constructing arbitrarily large families of binary words on which the BWT produces a given (even) number of new equal-letter runs. These results are obtained using a new class of morphisms that we call Thue-Morse-like. Finally, we show that there exist binary morphisms μ for which it is possible to find words w such that the difference r_bwt(μ(w)) - r_bwt(w) is arbitrarily large.

Dagstuhl Research Online Publication Server

Archivio istituzionale della ricerca - Università di Palermo
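
The measure r_bwt and the effect of a morphism on it can be explored with a few lines of code. This is a hedged sketch: it assumes the rotation-based BWT (no end marker) and counts equal-letter runs of the BWT read as a linear string. The Fibonacci morphism a -> ab, b -> a serves as a standard example of a Sturmian morphism and the Thue-Morse morphism a -> ab, b -> ba as a non-Sturmian one; neither is claimed to be the specific construction (such as the Thue-Morse-like class) used in the paper, and the helper names are hypothetical.

```python
def bwt(w: str) -> str:
    """Rotation-based BWT: last letter of each rotation, rotations sorted."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(rot[-1] for rot in rotations)


def r_bwt(w: str) -> int:
    """Number of equal-letter runs in bwt(w)."""
    b = bwt(w)
    return sum(1 for i in range(len(b)) if i == 0 or b[i] != b[i - 1])


def apply_morphism(mu: dict[str, str], w: str) -> str:
    """Image of w under the morphism given letter by letter in mu."""
    return "".join(mu[c] for c in w)


FIBONACCI = {"a": "ab", "b": "a"}    # a Sturmian morphism
THUE_MORSE = {"a": "ab", "b": "ba"}  # not a Sturmian morphism

for w in ["ab", "aba", "aab"]:
    print(w, r_bwt(w),
          r_bwt(apply_morphism(FIBONACCI, w)),
          r_bwt(apply_morphism(THUE_MORSE, w)))
# ab 2 2 4
# aba 2 2 4
# aab 2 2 4
```

On these small inputs the Fibonacci morphism keeps r_bwt at 2, consistent with the characterization of Sturmian morphisms above, while the Thue-Morse morphism adds two runs.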

Repetitiveness Measures based on String Attractors and Burrows-Wheeler Transform: Properties and Applications

Author: Romana Giuseppe
Publication venue: Palermo
Publication date: 06/07/2023

Archivio istituzionale della ricerca - Università di Palermo

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

Authors: A Blumer, A de Luca, A Lempel, A Luca, D Knuth, G Castiglione, J Berstel, J Borel, JA Storer, JC Kieffer, M Lothaire, S Mantaci, T Gagie, T Ohno
Publication date: 19/08/2020

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as a measure of repetitiveness, especially to evaluate the performance, in terms of both space and time, of compressed indexing data structures. In this paper, we investigate $\rho(v)$, the ratio between $r$ and the number of runs of the BWT of the reverse of $v$. Kempa and Kociumaka [FOCS 2020] gave the first non-trivial upper bound, $\rho(v) = O(\log^2 n)$, for any string $v$ of length $n$. However, nothing is known about the tightness of this upper bound. We present infinite families of binary strings for which $\rho(v) = \Theta(\log n)$ holds, thus giving the first non-trivial lower bound on $\rho(n)$, the maximum of $\rho(v)$ over all strings $v$ of length $n$. Our results suggest that $r$ is not an ideal measure of the repetitiveness of a string, since the number of repeated factors is invariant between the string and its reverse. We believe that there is a more intricate relationship between the number of runs of the BWT and the string's combinatorial properties. Comment: 14 pages, 2 figures

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca
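
The ratio ρ(v) can be computed by brute force for short strings. This sketch assumes that the BWT of v is taken with an end-of-string sentinel `$` that sorts before every letter (true for lowercase ASCII letters) and that the sentinel counts as a run of its own; the paper's exact convention may differ. It only illustrates the definition and does not reproduce the Θ(log n) families.

```python
from fractions import Fraction


def bwt_with_sentinel(v: str, sentinel: str = "$") -> str:
    """BWT of v + sentinel via sorted rotations; the sentinel must be
    smaller than every letter of v (here '$' < 'a' in ASCII)."""
    t = v + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def runs(s: str) -> int:
    """Number of maximal blocks of equal letters in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def rho(v: str) -> Fraction:
    """r(v) / r(reverse of v), kept as an exact fraction."""
    return Fraction(runs(bwt_with_sentinel(v)),
                    runs(bwt_with_sentinel(v[::-1])))


print(bwt_with_sentinel("abb"), bwt_with_sentinel("bba"), rho("abb"))
# b$ba abb$ 4/3
```

Using `Fraction` rather than floating-point division keeps ratios such as 4/3 exact when comparing different strings.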

Clustering and Arnoux-Rauzy words

Authors: Ferenczi Sébastien, Zamboni Luca Q.
Publication date: 30/05/2023

We characterize the clustering of a word under the Burrows-Wheeler transform in terms of the resolution of a bounded number of bispecial factors belonging to the language generated by all its powers. We use this criterion to compute, in every given Arnoux-Rauzy language on three letters, an explicit bound K such that each word of length at least K is not clustering; this bound is sharp for a set of Arnoux-Rauzy languages including the Tribonacci one. In the other direction, we characterize all standard Arnoux-Rauzy clustering words, and all perfectly clustering Arnoux-Rauzy words. We extend some results to episturmian languages, characterizing those which produce infinitely many clustering words, and to larger alphabets.

arXiv.org e-Print Archive
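
A brute-force exploration of clustering factors in the Tribonacci language, one of the Arnoux-Rauzy languages mentioned above, might look as follows. It is a sketch under stated assumptions: the Tribonacci word is generated by the morphism a -> ab, b -> ac, c -> a; a word is taken to be clustering when each of its letters forms a single block in its rotation-based BWT; and only factors of a finite prefix are scanned. It does not implement the bispecial-resolution criterion or compute the bound K from the paper, and all function names are hypothetical.

```python
TRIBONACCI = {"a": "ab", "b": "ac", "c": "a"}


def tribonacci_prefix(n: int) -> str:
    """Prefix of length n of the fixed point of the Tribonacci morphism."""
    w = "a"
    while len(w) < n:
        w = "".join(TRIBONACCI[c] for c in w)
    return w[:n]


def bwt(w: str) -> str:
    """Rotation-based BWT: last letter of each rotation, rotations sorted."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(rot[-1] for rot in rotations)


def is_clustering(w: str) -> bool:
    """Each distinct letter of w forms a single block in bwt(w)."""
    b = bwt(w)
    runs = sum(1 for i in range(len(b)) if i == 0 or b[i] != b[i - 1])
    return runs == len(set(w))


def longest_clustering_factor(w: str) -> str:
    """Longest clustering factor of w; among equally long ones, the
    leftmost is kept because only strictly longer factors replace it."""
    best = ""
    for i in range(len(w)):
        for j in range(i + len(best) + 1, len(w) + 1):
            if is_clustering(w[i:j]):
                best = w[i:j]
    return best


print(tribonacci_prefix(7), bwt("abacaba"), is_clustering("abacaba"))
# abacaba bcabaaa False
print(longest_clustering_factor(tribonacci_prefix(40)))
```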

On the Structure of Bispecial Sturmian Words

Author: Fici Gabriele
Publication venue: Elsevier BV
Publication date: 19/11/2013

A balanced word is one in which any two factors of the same length contain the same number of each letter of the alphabet up to one. Finite binary balanced words are called Sturmian words. A Sturmian word is bispecial if it can be extended to the left and to the right with both letters while remaining a Sturmian word. There is a deep relation between bispecial Sturmian words and Christoffel words, which are the digital approximations of Euclidean segments in the plane. In 1997, J. Berstel and A. de Luca proved that palindromic bispecial Sturmian words are precisely the maximal internal factors of primitive Christoffel words. We extend this result by showing that bispecial Sturmian words are precisely the maximal internal factors of all Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the language of Sturmian words. Comment: arXiv admin note: substantial text overlap with arXiv:1204.167

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Palermo
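
The definitions in the abstract above translate directly into brute-force checks. This sketch assumes binary words over {a, b}, tests balance by comparing the number of a's in all factors of each length, and treats a Sturmian word w as bispecial when aw, bw, wa and wb are all still balanced, which is our reading of the definition given above. The function names are hypothetical, and the paper's enumerative formula is not reproduced here.

```python
from itertools import product


def is_balanced(w: str) -> bool:
    """Any two factors of equal length differ by at most one in their number of a's."""
    for m in range(1, len(w) + 1):
        counts = [w[i:i + m].count("a") for i in range(len(w) - m + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True


def is_bispecial_sturmian(w: str) -> bool:
    """w is balanced and stays balanced when extended by a or b on either side."""
    extensions = ["a" + w, "b" + w, w + "a", w + "b"]
    return is_balanced(w) and all(is_balanced(x) for x in extensions)


def bispecial_of_length(n: int) -> list[str]:
    """All bispecial Sturmian words of length n over {a, b}, by exhaustive search."""
    words = ("".join(p) for p in product("ab", repeat=n))
    return [w for w in words if is_bispecial_sturmian(w)]


print(is_bispecial_sturmian("aba"))  # True
print(is_bispecial_sturmian("aab"))  # False ("aabb" contains both aa and bb)
print(bispecial_of_length(3))
```

The exhaustive search grows as 2^n, so it is only useful for checking small cases against a formula, not for enumeration at scale.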