Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

Author: A Blumer
A de Luca
A Lempel
A Luca
D Knuth
G Castiglione
J Berstel
J Borel
JA Storer
JC Kieffer
M Lothaire
S Mantaci
T Gagie
T Ohno
Publication date: 19/08/2020

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as a measure of repetitiveness, especially to evaluate the performance in terms of both space and time of compressed indexing data structures. In this paper, we investigate $\rho(v)$, the ratio of $r$ to the number of runs of the BWT of the reverse of $v$. Kempa and Kociumaka [FOCS 2020] gave the first non-trivial upper bound as $\rho(v) = O(\log^2 n)$, for any string $v$ of length $n$. However, nothing is known about the tightness of this upper bound. We present infinite families of binary strings for which $\rho(v) = \Theta(\log n)$ holds, thus giving the first non-trivial lower bound on $\rho(n)$, the maximum over all strings of length $n$. Our results suggest that $r$ is not an ideal measure of the repetitiveness of the string, since the number of repeated factors is invariant between the string and its reverse. We believe that there is a more intricate relationship between the number of runs of the BWT and the string's combinatorial properties.

Comment: 14 pages, 2 figures
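To make $r$ and $\rho(v)$ from the abstract above concrete, here is a naive Python sketch. It assumes the common BWT variant in which a unique end-marker `$` (sorting before every letter) is appended to $v$, and it counts that marker as a run of its own; the abstract does not fix these conventions, so other BWT variants may give slightly different counts. The names `bwt`, `runs`, and `rho` are illustrative, not taken from the paper.

```python
def bwt(s: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of s + sentinel and read the last column.

    Assumption: the sentinel does not occur in s and sorts before every letter.
    """
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def runs(s: str) -> int:
    """Number of maximal equal-letter runs in s (the parameter r when s is a BWT)."""
    if not s:
        return 0
    # One run to start, plus one more at every position where the letter changes.
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def rho(v: str) -> float:
    """Ratio of the BWT runs of v to the BWT runs of the reverse of v."""
    return runs(bwt(v)) / runs(bwt(v[::-1]))


if __name__ == "__main__":
    print(bwt("banana"), bwt("ananab"), rho("banana"))
```

For example, `bwt("banana")` is `annb$aa` (5 runs) while the BWT of the reverse `ananab` is `bnn$aaa` (4 runs), so `rho("banana")` is 1.25. Sorting all rotations takes O(n^2 log n) time, so this sketch only suits small experiments; practical tools derive the BWT from a suffix array.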

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Cyclic Complexity of Words

Author: Cassaigne Julien
Fici Gabriele
Sciortino Marinella
Zamboni Luca Q.
Publication date: 28/06/2016

We introduce and study a complexity function on words $c_x(n)$, called \emph{cyclic complexity}, which counts the number of conjugacy classes of factors of length $n$ of an infinite word $x$. We extend the well-known Morse-Hedlund theorem to the setting of cyclic complexity by showing that a word is ultimately periodic if and only if it has bounded cyclic complexity. Unlike most complexity functions, cyclic complexity distinguishes between Sturmian words of different slopes. We prove that if $x$ is a Sturmian word and $y$ is a word having the same cyclic complexity as $x$, then up to renaming letters, $x$ and $y$ have the same set of factors. In particular, $y$ is also Sturmian of slope equal to that of $x$. Since $c_x(n)=1$ for some $n\geq 1$ implies $x$ is periodic, it is natural to consider the quantity $\liminf_{n\rightarrow \infty} c_x(n)$. We show that if $x$ is a Sturmian word, then $\liminf_{n\rightarrow \infty} c_x(n)=2$. We prove, however, that this is not a characterization of Sturmian words by exhibiting a restricted class of Toeplitz words, including the period-doubling word, which also satisfy this same condition on the limit inferior. In contrast, we show that for the Thue-Morse word $t$, $\liminf_{n\rightarrow \infty} c_t(n)=+\infty$.

Comment: To appear in Journal of Combinatorial Theory, Series
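The definition of $c_x(n)$ above can be explored on finite prefixes with a small Python sketch. It rests on a stated assumption: only the factors that occur in the given prefix are counted, so the result is a lower bound on $c_x(n)$ that equals it once the prefix contains every length-$n$ factor of $x$. Each factor is mapped to its lexicographically least rotation, which serves as a representative of its conjugacy class. The helper names and the use of the Fibonacci word (a standard Sturmian word) as input are illustrative choices, not taken from the paper.

```python
def fibonacci_word(length: int) -> str:
    """Prefix of the Fibonacci word, the fixed point of the morphism a -> ab, b -> a."""
    w = "a"
    while len(w) < length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:length]


def canonical_rotation(u: str) -> str:
    """Lexicographically least rotation of u: one representative per conjugacy class."""
    return min(u[i:] + u[:i] for i in range(len(u))) if u else u


def cyclic_complexity(prefix: str, n: int) -> int:
    """Count conjugacy classes among the length-n factors that occur in prefix.

    Assumption: prefix is long enough to contain every length-n factor of the
    infinite word; otherwise this only gives a lower bound on c_x(n).
    """
    factors = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return len({canonical_rotation(f) for f in factors})


if __name__ == "__main__":
    fw = fibonacci_word(200)
    print([cyclic_complexity(fw, n) for n in range(1, 5)])
```

On a 200-letter prefix of the Fibonacci word this gives 2 classes for n = 1, 2, 3 and 3 classes for n = 4 (the factors `abaa` and `aaba` fall into one class, `abab` and `baba` into another, and `baab` forms a third), while the periodic word abab… reaches a single class at n = 4.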

arXiv.org e-Print Archive

HAL-UJM

HAL AMU

Hal-Diderot

HAL-Ecole des Ponts ParisTech

Archivio istituzionale della ricerca - Università di Palermo

HAL - UPEC / UPEM

Around the Fibonacci Numeration System

Author: Edson Marcia Ruth
Publication venue: University of North Texas Libraries
Publication date: 01/05/2007

Let 1, 2, 3, 5, 8, … denote the Fibonacci sequence that begins with 1 and 2, with each subsequent number equal to the sum of the two previous ones. Every positive integer n can be expressed as a sum of distinct Fibonacci numbers in one or more ways. Setting R(n) to be the number of ways n can be written as a sum of distinct Fibonacci numbers, we exhibit certain regularity properties of R(n), one of which is connected to the Euler φ-function. In addition, using a theorem of Fine and Wilf, we give a formula for R(n) in terms of binomial coefficients modulo two.
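The definition of R(n) above can be checked by brute force. The Python sketch below counts representations with a standard 0/1 subset-sum dynamic program over the Fibonacci numbers 1, 2, 3, 5, 8, … up to n. It is not the binomial-coefficient formula from the thesis, only a simple reference to compare such results against; the function names are illustrative.

```python
def fibonacci_terms(limit: int) -> list:
    """Fibonacci numbers 1, 2, 3, 5, 8, ... that do not exceed limit."""
    terms = []
    a, b = 1, 2
    while a <= limit:
        terms.append(a)
        a, b = b, a + b
    return terms


def R(n: int) -> int:
    """Number of ways to write n as a sum of distinct terms of 1, 2, 3, 5, 8, ..."""
    # ways[s] = number of representations of s using the terms processed so far.
    ways = [1] + [0] * n
    for f in fibonacci_terms(n):
        # Iterate s downward so that each Fibonacci number is used at most once.
        for s in range(n, f - 1, -1):
            ways[s] += ways[s - f]
    return ways[n]


if __name__ == "__main__":
    print([R(k) for k in range(1, 9)])
```

For n = 1, …, 8 this yields 1, 1, 2, 1, 2, 2, 1, 3; for instance, R(8) = 3 because 8 = 8 = 5 + 3 = 5 + 2 + 1. The table has n + 1 entries and is updated once per Fibonacci number up to n, so the cost is O(n log n) operations, which is enough for exploring the regularities mentioned in the abstract on moderate ranges.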

UNT Digital Library

An Introductory Course on Constraint Logic Programming

Author: Bueno Carrillo Francisco
Cabeza Gras Daniel
Carro Liñares Manuel
García de la Banda M.
Hermenegildo Manuel V.
López García Pedro
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/1998

The purpose of this document is to serve as the printed material for the seminar "An Introductory Course on Constraint Logic Programming". The intended audience of this seminar is industrial programmers with a degree in Computer Science but little previous experience with constraint programming. The seminar itself was field-tested, prior to the writing of this document, with a group of application programmers from Esprit project P23182, "VOCAL", which aimed at developing an application for scheduling field maintenance tasks in the context of an electric utility company. The contents of this paper essentially follow the flow of the seminar slides. However, there are some differences. These differences stem from our perception, gained from teaching the seminar, that the technical aspects are the ones that need more attention and clearer explanations in the written version. Thus, this document includes more examples than the slides, more exercises (with their solutions), and four additional programming projects, with which we hope the reader will obtain a clearer view of the process of developing and tuning programs using CLP. On the other hand, several parts of the seminar have been left out: those related to the fields and applications in which C(L)P is useful, and the enumerations of available C(L)P tools. We feel that the slides are clear enough on these points, and that the interested reader will find more up-to-date information on available tools by browsing the Web or asking the vendors directly. More detail in this direction would amount to summarizing a user manual, which is not the aim of this document.

Archivo Digital UPM

Acta Universitatis Sapientiae - Informatica 2011

Publication venue: Sapientia Hungarian University of Transylvania
Publication date: 01/01/2011

REAL-J

Fundamentals of Java Programming

Author: Mitsunori Ogihara
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 24/04/2020

This book was born from the desire to have an introductory Java programming textbook whose contents can be covered in one semester. The book was written with two types of audience in mind: those who intend to major in computer science and those who want to get a glimpse of computer programming. The book does not cover graphical user interfaces or the material that is taught in a data structures course. The book very quickly surveys the Java Collections Framework and generics in the penultimate chapter. The book also covers the concepts of online and recursive algorithms in the last chapter. Instructors who choose to use this textbook are free to skip these chapters if time is insufficient. All code examples can be compiled and run in a command-line environment as well as in IDEs; for the examples that receive parameters from the command line, the user must supply the arguments before execution when running them in an IDE. The code examples appearing in the book have very few comments, since the actions of the code are explained in the prose. Versions of the code examples with extensive comments are available from the publisher. There are PDF lecture slides accompanying the book. They were prepared using the Beamer environment of LaTeX. The source code of the lecture slides may be available through the publisher.

Open Library

Theoretical and Practical Aspects Related to the Avoidability of Patterns in Words

Author: Reshadi Kamellia
Publication venue: Universitatsbibliothek Kiel
Publication date: 01/01/2019

This thesis concerns repetitive structures in words. More precisely, it contributes to the study of the appearance and absence of such repetitions in words. In the first and major part of this thesis, we study the avoidability of unary patterns with permutations. The second part of this thesis deals with modeling and solving several avoidability problems as constraint satisfaction problems, using the MiniZinc framework. Solving avoidability problems like the one mentioned in the previous paragraph required the construction, via a computer program, of a very long word that does not contain any word matching a given pattern. This gave us the idea of using SAT solvers. Representing the problems for SAT solvers seemed to be a standardised, and usually highly optimised, approach to formulating and solving well-known avoidability problems, such as the avoidability of formulas with reversal and the avoidability of patterns in the abelian sense. The final part is concerned with a variation on a classical avoidance problem from combinatorics on words. Considering the concatenation of i different factors of the word w, pexp_i(w) is the supremum of the powers that can be constructed by concatenating such factors, and RT_i(k) is then the infimum of pexp_i(w) over infinite words w on a k-letter alphabet. Again, by checking infinite ternary words that satisfy certain properties, we calculate the value of RT_i(3) for even and odd values of i.

MACAU: Open Access Repository of Kiel University