40 research outputs found

    Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

    The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as a measure of repetitiveness, especially to evaluate the performance in terms of both space and time of compressed indexing data structures. In this paper, we investigate $\rho(v)$, the ratio of $r$ and of the number of runs of the BWT of the reverse of $v$. Kempa and Kociumaka [FOCS 2020] gave the first non-trivial upper bound as $\rho(v) = O(\log^2(n))$, for any string $v$ of length $n$. However, nothing is known about the tightness of this upper bound. We present infinite families of binary strings for which $\rho(v) = \Theta(\log n)$ holds, thus giving the first non-trivial lower bound on $\rho(n)$, the maximum over all strings of length $n$. Our results suggest that $r$ is not an ideal measure of the repetitiveness of the string, since the number of repeated factors is invariant between the string and its reverse. We believe that there is a more intricate relationship between the number of runs of the BWT and the string's combinatorial properties. Comment: 14 pages, 2 figures
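
The quantities in this abstract can be tried out on small inputs. The following is a minimal sketch, not the paper's method: it assumes the BWT is taken over the sorted rotations of the string without an end-of-string marker (the abstract does not say which convention is used), and the function names `bwt`, `runs` and `rho` are chosen here for illustration.

```python
def bwt(s: str) -> str:
    """Burrows-Wheeler transform via sorted rotations.

    Assumption: no end-of-string marker is appended. This naive
    construction is quadratic and only meant for short strings.
    """
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def runs(s: str) -> int:
    """Number of maximal equal-letter runs in s (the parameter r when s is a BWT)."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def rho(v: str) -> float:
    """Ratio of the runs of BWT(v) to the runs of BWT(reverse of v); v must be non-empty."""
    return runs(bwt(v)) / runs(bwt(v[::-1]))


if __name__ == "__main__":
    for word in ["banana", "abracadabra", "aabab"]:
        print(word, bwt(word), runs(bwt(word)), rho(word))
```

For a palindrome the string and its reverse coincide, so the ratio is exactly 1; the interesting cases are strings whose reverse has a BWT with noticeably fewer or more runs.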

    Cyclic Complexity of Words

    We introduce and study a complexity function on words $c_x(n)$, called \emph{cyclic complexity}, which counts the number of conjugacy classes of factors of length $n$ of an infinite word $x$. We extend the well-known Morse-Hedlund theorem to the setting of cyclic complexity by showing that a word is ultimately periodic if and only if it has bounded cyclic complexity. Unlike most complexity functions, cyclic complexity distinguishes between Sturmian words of different slopes. We prove that if $x$ is a Sturmian word and $y$ is a word having the same cyclic complexity as $x$, then up to renaming letters, $x$ and $y$ have the same set of factors. In particular, $y$ is also Sturmian of slope equal to that of $x$. Since $c_x(n)=1$ for some $n\geq 1$ implies $x$ is periodic, it is natural to consider the quantity $\liminf_{n\rightarrow \infty} c_x(n)$. We show that if $x$ is a Sturmian word, then $\liminf_{n\rightarrow \infty} c_x(n)=2$. We prove however that this is not a characterization of Sturmian words by exhibiting a restricted class of Toeplitz words, including the period-doubling word, which also verify this same condition on the limit infimum. In contrast we show that, for the Thue-Morse word $t$, $\liminf_{n\rightarrow \infty} c_t(n)=+\infty$. Comment: To appear in Journal of Combinatorial Theory, Series
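
As a rough illustration of the definition, cyclic complexity can be estimated on a finite prefix of an infinite word. This is only a sketch under stated assumptions: a finite prefix may miss factors of the infinite word, so the value is a lower estimate of $c_x(n)$; the helper names (`factors`, `canonical`, `cyclic_complexity`, `fibonacci_word`) are hypothetical; and the Fibonacci word is used as a standard example of a Sturmian word.

```python
def factors(w: str, n: int) -> set[str]:
    """All distinct factors (contiguous substrings) of length n in w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def canonical(u: str) -> str:
    """Representative of the conjugacy class of u: its smallest rotation (u non-empty)."""
    return min(u[i:] + u[:i] for i in range(len(u)))


def cyclic_complexity(w: str, n: int) -> int:
    """Number of conjugacy classes among the length-n factors of the finite word w."""
    return len({canonical(u) for u in factors(w, n)})


def fibonacci_word(length: int) -> str:
    """Prefix of the Fibonacci word, built from f0 = "0", f1 = "01", f(k+1) = f(k) + f(k-1)."""
    a, b = "0", "01"
    while len(b) < length:
        a, b = b, b + a
    return b[:length]


if __name__ == "__main__":
    prefix = fibonacci_word(2000)
    print([cyclic_complexity(prefix, n) for n in range(1, 15)])
```

Two factors are conjugate exactly when one is a rotation of the other, so mapping each factor to its smallest rotation and counting distinct representatives counts the classes.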

    An Introductory Course on Constraint Logic Programming

    The purpose of this document is to serve as the printed material for the seminar "An Introductory Course on Constraint Logic Programming". The intended audience of this seminar is industrial programmers with a degree in Computer Science but little previous experience with constraint programming. The seminar itself has been field tested, prior to the writing of this document, with a group of the application programmers of Esprit project P23182, "VOCAL", aimed at developing an application in scheduling of field maintenance tasks in the context of an electric utility company. The contents of this paper essentially follow the flow of the seminar slides. However, there are some differences. These differences stem from our perception, from the experience of teaching the seminar, that the technical aspects are the ones which need more attention and clearer explanations in the written version. Thus, this document includes more examples than those in the slides, more exercises (and the solutions to them), as well as four additional programming projects, with which we hope the reader will obtain a clearer view of the process of development and tuning of programs using CLP. On the other hand, several parts of the seminar have been taken out: those related to the account of fields and applications in which C(L)P is useful, and the enumerations of C(L)P tools available. We feel that the slides are clear enough, and that for more information on available tools, the interested reader will find more up-to-date information by browsing the Web or asking the vendors directly. More details in this direction would actually boil down to summarizing a user manual, which is not the aim of this document.

    Acta Universitatis Sapientiae - Informatica 2011

    Fundamentals of Java Programming

    This book was born from the desire to have an introductory Java programming textbook whose contents can be covered in one semester. The book was written with two types of audience in mind: those who intend to major in computer science and those who want to get a glimpse of computer programming. The book does not cover graphical user interfaces or the materials that are taught in a data structure course. The book very quickly surveys the Java Collection Framework and generics in the penultimate chapter. The book also covers the concepts of online and recursive algorithms in the last chapter. Instructors who choose to use this textbook are free to skip these chapters if there is not sufficient time. Except for the code examples that receive parameters from the command line, the code examples can be compiled and run in a command-line environment as well as in IDEs. To execute those code examples in an IDE, the user must provide the arguments (args) before execution. The code examples appearing in the book have very few comments, since the actions of the code are explained in the prose. The code examples with extensive comments are available for the publisher. There are PDF lecture slides accompanying the book. They are prepared using the Beamer environment of LaTeX. The source code of the lecture slides may be available through the publisher.

    Theoretical and Practical Aspects Related to the Avoidability of Patterns in Words

    This thesis concerns repetitive structures in words. More precisely, it contributes to studying the appearance and absence of such repetitions in words. In the first and major part of this thesis, we study avoidability of unary patterns with permutations. The second part of this thesis deals with modeling and solving several avoidability problems as constraint satisfaction problems, using the framework of MiniZinc. Solving avoidability problems like the one mentioned in the previous paragraph required the construction, via a computer program, of a very long word that does not contain any word that matches a given pattern. This gave us the idea of using SAT solvers. Representing the problem for SAT solvers seemed to be a standardised, and usually very optimised, approach to formulating and solving well-known avoidability problems such as avoidability of formulas with reversal and avoidability of patterns in the abelian sense. The final part is concerned with a variation on a classical avoidance problem from combinatorics on words. Considering the concatenation of $i$ different factors of the word $w$, $pexp_i(w)$ is the supremum of powers that can be constructed by concatenation of such factors, and $RT_i(k)$ is then the infimum of $pexp_i(w)$. Again, by checking infinite ternary words that satisfy some properties, we calculate the value $RT_i(3)$ for even and odd values of $i$.
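
The idea of constructing, by computer, a long word that avoids a given pattern can be illustrated on the simplest classical case. The sketch below is not the thesis's MiniZinc or SAT encoding; it swaps in a plain backtracking search, and it assumes the pattern is the square $xx$ over a small alphabet. The function names are hypothetical.

```python
def ends_with_square(w: str) -> bool:
    """True if some non-empty suffix of w has the form xx."""
    n = len(w)
    for half in range(1, n // 2 + 1):
        if w[n - 2 * half:n - half] == w[n - half:]:
            return True
    return False


def square_free_word(length: int, alphabet: str = "abc"):
    """Backtracking search for a square-free word of the given length.

    Returns the lexicographically first such word, or None if none exists.
    Recursion depth equals the word length, so keep length moderate.
    """
    def extend(w: str):
        if len(w) == length:
            return w
        for c in alphabet:
            candidate = w + c
            # Only squares ending at the new letter need checking,
            # because the prefix w is already square-free.
            if not ends_with_square(candidate):
                result = extend(candidate)
                if result is not None:
                    return result
        return None

    return extend("")


if __name__ == "__main__":
    print(square_free_word(30))        # a ternary square-free word
    print(square_free_word(4, "ab"))   # no binary square-free word of length 4
```

Over a binary alphabet the search fails at length 4, while over three letters it keeps succeeding, which matches the classical fact that squares are avoidable on three letters but not on two.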