17 research outputs found

    About the p-paperfolding words

    Let p be an integer greater than or equal to 2. The aim of this paper is to study the language associated to a p-paperfolding sequence. It is known that the number of factors of length n of a 2-paperfolding sequence (i.e. its complexity function) is P(n) = 4n for n ⩾ 7. It is also known that the language of all the factors of all 2-paperfolding sequences is not context-free and that its generating function is transcendental. We show that the complexity function of a p-paperfolding sequence is either strictly subaffine or ultimately linear. The first case never happens if p = 2 or 3. In the second case, the complexity function is either P(n) = 2n or P(n) = 4n for n large enough. We give a simple necessary and sufficient condition for the number of special factors to be p-automatic. We finally show that, for any given p, the language of all factors of all p-paperfolding sequences is not context-free, and that the associated generating series is not algebraic.
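The complexity claim P(n) = 4n for n ⩾ 7 can be checked empirically on a finite prefix. The sketch below is only an illustration: it assumes the regular paperfolding sequence as the 2-paperfolding sequence under test and a prefix of 4096 terms, which should be long enough for every short factor to occur. The function names are hypothetical and not taken from the paper.

```python
def paperfolding(length):
    """First `length` terms of the regular paperfolding sequence (indexed from 1)."""
    terms = []
    for n in range(1, length + 1):
        while n % 2 == 0:          # strip factors of 2 to reach the odd part of n
            n //= 2
        # the term is determined by the odd part modulo 4
        terms.append("1" if n % 4 == 1 else "0")
    return "".join(terms)


def factor_count(word, n):
    """Number of distinct factors (contiguous blocks) of length n in `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


prefix = paperfolding(4096)        # assumed long enough for the lengths tested
for n in range(7, 11):
    print(n, factor_count(prefix, n), 4 * n)
```

Counting factors in a prefix can only undercount the true complexity, so agreement with 4n is consistent with the stated result but does not prove it.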

    Overlap-Free Words and Generalizations

    The study of combinatorics on words dates back at least to the beginning of the 20th century and the work of Axel Thue. Thue was the first to give an example of an infinite word over a three letter alphabet that contains no squares (identical adjacent blocks) xx. This result was eventually used to solve some longstanding open problems in algebra and has remarkable connections to other areas of mathematics and computer science as well. This thesis will consider several different generalizations of Thue's work. In particular we shall study the properties of infinite words avoiding various types of repetitions. In Chapter 1 we introduce the theory of combinatorics on words. We present the basic definitions and give an historical survey of the area. In Chapter 2 we consider the work of Thue in more detail. We present various well-known properties of the Thue-Morse word and give some generalizations. We examine Fife's characterization of the infinite overlap-free words and give a simpler proof of this result. We also present some applications to transcendental number theory, generalizing a classical result of Mahler. In Chapter 3 we generalize a result of Seebold by showing that the only infinite 7/3-power-free binary words that can be obtained by iterating a morphism are the Thue-Morse word and its complement. In Chapter 4 we continue our study of overlap-free and 7/3-power-free words. We discuss the squares that can appear as subwords of these words. We also show that it is possible to construct infinite 7/3-power-free binary words containing infinitely many overlaps. In Chapter 5 we consider certain questions of language theory. In particular, we examine the context-freeness of the set of words containing overlaps. We show that over a three-letter alphabet, this set is not context-free, and over a two-letter alphabet, we show that this set cannot be unambiguously context-free. 
In Chapter 6 we construct infinite words over a four-letter alphabet that avoid squares in any arithmetic progression of odd difference. Our constructions are based on properties of the paperfolding words. We use these infinite words to construct non-repetitive tilings of the integer lattice. In Chapter 7 we consider approximate squares rather than squares. We give constructions of infinite words that avoid such approximate squares. In Chapter 8 we conclude the work and present some open problems.

    Automatic winning shifts

    To each one-dimensional subshift X, we may associate a winning shift W(X) which arises from a combinatorial game played on the language of X. Earlier work studied which properties of X are inherited by W(X). For example, X and W(X) have the same factor complexity, and if X is a sofic subshift, then W(X) is also sofic. In this paper, we develop a notion of automaticity for W(X), that is, we propose what it means for a vector representation of W(X) to be accepted by a finite automaton. Let S be an abstract numeration system such that addition with respect to S is a rational relation. Let X be a subshift generated by an S-automatic word. We prove that as long as there is a bound on the number of nonzero symbols in configurations of W(X) (which follows from X having sublinear factor complexity), then W(X) is accepted by a finite automaton, which can be effectively constructed from the description of X. We provide an explicit automaton when X is generated by certain automatic words such as the Thue-Morse word. Comment: 28 pages, 5 figures, 1 table

    Automatic Sequences and Decidable Properties: Implementation and Applications

    In 1912 Axel Thue sparked the study of combinatorics on words when he showed that the Thue-Morse sequence contains no overlaps, that is, factors of the form ayaya. Since then many interesting properties of sequences began to be discovered and studied. In this thesis, we consider a class of infinite sequences generated by automata, called the k-automatic sequences. In particular, we present a logical theory in which many properties of k-automatic sequences can be expressed as predicates and we show that such predicates are decidable. Our main contribution is the implementation of a theorem prover capable of practically characterizing many commonly sought-after properties of k-automatic sequences. We showcase a panoply of results achieved using our method. We give new explicit descriptions of the recurrence and appearance functions of a list of well-known k-automatic sequences. We define a related function, called the condensation function, and give explicit descriptions for it as well. We re-affirm known results on the critical exponent of some sequences and determine it for others where it was previously unknown. On the more theoretical side, we show that the subword complexity p(n) of k-automatic sequences is k-synchronized, i.e., the language of pairs (n, p(n)) (expressed in base k) is accepted by an automaton. Furthermore, we prove that the Lyndon factorization of k-automatic sequences is also k-automatic and explicitly compute the factorization for several sequences. Finally, we show that while the number of unbordered factors of length n is not k-synchronized, it is k-regular
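As a small illustration of the overlap-freeness mentioned above, the following sketch generates a prefix of the Thue-Morse sequence and searches it for overlaps by brute force. It is not the decision procedure or theorem prover described in the thesis; the function names and the prefix length are assumptions made for this example.

```python
def thue_morse(length):
    """First `length` terms: term n is the parity of the number of 1 bits of n."""
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def has_overlap(word):
    """True if `word` contains a factor of length 2p + 1 with period p, for some p >= 1."""
    n = len(word)
    for p in range(1, n // 2 + 1):
        run = 0                      # consecutive positions i with word[i] == word[i + p]
        for i in range(n - p):
            if word[i] == word[i + p]:
                run += 1
                if run > p:          # p + 1 matches in a row give an overlap
                    return True
            else:
                run = 0
    return False


print(has_overlap(thue_morse(256)))  # a finite check only
print(has_overlap("01010"))          # an overlap with period 2
```

A check on a finite prefix can refute overlap-freeness but cannot establish it for the infinite word, which is where automaton-based decision methods come in.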

    Critical Exponents and Stabilizers of Infinite Words

    This thesis concerns infinite words over finite alphabets. It contributes to two topics in this area: critical exponents and stabilizers. Let w be a right-infinite word defined over a finite alphabet. The critical exponent of w is the supremum of the set of exponents r such that w contains an r-power as a subword. Most of the thesis (Chapters 3 through 7) is devoted to critical exponents. Chapter 3 is a survey of previous research on critical exponents and repetitions in morphic words. In Chapter 4 we prove that every real number greater than 1 is the critical exponent of some right-infinite word over some finite alphabet. Our proof is constructive. In Chapter 5 we characterize critical exponents of pure morphic words generated by uniform binary morphisms. We also give an explicit formula to compute these critical exponents, based on a well-defined prefix of the infinite word. In Chapter 6 we generalize our results to pure morphic words generated by non-erasing morphisms over any finite alphabet. We prove that critical exponents of such words are algebraic, of a degree bounded by the alphabet size. Under certain conditions, our proof implies an algorithm for computing the critical exponent. We demonstrate our method by computing the critical exponent of some families of infinite words. In particular, in Chapter 7 we compute the critical exponent of the Arshon word of order n for n ≥ 3. The stabilizer of an infinite word w defined over a finite alphabet Σ is the set of morphisms f: Σ*→Σ* that fix w. In Chapter 8 we study various problems related to stabilizers and their generators. We show that over a binary alphabet, there exist stabilizers with at least n generators for all n. Over a ternary alphabet, the monoid of morphisms generating a given infinite word by iteration can be infinitely generated, even when the word is generated by iterating an invertible primitive morphism. 
Stabilizers of strict epistandard words are cyclic when non-trivial, while stabilizers of ultimately strict epistandard words are always non-trivial. For this latter family of words, we give a characterization of stabilizer elements. We conclude with a list of open problems, including a new problem that has not been addressed yet: the D0L repetition threshold.

    Algorithms and Data Structures for Coding, Indexing, and Mining of Sequential Data

    In recent years, the production of sequential data has been rapidly increasing. This requires solving challenging problems about how to represent information, how to retrieve information, and how to extract knowledge from sequential data. These questions belong to the areas of coding, indexing, and mining, respectively. In this thesis, we investigate problems from those three areas. Coding refers to the way in which information is represented. Coding aims at generating optimal codes, that is, codes having minimum expected length. Codes can be generated for different purposes, from data compression to error detection/correction. The Lempel-Ziv 77 parsing produces an asymptotically optimal code in terms of compression. We study algorithms to efficiently decompress strings from the Lempel-Ziv 77 parsing, using memory proportional to the size of the parsing itself. We provide the first implementation of an algorithm by Bille et al., the only work we are aware of on this problem. We present a practical evaluation of this approach and several optimizations which improve the performance on all datasets we tested. Through the Ulam-Rényi game, it is possible to provide optimal adaptive error-correcting codes. The game consists of discovering an unknown m-bit number by asking membership questions, the answers to which can be erroneous. Questions are formulated knowing the answers to all previous ones. We want to find an optimal strategy, i.e., a strategy that can identify any m-bit number using the theoretical minimum number of questions. We studied the case where questions are a union of up to a fixed number of intervals, and up to three answers can be erroneous. We first show that for any sufficiently large m, there exists a strategy to identify an initially unknown m-bit number which uses at most four intervals per question.
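For readers unfamiliar with Lempel-Ziv 77 decompression, a minimal sketch of the textbook decoder follows. It assumes one common phrase format, a triple (offset, length, next character), and it materializes the whole output in memory, so it is not the space-efficient algorithm of Bille et al. that the thesis implements. The function name and phrase format are assumptions for illustration.

```python
def lz77_decode(phrases):
    """Decode a list of (offset, length, next_char) phrases into a string.

    `offset` counts backwards from the current end of the output;
    a phrase with length 0 contributes only `next_char`.
    """
    out = []
    for offset, length, next_char in phrases:
        start = len(out) - offset
        for k in range(length):
            # copy one symbol at a time so a phrase may overlap its own output
            out.append(out[start + k])
        out.append(next_char)
    return "".join(out)


print(lz77_decode([(0, 0, "a"), (1, 1, "c"), (3, 4, "b")]))  # prints "aacaacab"
```

Copying symbol by symbol is what lets a phrase refer to text it is still producing, as in the third phrase, where the copy of length 4 starts only 3 positions back.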
We further refine our main tool to turn the above asymptotic result into a complete characterization of those instances of the Ulam-Rényi game that admit optimal strategies. Indexing refers to the way in which information is retrieved. An index for texts permits finding all occurrences of any substring without traversing the whole text. Many applications require searching for approximate substrings. One of these is the problem of jumbled pattern matching, where two strings match if one is a permutation of the other. We study combinatorial aspects of prefix normal words, a class of binary words introduced in this context. These words can be used as indices for the Indexed Binary Jumbled Pattern Matching problem. We present a new recursive generation algorithm for prefix normal words that is competitive with the previous one but allows listing all prefix normal words sharing the same prefix. This offers insights that may help solve the problem of counting the number of prefix normal words of a given length. We then introduce infinite prefix normal words, and we show that one of the operations used by the algorithm, when repeatedly applied to extend a word, produces an infinite prefix normal word. This motivates the search for other operations that produce infinite prefix normal words. We found that one of these operations establishes a connection between prefix normal words and Sturmian words. We also explored the relationship between prefix normal words and Abelian complexity, as well as between prefix normal words and lexicographic order. Mining refers to the way in which information is converted into knowledge. The process of knowledge discovery covers several processing steps, including knowledge extraction. We analyze the problem of mining assertions for an embedded system from its simulation traces. This problem can be modeled as a pattern discovery problem on colored strings.
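The abstract does not define prefix normal words. Under the usual definition, assumed here, a binary word is prefix normal if no factor contains more 1s than the prefix of the same length. The sketch below tests that condition directly and counts such words by brute-force enumeration. It is not the recursive generation algorithm of the thesis, and the function names are hypothetical.

```python
from itertools import product


def is_prefix_normal(word):
    """True if no factor of `word` has more 1s than the prefix of equal length."""
    n = len(word)
    ones = [0]                          # ones[i] = number of 1s in word[:i]
    for c in word:
        ones.append(ones[-1] + (c == "1"))
    for k in range(1, n + 1):
        prefix_ones = ones[k]
        if any(ones[i + k] - ones[i] > prefix_ones for i in range(n - k + 1)):
            return False
    return True


def count_prefix_normal(n):
    """Count prefix normal words of length n by checking all 2**n binary words."""
    return sum(is_prefix_normal("".join(bits)) for bits in product("01", repeat=n))


print([count_prefix_normal(n) for n in range(1, 5)])  # prints [2, 3, 5, 8]
```

The enumeration is exponential in n, which is one reason a generation algorithm that visits only prefix normal words is of interest.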
We present two problems of pattern discovery on colored strings: patterns for one color only, or for all colors at the same time. We present two suffix tree-based algorithms. The first algorithm solves both the one color problem and the all colors problem. We then introduce modifications that improve the performance of the algorithm on both synthetic and real data. We implemented and evaluated the proposed approaches, highlighting time trade-offs that can be obtained. A different way of knowledge extraction is based on the information-theoretic perspective of Pearl's model of causality. It has been postulated that the true causality direction between two phenomena A and B is related to the problem of finding the minimum entropy joint distribution between A and B. This problem is known to be NP-hard, and greedy algorithms have recently been proposed. We provide a novel analysis of one of the proposed heuristics, showing that this algorithm guarantees an additive approximation of 1 bit. We then provide a general criterion for guaranteeing an additive approximation factor of 1. This criterion may be of independent interest in other contexts where couplings are used.