33 research outputs found

    On the Properties and Structure of Bordered Words and Generalizations

    Combinatorics on words is a field of mathematics and theoretical computer science that is concerned with sequences of symbols called words, or strings. One class of words that is ubiquitous in combinatorics on words, and in theoretical computer science more broadly, is the class of bordered words. The word w has a border u if u is a non-empty proper prefix and suffix of w. The word w is said to be bordered if it has a border. Otherwise w is said to be unbordered. This thesis is primarily concerned with variations and generalizations of bordered and unbordered words. In Chapter 1 we introduce the field of combinatorics on words and give a brief overview of the literature on borders relevant to this thesis. In Chapter 2 we give necessary definitions, and we present a more in-depth literature review on results on borders relevant to this thesis. In Chapter 3 we complete the characterization due to Harju and Nowotka of binary words with the maximum number of unbordered conjugates. We also show that for every number, up to this maximum, there exists a binary word with that number of unbordered conjugates. In Chapter 4 we give results on pairs of words that almost commute and anti-commute. Two words x and y almost commute if xy and yx differ in exactly two places, and they anti-commute if xy and yx differ in all places. We characterize and count the pairs of words that almost commute or anti-commute. We also characterize and count variations of almost-commuting words. Finally, we conclude with some asymptotic results related to the number of almost-commuting pairs of words. In Chapter 5 we count the number of length-n bordered words with a unique border. We also show that the probability that a length-n word has a unique border tends to a constant. In Chapter 6 we present results on factorizations of words related to borders, called block palindromes. 
A block palindrome is a factorization of a word into blocks that turns into a palindrome if each identical block is replaced by a distinct character. Each block is a border of a central block. We call the number of blocks in a block palindrome the width of the block palindrome. The largest block palindrome of a word is the block palindrome of the word with the maximum width. We count all length-n words that have a width-t largest block palindrome. We also show that the expected width of a largest block palindrome tends to a constant. Finally, we conclude with some results on another extremal variation of block palindromes, the smallest block palindrome. In Chapter 7 we present the main results of the thesis. Roughly speaking, a word is said to be closed if it contains a non-empty proper border that occurs exactly twice in the word. A word is said to be privileged if it is of length ≤ 1 or if it contains a non-empty proper privileged border that occurs exactly twice in the word. We give new and improved bounds on the number of length-n closed and privileged words over a k-letter alphabet. In Chapter 8 we work with a generalization of bordered words to pairs of words. The main result of this chapter is a characterization and enumeration result for this generalization of bordered words to multiple dimensions. In Chapter 9 we conclude by summarizing the results of this thesis and presenting avenues for future research.
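
The definitions in this abstract (border, almost commuting, anti-commuting, closed) can be made concrete with a small sketch. This is only a direct, unoptimized transcription of the definitions as worded above and is not taken from the thesis; the function names are hypothetical, and `is_closed` follows the "roughly speaking" wording, counting overlapping occurrences.

```python
def borders(w: str) -> list[str]:
    """All borders of w: non-empty proper prefixes that are also suffixes."""
    return [w[:k] for k in range(1, len(w)) if w[:k] == w[-k:]]


def is_bordered(w: str) -> bool:
    """A word is bordered if it has at least one border."""
    return bool(borders(w))


def mismatches(a: str, b: str) -> int:
    """Number of positions where two equal-length words differ."""
    assert len(a) == len(b)
    return sum(1 for p, q in zip(a, b) if p != q)


def almost_commute(x: str, y: str) -> bool:
    """xy and yx differ in exactly two places."""
    return mismatches(x + y, y + x) == 2


def anti_commute(x: str, y: str) -> bool:
    """xy and yx differ in all places."""
    return mismatches(x + y, y + x) == len(x) + len(y)


def occurrences(u: str, w: str) -> int:
    """Number of (possibly overlapping) occurrences of u in w."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w.startswith(u, i))


def is_closed(w: str) -> bool:
    """Assumption: closed means some border occurs exactly twice in w."""
    return any(occurrences(u, w) == 2 for u in borders(w))
```

For instance, `borders("abaab")` returns `["ab"]`, so that word is bordered with a unique border, while `"aab"` has no border at all.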

    29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan


    Longest Unbordered Factor in Quasilinear Time

    A border u of a word w is a proper factor of w occurring both as a prefix and as a suffix. The maximal unbordered factor of w is the longest factor of w which does not have a border. We present an algorithm that computes the Longest Unbordered Factor Array of w for general alphabets in O(n log n) time with high probability (or in O(n log n log^2 log n) time deterministically), where n is the length of w. This array specifies the length of the maximal unbordered factor starting at each position of w. This is a major improvement on the running time of the currently best worst-case algorithm, which works in O(n^{1.5}) time for integer alphabets [Gawrychowski et al., 2015].
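
To illustrate what the Longest Unbordered Factor Array contains, here is a brute-force sketch. It is not the quasilinear algorithm of the paper: it simply tries every factor and runs in cubic time. The function names are hypothetical, and borders are assumed to be non-empty.

```python
def is_unbordered(w: str) -> bool:
    """True if no non-empty proper prefix of w is also a suffix of w."""
    return all(w[:k] != w[-k:] for k in range(1, len(w)))


def longest_unbordered_factor_array(w: str) -> list[int]:
    """luf[i] = length of the longest unbordered factor starting at position i.

    Naive reference implementation, useful only for checking small inputs.
    """
    n = len(w)
    luf = []
    for i in range(n):
        best = 1  # a single letter is always unbordered
        # Try the longest candidate first and stop at the first hit.
        for j in range(n, i, -1):
            if is_unbordered(w[i:j]):
                best = j - i
                break
        luf.append(best)
    return luf
```

On `"aba"` the whole word has border `"a"`, so the longest unbordered factor starting at position 0 is `"ab"`, and the array is `[2, 2, 1]`.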

    Counting, Adding, and Regular Languages

    In this thesis we consider two mostly disjoint topics in formal language theory that both involve the study and use of regular languages. The first topic lies in the intersection of automata theory and additive number theory. We introduce a method of producing results in additive number theory, relying on theorem-proving software and an approximation technique. As an example of the method, we prove that every natural number greater than 25 can be written as the sum of at most 3 natural numbers whose canonical base-2 representations have an equal number of 0's and 1's. We prove analogous results about similarly defined sets using the automata theory approach, but also give proofs using more "traditional" approaches. The second topic is the study of languages defined by criteria involving the number of occurrences of a particular pair of words within other words. That is, we consider languages of words z defined with respect to words x, y, where x occurs as a subword of z the same number of times as y does (resp., fewer times; fewer times or the same number of times). We give a necessary and sufficient condition on when such languages are regular, and show how to check this condition efficiently. We conclude by briefly considering ideas tying the two topics together.
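
The additive statement in this abstract can be spot-checked on small inputs by exhaustive search. The sketch below is only such a sanity check and has nothing to do with the automata-based proof method of the thesis; the function names are hypothetical, and summands are assumed to be positive integers.

```python
from itertools import combinations_with_replacement


def is_balanced(m: int) -> bool:
    """True if the canonical base-2 representation of m > 0 has as many 0's as 1's."""
    bits = bin(m)[2:]  # strip the '0b' prefix; no leading zeros remain
    return bits.count("0") == bits.count("1")


def representable(n: int, max_terms: int = 3) -> bool:
    """True if n is a sum of at most max_terms balanced positive integers."""
    pool = [m for m in range(1, n + 1) if is_balanced(m)]
    return any(
        sum(combo) == n
        for k in range(1, max_terms + 1)
        for combo in combinations_with_replacement(pool, k)
    )
```

For example, 26 = 12 + 12 + 2 (binary 1100, 1100, 10), whereas no such decomposition of 25 exists, which matches the bound stated in the abstract.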

    Properties of Two-Dimensional Words

    Combinatorics on words in one dimension is a well-studied subfield of theoretical computer science with its origins in the early 20th century. However, the closely related study of two-dimensional words is not as popular, even though many results seem naturally extendable from the one-dimensional case. This thesis investigates various properties of these two-dimensional words. In the early 1960s, Roger Lyndon and Marcel-Paul Schützenberger developed two famous results on conditions where nontrivial prefixes and suffixes of a one-dimensional word are identical and on conditions where two one-dimensional words commute. Here, the theorems of Lyndon and Schützenberger are extended in the one-dimensional case to include a number of additional equivalent conditions. One such condition is shown to be equivalent to the defect theorem from formal languages and coding theory. The same theorems of Lyndon and Schützenberger are then generalized to the two-dimensional case. The study of two-dimensional words continues by considering primitivity and periodicity in two dimensions, where a method is developed to enumerate two-dimensional primitive words. An efficient computer algorithm is presented to assist with checking the property of primitivity in two dimensions. Finally, borders in both one and two dimensions are considered, with some results being proved and others being offered as suggestions for future work. Another efficient algorithm is presented to assist with checking whether a two-dimensional word is bordered. The thesis concludes with a selection of open problems and an appendix containing extensive data related to one such open problem.
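
As a one-dimensional illustration of the commutation result mentioned above, the classical Lyndon-Schützenberger theorem says that two non-empty words x and y satisfy xy = yx exactly when both are powers of a common word. The sketch below checks this equivalence exhaustively on short binary words. It covers only the one-dimensional case, does not reproduce the thesis's two-dimensional algorithms, and uses hypothetical function names.

```python
from itertools import product


def primitive_root(w: str) -> str:
    """Shortest word r such that w is a power of r (w must be non-empty)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    raise ValueError("primitive_root is undefined for the empty word")


def commute(x: str, y: str) -> bool:
    """True if xy = yx."""
    return x + y == y + x


def check_commutation_theorem(max_len: int, alphabet: str = "ab") -> bool:
    """Verify, for all non-empty words up to max_len, that
    xy = yx holds exactly when x and y share a primitive root."""
    words = [
        "".join(t)
        for n in range(1, max_len + 1)
        for t in product(alphabet, repeat=n)
    ]
    return all(
        commute(x, y) == (primitive_root(x) == primitive_root(y))
        for x in words
        for y in words
    )
```

For instance, `"abab"` and `"ab"` commute and share the primitive root `"ab"`, while `"ab"` and `"ba"` do not commute.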

    Periods and Borders of Random Words

    We investigate the behavior of the periods and border lengths of random words over a fixed alphabet. We show that the asymptotic probability that a random word has a given maximal border length k is a constant, depending only on k and the alphabet size l. We give a recurrence that allows us to determine these constants with any required precision. This also allows us to evaluate the expected period of a random word. For the binary case, the expected period is asymptotically about n − 1.641. We also give explicit formulas for the probability that a random word is unbordered or has maximum border length one.
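
The quantities studied in this abstract can be computed exactly for small n by enumerating all words. The sketch below does so by brute force, using the standard fact that the smallest period of a word equals its length minus the length of its longest border. It does not implement the recurrence from the paper, and the function names are hypothetical.

```python
from itertools import product


def longest_border_length(w: str) -> int:
    """Length of the longest non-empty proper border of w, or 0 if unbordered."""
    for k in range(len(w) - 1, 0, -1):
        if w[:k] == w[-k:]:
            return k
    return 0


def border_statistics(n: int, alphabet: str = "01") -> tuple[int, float]:
    """Return (number of unbordered words, expected smallest period)
    over all length-n words on the given alphabet, by exhaustive enumeration."""
    counts: dict[int, int] = {}
    for letters in product(alphabet, repeat=n):
        k = longest_border_length("".join(letters))
        counts[k] = counts.get(k, 0) + 1
    total = len(alphabet) ** n
    mean_border = sum(k * c for k, c in counts.items()) / total
    # smallest period = n - longest border length
    return counts.get(0, 0), n - mean_border
```

Running `border_statistics(n)` for increasing n gives a way to compare the finite-length averages against the asymptotic value quoted in the abstract, though the enumeration cost grows as 2^n in the binary case.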