
    The Ehrenfeucht–Silberger problem

    We consider repetitions in words and solve a longstanding open problem about the relation between the period of a word and the length of its longest unbordered factor (where factor means uninterrupted subword). A word u is called bordered if there exists a non-empty proper prefix that is also a suffix of u; otherwise it is called unbordered. In 1979, Ehrenfeucht and Silberger raised the following problem: what is the maximum length of a word w, with respect to the length τ of its longest unbordered factor, such that τ is shorter than the period π of w? We show that if w is of length (7/3)τ or more, then τ = π, which gives the optimal asymptotic bound.
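    To make τ and π concrete, here is a small brute-force sketch in Python (not taken from the paper; all function names are illustrative). π is read off the KMP failure function as |w| minus the longest proper border, τ is found by trying every start position, and the last function checks the stated bound on a single non-empty word. It runs in quadratic time per word and is only meant for short words.

```python
from itertools import product


def border_lengths(w: str) -> list[int]:
    """KMP failure function: f[i] is the length of the longest proper border of w[:i + 1]."""
    f = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = f[k - 1]
        if w[i] == w[k]:
            k += 1
        f[i] = k
    return f


def smallest_period(w: str) -> int:
    """pi(w): |w| minus the length of the longest proper border of w (w non-empty)."""
    return len(w) - border_lengths(w)[-1]


def longest_unbordered_length(w: str) -> int:
    """tau(w) by brute force (w non-empty). For each start i, the prefixes of w[i:]
    whose failure value is 0 are exactly the unbordered factors starting at i."""
    return max(
        j + 1
        for i in range(len(w))
        for j, b in enumerate(border_lengths(w[i:]))
        if b == 0
    )


def satisfies_bound(w: str) -> bool:
    """Check tau <= pi, and tau == pi whenever |w| >= (7/3) * tau.
    The threshold is tested as 3|w| >= 7 tau to stay in integer arithmetic."""
    tau, pi = longest_unbordered_length(w), smallest_period(w)
    if tau > pi:
        return False
    return 3 * len(w) < 7 * tau or tau == pi


if __name__ == "__main__":
    # Exhaustive check over all binary words of length 1..10 (2046 words).
    words = ("".join(t) for n in range(1, 11) for t in product("ab", repeat=n))
    print(all(satisfies_bound(w) for w in words))
```

    The inequality τ ≤ π always holds, because any factor longer than π inherits period π and therefore has a non-empty border. The exhaustive run is only evidence on small cases, not a proof of the (7/3)τ threshold.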

    On Maximal Unbordered Factors

    Given a string S of length n, its maximal unbordered factor is the longest factor which does not have a border. In this work we investigate the relationship between n and the length of the maximal unbordered factor of S. We prove that for alphabets of size σ ≥ 5, the expected length of the maximal unbordered factor of a string of length n is at least 0.99n (for sufficiently large values of n). As an application of this result, we propose a new algorithm for computing the maximal unbordered factor of a string. Comment: Accepted to the 26th Annual Symposium on Combinatorial Pattern Matching (CPM 2015).
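    The claim about random strings can be probed empirically. The sketch below is an illustration, not the authors' algorithm: it finds the maximal unbordered factor exactly by scanning candidate lengths from n downward and stopping at the first unbordered window. When the answer is close to n, as the result predicts for σ ≥ 5, only a few lengths are examined. The Monte Carlo parameters (n, trials, seed) are arbitrary choices.

```python
import random


def has_border(u: str) -> bool:
    """True if some non-empty proper prefix of u is also a suffix of u."""
    return any(u[:k] == u[-k:] for k in range(1, len(u)))


def max_unbordered_length(s: str) -> int:
    """Exact length of the maximal unbordered factor of s.

    Candidate lengths are tried from len(s) downward and the scan stops at the first
    unbordered window; a single letter is always unbordered, so the loop returns for
    any non-empty s (and 0 for the empty string).
    """
    n = len(s)
    for length in range(n, 0, -1):
        for i in range(n - length + 1):
            if not has_border(s[i:i + length]):
                return length
    return 0


def estimate_ratio(n: int, sigma: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[maximal unbordered length] / n for uniformly random
    strings of length n over a sigma-letter alphabet (sigma <= 26 here)."""
    rng = random.Random(seed)
    alphabet = [chr(ord("a") + c) for c in range(sigma)]
    total = 0
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(n))
        total += max_unbordered_length(s)
    return total / (trials * n)


if __name__ == "__main__":
    # Arbitrary illustrative parameters; the estimate is noisy for few trials.
    print(estimate_ratio(n=300, sigma=5, trials=20))
```

    The downward scan is exact for every input, but its worst case (for example, a string of one repeated letter) is slow; it is fast only when the answer is near n.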

    Computing the Longest Unbordered Substring

    A substring of a string is unbordered if its only border is the empty string. The study of unbordered substrings goes back to the paper of Ehrenfeucht and Silberger [7]. The main focus of [7] and of subsequent papers was to elucidate the relationship between the longest unbordered substring and the minimal period of strings. In this paper, we consider the algorithmic problem of computing the longest unbordered substring of a string. The problem was introduced recently in [12], where the authors showed that the average-case running time of the simple, border-array based algorithm can be bounded by O(n^2/σ^4), where σ is the size of the alphabet. (The worst-case running time remained O(n^2).) Here we propose two algorithms, both presenting substantial theoretical improvements to the result of [12]. The first algorithm has O(n log n) average-case running time and O(n^2) worst-case running time, and the second algorithm has O(n^{1.5}) worst-case running time.
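    For reference, a border-array approach of the kind mentioned above can be sketched as follows. This is an assumption about its general shape; the exact algorithm analyzed in [12] may differ in details. For each start i the sketch builds the KMP failure function of w[i:]; the prefixes with failure value 0 are exactly the unbordered factors starting at i. Each start costs O(n - i), so the worst case is O(n^2). The early exit once n - i no longer exceeds the best length found is this sketch's own pruning.

```python
def border_array(s: str) -> list[int]:
    """b[j] = length of the longest proper border of s[:j + 1] (KMP failure function)."""
    b = [0] * len(s)
    k = 0
    for j in range(1, len(s)):
        while k > 0 and s[j] != s[k]:
            k = b[k - 1]
        if s[j] == s[k]:
            k += 1
        b[j] = k
    return b


def longest_unbordered_substring(w: str) -> tuple[int, int]:
    """Return (start, length) of a longest unbordered substring of w; O(n^2) worst case."""
    n = len(w)
    best_start, best_len = 0, 0
    for i in range(n):
        if n - i <= best_len:
            # No factor starting at i or later can beat the current best.
            break
        b = border_array(w[i:])
        # Longest prefix of w[i:] with an empty border; b[0] == 0, so this always succeeds.
        for j in range(n - i - 1, -1, -1):
            if b[j] == 0:
                if j + 1 > best_len:
                    best_start, best_len = i, j + 1
                break
    return best_start, best_len


if __name__ == "__main__":
    print(longest_unbordered_substring("abaab"))  # (1, 3): the factor "baa"
```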

    Longest Unbordered Factor in Quasilinear Time

    A border u of a word w is a proper factor of w occurring both as a prefix and as a suffix. The maximal unbordered factor of w is the longest factor of w which does not have a border. Here we present an algorithm that computes the Longest Unbordered Factor Array of w for general alphabets in O(n log n) time with high probability (or in O(n log n log^2 log n) time deterministically), where n is the length of w. This array specifies the length of the maximal unbordered factor starting at each position of w. This is a major improvement on the running time of the currently best worst-case algorithm, which works in O(n^{1.5}) time for integer alphabets [Gawrychowski et al., 2015].
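    When testing a fast implementation of the Longest Unbordered Factor Array, a slow oracle written directly from the definition is handy. The following Python sketch (function names hypothetical) does exactly that and is deliberately naive, with up to roughly n^4 character comparisons in the worst case, so it is only suitable for short inputs.

```python
def is_bordered(u: str) -> bool:
    """True if some non-empty proper prefix of u equals the suffix of the same length."""
    return any(u[:k] == u[-k:] for k in range(1, len(u)))


def luf_array_naive(w: str) -> list[int]:
    """luf[i] = length of the longest unbordered factor of w starting at position i.

    Straight from the definition, so very slow; intended only as a test oracle.
    The single letter w[i] is always unbordered, so each maximum is taken over a
    non-empty set.
    """
    n = len(w)
    return [
        max(j - i for j in range(i + 1, n + 1) if not is_bordered(w[i:j]))
        for i in range(n)
    ]


if __name__ == "__main__":
    print(luf_array_naive("abab"))  # [2, 2, 2, 1]
```

    The maximum over the returned array is the length of the maximal unbordered factor of w, so the same oracle also checks algorithms that compute only that single value.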

    Periods and Unbordered Factors of Words: The Ehrenfeucht–Silberger Problem

    The Ehrenfeucht–Silberger problem asks how long a word can be, relative to the length of its longest unbordered factors, before it must have an unbordered factor whose length equals its shortest period. A solution to the problem is a bound, depending on the length of the longest unbordered factors of a word, such that every word of at least that length necessarily has an unbordered factor of the length of its shortest period. This thesis presents the best known solution to the problem and examines other closely related results. The main theorem proves the best known bound with respect to the length of the longest unbordered factors. The proof is taken from the article The Ehrenfeucht–Silberger problem by Štěpán Holub and Dirk Nowotka (Journal of Combinatorial Theory, Series A) and from its preliminary version in the proceedings of ICALP 2009. The solution is shown to be optimal up to a constant term by comparing it with the best known example: an infinite set of words in which the longest unbordered factors of every word are shorter than its shortest period, and whose length relative to the length of those factors is the largest known. As an introduction, basic results on periods and unbordered factors of words are presented, along with some other conditions under which a word has an unbordered factor of the length of its shortest period. The thesis also considers the problem that asks for the corresponding bound with respect to the shortest period. As a new result, the best previously known bound is lowered by one, which makes the resulting bound optimal. In addition, the thesis proves what form all words take that have no unbordered factor of the length of their shortest period and are as long as possible relative to that period. The thesis also examines the critical factorization, which relates the shortest period of a word to its local periods, and gives a proof of the critical factorization theorem. Furthermore, it proves a lemma needed in the proof of the main theorem, concerning the criticality of a factorization of a conjugate of a word.
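    The critical factorization theorem can be checked by brute force on small words. In this Python sketch (names illustrative), the local period at a cut i is the smallest p such that every position j in the window [i - p, i) with j + p inside w satisfies w[j] = w[j + p]; this is the usual notion of a repetition centered at the cut with overhang allowed on either side. The theorem states that for every word of length at least 2, some cut has local period equal to the global period; such cuts are called critical.

```python
def period(w: str) -> int:
    """Smallest p >= 1 such that w[j] == w[j + p] for every valid j (w non-empty)."""
    n = len(w)
    return next(p for p in range(1, n + 1) if all(w[j] == w[j + p] for j in range(n - p)))


def local_period(w: str, i: int) -> int:
    """Local period at the cut w[:i] | w[i:]: the smallest p >= 1 such that
    w[j] == w[j + p] for every j in [i - p, i) with 0 <= j and j + p < len(w)."""
    n = len(w)
    p = 1
    # Terminates: p = period(w) always satisfies the condition.
    while not all(w[j] == w[j + p] for j in range(max(0, i - p), min(i, n - p))):
        p += 1
    return p


def critical_points(w: str) -> list[int]:
    """Cuts 1 <= i < len(w) whose local period equals the (global) period of w."""
    pi = period(w)
    return [i for i in range(1, len(w)) if local_period(w, i) == pi]


if __name__ == "__main__":
    print(critical_points("abaab"))  # [2, 4]
```

    Local periods never exceed the global period, since the global period itself satisfies the local condition at every cut; the theorem says the maximum is always attained.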

    On the Properties and Structure of Bordered Words and Generalizations

    Combinatorics on words is a field of mathematics and theoretical computer science that is concerned with sequences of symbols called words, or strings. One class of words that is ubiquitous in combinatorics on words, and in theoretical computer science more broadly, is the class of bordered words. The word w has a border u if u is a non-empty proper prefix and suffix of w. The word w is said to be bordered if it has a border; otherwise, w is said to be unbordered. This thesis is primarily concerned with variations and generalizations of bordered and unbordered words. In Chapter 1 we introduce the field of combinatorics on words and give a brief overview of the literature on borders relevant to this thesis. In Chapter 2 we give the necessary definitions and present a more in-depth review of results on borders relevant to this thesis. In Chapter 3 we complete the characterization, due to Harju and Nowotka, of binary words with the maximum number of unbordered conjugates. We also show that for every number up to this maximum, there exists a binary word with that number of unbordered conjugates. In Chapter 4 we give results on pairs of words that almost commute and anti-commute. Two words x and y almost commute if xy and yx differ in exactly two places, and they anti-commute if xy and yx differ in all places. We characterize and count the pairs of words that almost commute and the pairs that anti-commute. We also characterize and count variations of almost-commuting words. Finally, we conclude with some asymptotic results related to the number of almost-commuting pairs of words. In Chapter 5 we count the length-n bordered words with a unique border. We also show that the probability that a length-n word has a unique border tends to a constant. In Chapter 6 we present results on factorizations of words related to borders, called block palindromes.
A block palindrome is a factorization of a word into blocks that turns into a palindrome if each identical block is replaced by a distinct character. Each block is a border of a central block. We call the number of blocks in a block palindrome the width of the block palindrome. The largest block palindrome of a word is the block palindrome of the word with the maximum width. We count all length-n words that have a width-t largest block palindrome. We also show that the expected width of a largest block palindrome tends to a constant. Finally, we conclude with some results on another extremal variation of block palindromes, the smallest block palindrome. In Chapter 7 we present the main results of the thesis. Roughly speaking, a word is said to be closed if it contains a non-empty proper border that occurs exactly twice in the word. A word is said to be privileged if it is of length ≤ 1 or if it contains a non-empty proper privileged border that occurs exactly twice in the word. We give new and improved bounds on the number of length-n closed and privileged words over a k-letter alphabet. In Chapter 8 we work with a generalization of bordered words to pairs of words. The main result of this chapter is a characterization and enumeration result for this generalization of bordered words to multiple dimensions. In Chapter 9 we conclude by summarizing the results of this thesis and presenting avenues for future research.
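    The definitions of closed and privileged words quoted above translate directly into code. This Python sketch (function names hypothetical) follows the abstract's informal wording literally and counts overlapping occurrences; in particular it does not treat words of length at most 1 as closed, which the thesis's precise definition may handle differently. It is intended only for small words.

```python
from functools import lru_cache


def occurrences(u: str, w: str) -> int:
    """Number of (possibly overlapping) occurrences of u in w."""
    return sum(1 for i in range(len(w) - len(u) + 1) if w.startswith(u, i))


def borders(w: str) -> list[str]:
    """Non-empty proper borders of w, shortest first."""
    return [w[:k] for k in range(1, len(w)) if w[:k] == w[-k:]]


def is_closed(w: str) -> bool:
    """Some non-empty proper border of w occurs exactly twice in w."""
    return any(occurrences(b, w) == 2 for b in borders(w))


@lru_cache(maxsize=None)
def is_privileged(w: str) -> bool:
    """Length <= 1, or some non-empty proper privileged border occurs exactly twice."""
    if len(w) <= 1:
        return True
    return any(occurrences(b, w) == 2 and is_privileged(b) for b in borders(w))


if __name__ == "__main__":
    print(is_closed("abab"), is_privileged("abab"))  # True False
```

    For example, abab is closed, since its border ab occurs exactly twice, but it is not privileged, because ab has no border at all and so is not privileged itself.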

    Automatic Sequences and Decidable Properties: Implementation and Applications

    In 1912, Axel Thue sparked the study of combinatorics on words when he showed that the Thue–Morse sequence contains no overlaps, that is, factors of the form ayaya, where a is a letter and y is a word. Since then, many interesting properties of sequences have been discovered and studied. In this thesis, we consider a class of infinite sequences generated by automata, called the k-automatic sequences. In particular, we present a logical theory in which many properties of k-automatic sequences can be expressed as predicates, and we show that such predicates are decidable. Our main contribution is the implementation of a theorem prover capable of practically characterizing many commonly sought-after properties of k-automatic sequences. We showcase a panoply of results achieved using our method. We give new explicit descriptions of the recurrence and appearance functions of a list of well-known k-automatic sequences. We define a related function, called the condensation function, and give explicit descriptions for it as well. We reaffirm known results on the critical exponent of some sequences and determine it for others where it was previously unknown. On the more theoretical side, we show that the subword complexity p(n) of k-automatic sequences is k-synchronized, i.e., the language of pairs (n, p(n)) (expressed in base k) is accepted by an automaton. Furthermore, we prove that the Lyndon factorization of k-automatic sequences is also k-automatic and explicitly compute the factorization for several sequences. Finally, we show that while the number of unbordered factors of length n is not k-synchronized, it is k-regular.
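    Two of the objects mentioned above are easy to experiment with: the Thue–Morse word, whose k-th symbol is the parity of the number of 1 bits of k, and its unbordered factors. This Python sketch (function names hypothetical) checks overlap-freeness on a finite prefix and counts distinct unbordered factors of a given length in that prefix. It is a brute-force illustration, not the automata-based decision procedure described in the thesis; a count taken from a prefix is exact only if the prefix contains every factor of that length.

```python
def thue_morse(n: int) -> str:
    """First n symbols of the Thue-Morse word: t(k) = parity of the number of 1 bits of k."""
    return "".join(str(bin(k).count("1") % 2) for k in range(n))


def has_overlap(w: str) -> bool:
    """True if w contains a factor a y a y a (a a letter, y a word), i.e. a factor
    of length 2p + 1 that has period p."""
    n = len(w)
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p):
            if all(w[i + j] == w[i + j + p] for j in range(p + 1)):
                return True
    return False


def is_unbordered(u: str) -> bool:
    """No non-empty proper prefix of u is also a suffix of u."""
    return not any(u[:k] == u[-k:] for k in range(1, len(u)))


def unbordered_factor_count(prefix: str, n: int) -> int:
    """Number of distinct unbordered length-n factors occurring in the given prefix."""
    factors = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return sum(1 for u in factors if is_unbordered(u))


if __name__ == "__main__":
    tm = thue_morse(256)
    print(has_overlap(tm))
    print([unbordered_factor_count(tm, n) for n in range(1, 9)])
```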