
    Computing maximal-exponent factors in an overlap-free word

    The exponent of a string is the quotient of its length by its smallest period. The exponent and the period of a string can be computed in time proportional to the string length. We design an algorithm to compute the maximal exponent of all factors of an overlap-free string. Our algorithm runs in linear time on a fixed-size alphabet, while a naive solution to the problem would run in cubic time. The solution for non-overlap-free strings derives from algorithms to compute all maximal repetitions, also called runs, occurring in the string. We also show there is a linear number of occurrences of maximal-exponent factors in an overlap-free string. Their maximal number lies between 0.66n and 2.25n in a string of length n. The algorithm can additionally locate all of them in linear time.
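
    As a rough illustration of the definitions in this abstract, the sketch below computes the smallest period of a string from its border table (the classical linear-time approach) and then takes the maximum exponent over all factors by brute force. The function names are illustrative assumptions, and the brute-force search is the naive cubic baseline the abstract mentions, not the linear-time algorithm of the paper.

```python
from fractions import Fraction


def smallest_period(s):
    """Smallest period of a non-empty string, via its border table (linear time)."""
    n = len(s)
    border = [0] * (n + 1)
    border[0] = -1
    k = -1
    for i in range(n):
        # Fall back to shorter borders until s[i] extends one of them.
        while k >= 0 and s[k] != s[i]:
            k = border[k]
        k += 1
        border[i + 1] = k
    # The longest border of s has length border[n]; the period is its complement.
    return n - border[n]


def exponent(s):
    """Exponent of a non-empty string: length divided by smallest period."""
    return Fraction(len(s), smallest_period(s))


def max_exponent_naive(s):
    """Maximal exponent over all non-empty factors of s (naive, cubic time)."""
    n = len(s)
    return max(exponent(s[i:j]) for i in range(n) for j in range(i + 1, n + 1))
```

    For instance, exponent("abaab") is 5/3, because the smallest period of "abaab" is 3.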

    Lempel-Ziv Parsing in External Memory

    For decades, computing the LZ factorization (or LZ77 parsing) of a string has been a requisite and computationally intensive step in many diverse applications, including text indexing and data compression. Many algorithms for LZ77 parsing have been discovered over the years; however, despite the increasing need to apply LZ77 to massive data sets, no algorithm to date scales to inputs that exceed the size of internal memory. In this paper we describe the first algorithm for computing the LZ77 parsing in external memory. Our algorithm is fast in practice and will allow the next generation of text indexes to be realised for massive strings and string collections. Comment: 10 pages
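
    To make the object being computed concrete, here is a minimal in-memory sketch of an LZ77 factorization, under one common convention (assumed here): each factor is either a letter with no earlier occurrence, or the longest prefix of the remaining text that also starts at an earlier position, with self-overlap allowed. The names lz77_factorize and lz77_decode are hypothetical, and this quadratic-time scan is unrelated to the external-memory algorithm of the paper.

```python
def lz77_factorize(s):
    """Naive LZ77 parsing: a list of literal letters and (source, length) pairs."""
    factors = []
    n = len(s)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # every earlier starting position
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            factors.append(s[i])  # fresh letter
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Rebuild the text from the factor list produced above."""
    out = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            src, length = f
            for k in range(length):  # letter-by-letter copy handles self-overlap
                out.append(out[src + k])
    return "".join(out)
```

    Under this convention, "abababb" parses into the letters "a" and "b", then a copy of length 4 from position 0, then a copy of length 1 from position 1.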

    Counting Maximal-Exponent Factors in Words

    This article shows tight upper and lower bounds on the number of occurrences of maximal-exponent factors in a word.

    Efficient Computation of Maximal Anti-Exponent in Palindrome-Free Strings

    A palindrome is a string $x = a_1 \cdots a_n$ that is equal to its reversal $\tilde{x} = a_n \cdots a_1$. We consider gapped palindromes, which are strings of the form $uv\tilde{u}$, where $u, v$ are strings, $|v| \ge 2$, and $\tilde{u}$ is the reversal of $u$. Replicating the standard notion of string exponent, we define the anti-exponent of a gapped palindrome $uv\tilde{u}$ as the quotient of $|uv\tilde{u}|$ by $|uv|$. To compute the maximal anti-exponent of factors in a palindrome-free string efficiently, we apply techniques based on the suffix automaton and the reversed Lempel-Ziv factorisation. Our algorithm runs in $O(n)$ time on a fixed-size alphabet or $O(n \log \sigma)$ on a large alphabet, which dramatically outperforms the naive cubic-time solution.
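
    The anti-exponent can be illustrated with a brute-force sketch. For a factor w, it looks for the longest arm u such that w starts with u, ends with the reversal of u, and leaves a gap v of length at least 2, then returns |w| divided by |uv|. Allowing an empty arm (giving the value 1) and the function names are assumptions made for this illustration; the cubic search over all factors is the naive baseline, not the suffix-automaton method of the paper.

```python
from fractions import Fraction


def anti_exponent(w):
    """Largest anti-exponent of w read as u + v + reversed(u) with len(v) >= 2.

    Assumes len(w) >= 2; an empty arm u is allowed and yields 1.
    """
    n = len(w)
    k = 0  # current arm length
    # Extend the arm while the mirrored letters agree and the gap stays >= 2.
    while 2 * (k + 1) + 2 <= n and w[k] == w[n - 1 - k]:
        k += 1
    return Fraction(n, n - k)  # |u v reversed(u)| / |u v|


def max_anti_exponent_naive(s):
    """Maximal anti-exponent over all factors of length >= 2 (naive, cubic time)."""
    n = len(s)
    best = Fraction(1)
    for i in range(n):
        for j in range(i + 2, n + 1):
            best = max(best, anti_exponent(s[i:j]))
    return best
```

    For example, "abcdba" has arm "ab" and gap "cd", so its anti-exponent is 6/4 = 3/2.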

    Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets

    The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes of y and has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.
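
    For readers unfamiliar with minimal absent words, the sketch below enumerates MAW(y) by brute force, using the usual definition (assumed here): a word is a minimal absent word of y if it does not occur in y but all of its proper factors do. The function name and the explicit alphabet parameter are illustrative assumptions. This version stores every factor of y, so it uses quadratic space and is only a reference for small inputs, not the DAWG-based algorithm of the paper.

```python
def minimal_absent_words(y, alphabet):
    """Brute-force MAW(y) over the given alphabet, returned as a sorted list."""
    n = len(y)
    # All factors of y, including the empty word.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    maws = set()
    # Length-1 case: letters of the alphabet that never occur in y.
    for a in alphabet:
        if a not in factors:
            maws.add(a)
    # Longer case: w = a + u + b with a+u and u+b present but w absent.
    for x in factors:
        if not x:
            continue
        u = x[1:]
        for b in alphabet:
            if (u + b) in factors and (x + b) not in factors:
                maws.add(x + b)
    return sorted(maws)
```

    On y = "abaab" over the alphabet {a, b}, this yields "aaa", "aaba", "bab" and "bb".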

    Repetitive subwords

    The central notion of this thesis is repetitions in words. We study problems related to contiguous repetitions. More specifically we will consider repeating scattered subwords of non-primitive words, i.e. words which are complete repetitions of other words. We will present inequalities concerning these occurrences as well as giving a partial solution to an open problem posed by Salomaa et al. We will characterize languages, which are closed under the operation of duplication, that is repeating any factor of a word. We also give new bounds on the number of occurrences of certain types of repetitions of words. We give a solution to an open problem posed by Calbrix and Nivat concerning regular languages consisting of non-primitive words. We also present some results regarding the duplication closure of languages, among which a new proof to a problem of Bovet and Varricchio.
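
    The notion of a non-primitive word (a word that is a complete repetition of a shorter word) can be checked with a standard trick: a non-empty word w is primitive exactly when its first occurrence inside w + w after position 0 is at position len(w). The helper names below are assumptions for illustration and do not come from the thesis.

```python
def primitive_root(w):
    """Shortest word r such that the non-empty word w equals r repeated k times."""
    # First shift p > 0 at which w matches itself cyclically inside w + w.
    p = (w + w).find(w, 1)
    return w[:p]


def is_primitive(w):
    """True if the non-empty word w is not a proper power of a shorter word."""
    return len(primitive_root(w)) == len(w)
```

    For example, "abab" is non-primitive with root "ab", while "aba" is primitive.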