33 research outputs found

    Lyndon Factorization of Grammar Compressed Texts Revisited

    We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N) n log log N) time and O(n log N + S(n, N)) space, where P(n, N), S(n, N), and Q(n, N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.

    Optimal construction of compressed indexes for highly repetitive texts

    We propose algorithms that, given the input string of length n over integer alphabet of size σ, construct the Burrows–Wheeler transform (BWT), the permuted longest-common-prefix (PLCP) array, and the LZ77 parsing in O(n / log_σ n + r polylog n) time and working space, where r is the number of runs in the BWT of the input. These are the essential components of many compressed indexes such as compressed suffix tree, FM-index, and grammar and LZ77-based indexes, but also find numerous applications in sequence analysis and data compression. The value of r is a common measure of repetitiveness that is significantly smaller than n if the string is highly repetitive. Since just accessing every symbol of the string requires Ω(n / log_σ n) time, the presented algorithms are time and space optimal for inputs satisfying the assumption n/r ∈ Ω(polylog n) on the repetitiveness. For such inputs our result improves upon the currently fastest general algorithms of Belazzougui (STOC 2014) and Munro et al. (SODA 2017) which run in O(n) time and use O(n / log_σ n) working space. We also show how to use our techniques to obtain optimal solutions on highly repetitive data for other fundamental string processing problems such as: Lyndon factorization, construction of run-length compressed suffix arrays, and some classical “textbook” problems such as computing the longest substring occurring at least some fixed number of times. Copyright © 2019 by SIAM.

    Finding an Optimal Alphabet Ordering for Lyndon Factorization Is Hard


    Efficient string algorithmics across alphabet realms

    Stringology is a subfield of computer science dedicated to analyzing and processing sequences of symbols. It plays a crucial role in various applications, including lossless compression, information retrieval, natural language processing, and bioinformatics. Recent algorithms often assume that the strings to be processed are over polynomial integer alphabet, i.e., each symbol is an integer that is at most polynomial in the lengths of the strings. In contrast to that, the earlier days of stringology were shaped by the weaker comparison model, in which strings can only be accessed by mere equality comparisons of symbols, or (if the symbols are totally ordered) order comparisons of symbols. Nowadays, these flavors of the comparison model are respectively referred to as general unordered alphabet and general ordered alphabet. In this dissertation, we dive into the realm of both integer alphabets and general alphabets. We present new algorithms and lower bounds for classic problems, including Lempel-Ziv compression, computing the Lyndon array, and the detection of squares and runs. Our results show that, instead of only assuming the standard model of computation, it is important to also consider both weaker and stronger models. Particularly, we should not discard the older and weaker comparison-based models too quickly, as they are not only powerful theoretical tools, but also lead to fast and elegant practical solutions, even by today's standards.

    Algorithms and Lower Bounds for Ordering Problems on Strings

    This dissertation presents novel algorithms and conditional lower bounds for a collection of string and text-compression-related problems. These results are unified under the theme of ordering constraint satisfaction. Utilizing the connections to ordering constraint satisfaction, we provide hardness results and algorithms for the following: recognizing a type of labeled graph amenable to text-indexing known as Wheeler graphs, minimizing the number of maximal unary substrings occurring in the Burrows-Wheeler Transformation of a text, minimizing the number of factors occurring in the Lyndon factorization of a text, and finding an optimal reference string for relative Lempel-Ziv encoding.

    Grammatical gender and linguistic complexity : Volume II: World-wide comparative studies

    Peer reviewed

    LIPIcs, Volume 244, ESA 2022, Complete Volume


    Bridging Formal and Conceptual Semantics

    The articles in this volume are the outcome of the successful BRIDGE Workshop held in Düsseldorf in 2014. The workshop gathered a number of distinguished researchers from formal semantics and conceptual semantics and aimed to initiate a deeper conversation and collaboration instead of separating the two sides as competing views. The workshop provided a platform to further discuss parallelisms on specific semantic issues on the one hand and on the other hand to confront opposed claims from the two different perspectives. This volume represents a selected number of high-quality papers presented at the workshop featuring various approaches to meaning from linguistics, logic and philosophy of language. This series explores issues of mental representation, linguistic structure and representation, and their interplay. The research presented in this series is grounded in the idea explored in the Collaborative Research Center ‘The structure of representations in language, cognition and science’ (SFB 991) that there is a universal format for the representation of linguistic and cognitive concepts.

    LIPIcs, Volume 274, ESA 2023, Complete Volume
