33 research outputs found
Lyndon Factorization of Grammar Compressed Texts Revisited
We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N) n log log N) time and O(n log N + S(n, N)) space, where P(n, N), S(n, N), and Q(n, N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.
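For orientation, the uncompressed O(N)-time baseline attributed to Duval in the abstract above can be sketched as follows. This is a minimal illustration of the classical algorithm on a plain string, not the SLP-based algorithm of the paper; the function name `lyndon_factorization` is chosen here for illustration only.

```python
def lyndon_factorization(w: str) -> list[str]:
    """Duval-style Lyndon factorization of an uncompressed string.

    Returns Lyndon words w1 >= w2 >= ... >= wk whose concatenation is w.
    Runs in time linear in len(w) and uses constant extra space
    apart from the output list.
    """
    factors = []
    i, n = 0, len(w)
    while i < n:
        # w[i:j] is a power of a Lyndon word, possibly followed by a
        # proper prefix of it; k trails j by one period length.
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            if w[k] < w[j]:
                k = i          # the whole of w[i:j+1] is a single Lyndon word
            else:
                k += 1         # still inside a repetition of the current period
            j += 1
        period = j - k
        # Emit every full copy of the Lyndon word found so far.
        while i <= k:
            factors.append(w[i:i + period])
            i += period
    return factors


if __name__ == "__main__":
    print(lyndon_factorization("banana"))
```

On "banana" this sketch yields the factors b, an, an, a, which are non-increasing in lexicographic order as the factorization requires.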
Optimal construction of compressed indexes for highly repetitive texts
We propose algorithms that, given the input string of length n over integer alphabet of size σ, construct the Burrows–Wheeler transform (BWT), the permuted longest-common-prefix (PLCP) array, and the LZ77 parsing in O(n / log_σ n + r polylog n) time and working space, where r is the number of runs in the BWT of the input. These are the essential components of many compressed indexes such as compressed suffix tree, FM-index, and grammar and LZ77-based indexes, but also find numerous applications in sequence analysis and data compression. The value of r is a common measure of repetitiveness that is significantly smaller than n if the string is highly repetitive. Since just accessing every symbol of the string requires Ω(n / log_σ n) time, the presented algorithms are time and space optimal for inputs satisfying the assumption n/r ∈ Ω(polylog n) on the repetitiveness. For such inputs our result improves upon the currently fastest general algorithms of Belazzougui (STOC 2014) and Munro et al. (SODA 2017) which run in O(n) time and use O(n / log_σ n) working space. We also show how to use our techniques to obtain optimal solutions on highly repetitive data for other fundamental string processing problems such as: Lyndon factorization, construction of run-length compressed suffix arrays, and some classical "textbook" problems such as computing the longest substring occurring at least some fixed number of times. Copyright © 2019 by SIAM. Peer reviewed.
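To make the repetitiveness measure r from the abstract above concrete, here is a small sketch that builds the BWT naively and counts its runs. It is only an illustration of the definition, assuming a sentinel character "$" that does not occur in the text and sorts before every text character; it sorts all rotations and is far slower than the algorithms the paper proposes. The names `bwt` and `count_runs` are hypothetical helpers.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorting all rotations.

    Assumption: `sentinel` does not occur in `text` and compares
    smaller than every character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal blocks of equal consecutive characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))
```

For "banana" the transform is "annb$aa", which has r = 5 runs. On highly repetitive inputs r grows much more slowly than n, which is the regime the abstract targets.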
Efficient string algorithmics across alphabet realms
Stringology is a subfield of computer science dedicated to analyzing and processing sequences of symbols. It plays a crucial role in various applications, including lossless compression, information retrieval, natural language processing, and bioinformatics. Recent algorithms often assume that the strings to be processed are over a polynomial integer alphabet, i.e., each symbol is an integer that is at most polynomial in the lengths of the strings. In contrast, the earlier days of stringology were shaped by the weaker comparison model, in which strings can only be accessed by mere equality comparisons of symbols, or (if the symbols are totally ordered) order comparisons of symbols. Nowadays, these flavors of the comparison model are respectively referred to as general unordered alphabet and general ordered alphabet. In this dissertation, we dive into the realm of both integer alphabets and general alphabets. We present new algorithms and lower bounds for classic problems, including Lempel-Ziv compression, computing the Lyndon array, and the detection of squares and runs. Our results show that, instead of only assuming the standard model of computation, it is important to also consider both weaker and stronger models. Particularly, we should not discard the older and weaker comparison-based models too quickly, as they are not only powerful theoretical tools, but also lead to fast and elegant practical solutions, even by today's standards.
Algorithms and Lower Bounds for Ordering Problems on Strings
This dissertation presents novel algorithms and conditional lower bounds for a collection of string and text-compression-related problems. These results are unified under the theme of ordering constraint satisfaction. Utilizing the connections to ordering constraint satisfaction, we provide hardness results and algorithms for the following: recognizing a type of labeled graph amenable to text-indexing known as Wheeler graphs, minimizing the number of maximal unary substrings occurring in the Burrows-Wheeler Transformation of a text, minimizing the number of factors occurring in the Lyndon factorization of a text, and finding an optimal reference string for relative Lempel-Ziv encoding.
Grammatical gender and linguistic complexity : Volume II: World-wide comparative studies
Peer reviewed
LIPIcs, Volume 244, ESA 2022, Complete Volume
LIPIcs, Volume 244, ESA 2022, Complete Volume
Bridging Formal and Conceptual Semantics
The articles in this volume are the outcome of the successful BRIDGE Workshop held in Düsseldorf in 2014. The workshop gathered a number of distinguished researchers from formal semantics and conceptual semantics and aimed to initiate a deeper conversation and collaboration instead of separating the two sides as competing views. The workshop provided a platform to further discuss parallelisms on specific semantic issues on the one hand and on the other hand to confront opposed claims from the two different perspectives. This volume represents a selected number of high-quality papers presented at the workshop featuring various approaches to meaning from linguistics, logic and philosophy of language. This series explores issues of mental representation, linguistic structure and representation, and their interplay. The research presented in this series is grounded in the idea explored in the Collaborative Research Center "The structure of representations in language, cognition and science" (SFB 991) that there is a universal format for the representation of linguistic and cognitive concepts.
LIPIcs, Volume 274, ESA 2023, Complete Volume
LIPIcs, Volume 274, ESA 2023, Complete Volume