33 research outputs found

Lyndon Factorization of Grammar Compressed Texts Revisited

Author: Bannai Hideo
Furuya Isamu
I Tomohiro
Inenaga Shunsuke
Nakashima Yuto
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Annual Symposium on Combinatorial Pattern Matching (CPM 2018)
Publication date: 01/01/2018

We revisit the problem of computing the Lyndon factorization of a string w of length N which is given as a straight line program (SLP) of size n. For this problem, we show a new algorithm which runs in O(P(n, N) + Q(n, N) n log log N) time and O(n log N + S(n, N)) space, where P(n, N), S(n, N), Q(n, N) are respectively the pre-processing time, space, and query time of a data structure for longest common extensions (LCE) on SLPs. Our algorithm improves the algorithm proposed by I et al. (TCS '17), and can be more efficient than the O(N)-time solution by Duval (J. Algorithms '83) when w is highly compressible.
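For orientation, the O(N)-time baseline that the abstract attributes to Duval can be sketched on an uncompressed string as follows. This is a minimal illustration of the classical algorithm, not the SLP-based method of the paper; the function name `lyndon_factorization` is a hypothetical choice, and symbols are assumed to be ordered by Python's default character comparison.

```python
def lyndon_factorization(w: str) -> list[str]:
    """Factorize w into a non-increasing sequence of Lyndon words.

    Sketch of Duval's linear-time algorithm on a plain string.
    """
    factors = []
    i, n = 0, len(w)
    while i < n:
        # w[i:j] is a prefix of a power of a Lyndon word of length j - k.
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            # A strictly larger symbol restarts the period; an equal one extends it.
            k = i if w[k] < w[j] else k + 1
            j += 1
        # Emit every complete repetition of the Lyndon word found.
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors


print(lyndon_factorization("banana"))  # ['b', 'an', 'an', 'a']
```

Each factor is lexicographically no larger than the one before it, and concatenating the factors gives back the input.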

Dagstuhl Research Online Publication Server

Optimal construction of compressed indexes for highly repetitive texts

Author: Kempa D.
Publication venue: Society for Industrial and Applied Mathematics
Publication date: 01/01/2019

We propose algorithms that, given the input string of length n over integer alphabet of size σ, construct the Burrows–Wheeler transform (BWT), the permuted longest-common-prefix (PLCP) array, and the LZ77 parsing in O(n / log_σ n + r polylog n) time and working space, where r is the number of runs in the BWT of the input. These are the essential components of many compressed indexes such as compressed suffix tree, FM-index, and grammar and LZ77-based indexes, but also find numerous applications in sequence analysis and data compression. The value of r is a common measure of repetitiveness that is significantly smaller than n if the string is highly repetitive. Since just accessing every symbol of the string requires Ω(n / log_σ n) time, the presented algorithms are time and space optimal for inputs satisfying the assumption n/r ∈ Ω(polylog n) on the repetitiveness. For such inputs our result improves upon the currently fastest general algorithms of Belazzougui (STOC 2014) and Munro et al. (SODA 2017) which run in O(n) time and use O(n / log_σ n) working space. We also show how to use our techniques to obtain optimal solutions on highly repetitive data for other fundamental string processing problems such as: Lyndon factorization, construction of run-length compressed suffix arrays, and some classical “textbook” problems such as computing the longest substring occurring at least some fixed number of times.
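To make the repetitiveness measure r concrete, the sketch below builds the BWT naively by sorting all rotations and then counts its runs. It is a quadratic-space illustration only, not the construction proposed in the paper; the names `bwt` and `count_runs` are hypothetical, and the sentinel `$` is assumed to be absent from the input and smaller than every input symbol.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations (illustration only)."""
    s = text + sentinel  # assumption: sentinel is unique and smallest
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal blocks of equal consecutive symbols in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


transformed = bwt("banana")
print(transformed)              # annb$aa
print(count_runs(transformed))  # 5
```

On highly repetitive inputs the run count of the transformed string stays small relative to the input length, which is the regime the abstract targets.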

arXiv.org e-Print Archive

Crossref

Helsingin yliopiston digitaalinen arkisto

Finding an Optimal Alphabet Ordering for Lyndon Factorization Is Hard

Author: Gibney Daniel
Thankachan Sharma V.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

Efficient string algorithmics across alphabet realms

Author: Ellert Jonas
Publication date: 01/01/2024

Stringology is a subfield of computer science dedicated to analyzing and processing sequences of symbols. It plays a crucial role in various applications, including lossless compression, information retrieval, natural language processing, and bioinformatics. Recent algorithms often assume that the strings to be processed are over a polynomial integer alphabet, i.e., each symbol is an integer that is at most polynomial in the lengths of the strings. In contrast to that, the earlier days of stringology were shaped by the weaker comparison model, in which strings can only be accessed by mere equality comparisons of symbols, or (if the symbols are totally ordered) order comparisons of symbols. Nowadays, these flavors of the comparison model are respectively referred to as general unordered alphabet and general ordered alphabet. In this dissertation, we dive into the realm of both integer alphabets and general alphabets. We present new algorithms and lower bounds for classic problems, including Lempel-Ziv compression, computing the Lyndon array, and the detection of squares and runs. Our results show that, instead of only assuming the standard model of computation, it is important to also consider both weaker and stronger models. Particularly, we should not discard the older and weaker comparison-based models too quickly, as they are not only powerful theoretical tools, but also lead to fast and elegant practical solutions, even by today's standards.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Algorithms and Lower Bounds for Ordering Problems on Strings

Author: Gibney Daniel
Publication venue: Information Bulletin on Variable Stars (IBVS)
Publication date: 01/01/2021

This dissertation presents novel algorithms and conditional lower bounds for a collection of string and text-compression-related problems. These results are unified under the theme of ordering constraint satisfaction. Utilizing the connections to ordering constraint satisfaction, we provide hardness results and algorithms for the following: recognizing a type of labeled graph amenable to text-indexing known as Wheeler graphs, minimizing the number of maximal unary substrings occurring in the Burrows-Wheeler Transformation of a text, minimizing the number of factors occurring in the Lyndon factorization of a text, and finding an optimal reference string for relative Lempel-Ziv encoding.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Repetitiveness Measures based on String Attractors and Burrows-Wheeler Transform: Properties and Applications

Author: ROMANA Giuseppe
Publication venue: Palermo
Publication date: 06/07/2023

Archivio istituzionale della ricerca - Università di Palermo

Grammatical gender and linguistic complexity : Volume II: World-wide comparative studies

Author
Publication venue: Language Science Press
Publication date: 01/01/2019

Helsingin yliopiston digitaalinen arkisto

LIPIcs, Volume 244, ESA 2022, Complete Volume

Author: Chechik Shiri
Herman Grzegorz
Navarro Gonzalo
Rotenberg Eva
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual European Symposium on Algorithms (ESA 2022)
Publication date: 01/01/2022

Dagstuhl Research Online Publication Server

Bridging Formal and Conceptual Semantics

Author: Balogh Kata
Petersen Wiebke
Publication venue: Walter de Gruyter GmbH
Publication date: 21/11/2022

The articles in this volume are the outcome of the successful BRIDGE Workshop held in Düsseldorf in 2014. The workshop gathered a number of distinguished researchers from formal semantics and conceptual semantics and aimed to initiate a deeper conversation and collaboration instead of separating the two sides as competing views. The workshop provided a platform to further discuss parallelisms on specific semantic issues on the one hand and on the other hand to confront opposed claims from the two different perspectives. This volume represents a selected number of high-quality papers presented at the workshop featuring various approaches to meaning from linguistics, logic and philosophy of language. This series explores issues of mental representation, linguistic structure and representation, and their interplay. The research presented in this series is grounded in the idea explored in the Collaborative Research Center ‘The structure of representations in language, cognition and science’ (SFB 991) that there is a universal format for the representation of linguistic and cognitive concepts.

Directory of Open Access Books (DOAB)

LIPIcs, Volume 274, ESA 2023, Complete Volume

Author: Farach-Colton Martin
Herman Grzegorz
Puglisi Simon J.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server