4,182 research outputs found

    Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

    Get PDF
    Given a static reference string RR and a source string SS, a relative compression of SS with respect to RR is an encoding of SS as a sequence of references to substrings of RR. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string SS is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem

    Compressed Text Indexes:From Theory to Practice!

    Full text link
    A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This technology represents a breakthrough over the text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, this technology has matured up to a point where theoretical research is giving way to practical developments. Nonetheless this requires significant programming skills, a deep engineering effort, and a strong algorithmic background to dig into the research results. To date only isolated implementations and focused comparisons of compressed indexes have been reported, and they missed a common API, which prevented their re-use or deployment within other applications. The goal of this paper is to fill this gap. First, we present the existing implementations of compressed indexes from a practitioner's point of view. Second, we introduce the Pizza&Chili site, which offers tuned implementations and a standardized API for the most successful compressed full-text self-indexes, together with effective testbeds and scripts for their automatic validation and test. Third, we show the results of our extensive experiments on these codes with the aim of demonstrating the practical relevance of this novel and exciting technology

    Compressed Subsequence Matching and Packed Tree Coloring

    Get PDF
    We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size nn compressing a string of size NN and a pattern string of size mm over an alphabet of size σ\sigma, our algorithm uses O(n+nσw)O(n+\frac{n\sigma}{w}) space and O(n+nσw+mlog⁡Nlog⁡w⋅occ)O(n+\frac{n\sigma}{w}+m\log N\log w\cdot occ) or O(n+nσwlog⁡w+mlog⁡N⋅occ)O(n+\frac{n\sigma}{w}\log w+m\log N\cdot occ) time. Here ww is the word size and occocc is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for occ=o(nlog⁡N)occ=o(\frac{n}{\log N}) occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.Comment: To appear at CPM '1
    • …
    corecore