150 research outputs found

    Queries on LZ-Bounded Encodings

    We describe a data structure that stores a string $S$ in space similar to that of its Lempel-Ziv encoding and efficiently supports access, rank and select queries. These queries are fundamental for implementing succinct and compressed data structures, such as compressed trees and graphs. We show that our data structure can be built in a scalable manner and is both small and fast in practice compared to other data structures supporting such queries.
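
For readers unfamiliar with the three queries, the following is a minimal sketch of their semantics on an uncompressed Python string. It is not the data structure from the abstract: it uses linear time and plain storage, and the function names and the 0-based/1-based indexing conventions are assumptions chosen for illustration.

```python
def access(s: str, i: int) -> str:
    """Return the symbol stored at position i (0-based)."""
    return s[i]


def rank(s: str, c: str, i: int) -> int:
    """Count the occurrences of symbol c in the prefix s[0:i]."""
    return s[:i].count(c)


def select(s: str, c: str, j: int) -> int:
    """Return the position of the j-th occurrence of c (j is 1-based).

    Returns -1 when c occurs fewer than j times (assumed convention).
    """
    pos = -1
    for _ in range(j):
        pos = s.find(c, pos + 1)
        if pos == -1:
            return -1
    return pos
```

Under these conventions, `rank(s, c, select(s, c, j))` equals `j - 1` whenever the `j`-th occurrence exists, which is the usual way the two queries are related.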

    Computing LZ77 in Run-Compressed Space

    In this paper, we show that the LZ77 factorization of a text $T \in \Sigma^n$ can be computed in O(R log n) bits of working space and O(n log R) time, R being the number of runs in the Burrows-Wheeler transform of T reversed. For extremely repetitive inputs, the working space can be as low as O(log n) bits: exponentially smaller than the text itself. As a direct consequence of our result, we show that a class of repetition-aware self-indexes based on a combination of run-length encoded BWT and LZ77 can be built in asymptotically optimal O(R + z) words of working space, z being the size of the LZ77 parsing.
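
To make the object being computed concrete, here is a naive sketch of a greedy LZ77 factorization. It is not the paper's algorithm: it keeps the whole text in memory and takes roughly quadratic time or worse. It also assumes one common variant of the definition, in which each phrase is the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed), or a single fresh character. The function name is hypothetical.

```python
def lz77_factorize(text: str) -> list[str]:
    """Greedy LZ77 parse of text, returned as a list of phrases.

    Assumed variant: a phrase is the longest prefix of text[i:] that also
    occurs starting at some position src < i (self-overlap allowed); if no
    such prefix exists, the phrase is the single character text[i].
    """
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best = 0
        # Try every earlier starting position and extend the match greedily.
        for src in range(i):
            length = 0
            while i + length < n and text[src + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a fresh character forms a phrase of length 1
        phrases.append(text[i:i + step])
        i += step
    return phrases
```

The number of phrases returned, `len(lz77_factorize(text))`, corresponds to the quantity the abstract calls z under this variant of the definition.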

    Hard Instances of the Constrained Discrete Logarithm Problem

    The discrete logarithm problem (DLP) generalizes to the constrained DLP, where the secret exponent $x$ belongs to a set known to the attacker. The complexity of generic algorithms for solving the constrained DLP depends on the choice of the set. Motivated by cryptographic applications, we study sets with succinct representation for which the constrained DLP is hard. We draw on earlier results due to Erdős et al. and Schnorr, develop geometric tools such as generalized Menelaus' theorem for proving lower bounds on the complexity of the constrained DLP, and construct sets with succinct representation with provable non-trivial lower bounds.
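
As a small illustration of the problem statement only, the sketch below solves a constrained DLP instance by exhaustive search over the known candidate set. It is the trivial baseline, not a technique from the abstract; working modulo a prime `p`, the function name, and the parameter names are all assumptions made for the example.

```python
from typing import Iterable, Optional


def constrained_dlog(g: int, h: int, p: int,
                     candidates: Iterable[int]) -> Optional[int]:
    """Find x in candidates with g**x == h (mod p) by exhaustive search.

    Hypothetical helper: tries every candidate exponent in turn, so its
    cost is linear in the size of the candidate set.
    """
    for x in candidates:
        if pow(g, x, p) == h:  # built-in modular exponentiation
            return x
    return None
```

For example, with `p = 101` and `g = 2`, calling `constrained_dlog(2, pow(2, 37, 101), 101, [5, 12, 37, 80])` recovers the exponent 37 from the four-element candidate set.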

    Small space and streaming pattern matching with k edits

    In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the problem in $\tilde{O}(k^5)$ space and $\tilde{O}(k^8)$ amortised time per character of the text, providing answers correct with high probability. (Hereafter, $\tilde{O}(\cdot)$ hides a $\mathrm{poly}(\log n)$ factor.) This answers a decade-old question: since the discovery of a $\mathrm{poly}(k \log n)$-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Up to this work, no $\mathrm{poly}(k \log n)$-space algorithm was known even in the simpler semi-streaming model, where $T$ comes as a stream but $P$ is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. In order to develop the fully streaming algorithm, we introduce a new edit distance sketch parametrised by integers $n \ge k$. For any string of length at most $n$, the sketch is of size $\tilde{O}(k^2)$ and it can be computed with an $\tilde{O}(k^2)$-space streaming algorithm. Given the sketches of two strings, in $\tilde{O}(k^3)$ time we can compute their edit distance or certify that it is larger than $k$. This result improves upon $\tilde{O}(k^8)$-size sketches of Belazzougui and Zhu [FOCS 2016] and very recent $\tilde{O}(k^3)$-size sketches of Jin, Nelson, and Wu [STACS 2021].
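
The operation "compute the edit distance or certify that it is larger than $k$" can be illustrated on two fully stored strings with the classical banded dynamic program. This is only a baseline sketch with both inputs in memory, not the streaming algorithm or the sketch construction from the abstract; the function name and the choice of returning `None` for "larger than k" are assumptions.

```python
from typing import Optional


def bounded_edit_distance(a: str, b: str, k: int) -> Optional[int]:
    """Return the edit distance of a and b if it is at most k, else None.

    Only cells within k of the main diagonal are filled, since any
    alignment of cost <= k cannot leave that band. For simplicity each
    row is still allocated in full.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the length difference alone already exceeds k
    inf = k + 1  # any value above k means "too far"
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo, hi = max(0, i - k), min(m, i + k)
        if lo == 0:
            cur[0] = i  # delete the first i characters of a
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitute
                         prev[j] + 1,         # delete from a
                         cur[j - 1] + 1,      # insert into a
                         inf)
        prev = cur
    return prev[m] if prev[m] <= k else None
```

For instance, `bounded_edit_distance("kitten", "sitting", 3)` returns 3, while the same call with `k = 2` returns `None`.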