27,843 research outputs found

    Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

    Get PDF
    Given a static reference string RR and a source string SS, a relative compression of SS with respect to RR is an encoding of SS as a sequence of references to substrings of RR. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string SS is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem

    The Price of Uncertain Priors in Source Coding

    Full text link
    We consider the problem of one-way communication when the recipient does not know exactly the distribution that the messages are drawn from, but has a "prior" distribution that is known to be close to the source distribution, a problem first considered by Juba et al. We consider the question of how much longer the messages need to be in order to cope with the uncertainty about the receiver's prior and the source distribution, respectively, as compared to the standard source coding problem. We consider two variants of this uncertain priors problem: the original setting of Juba et al. in which the receiver is required to correctly recover the message with probability 1, and a setting introduced by Haramaty and Sudan, in which the receiver is permitted to fail with some probability ϵ\epsilon. In both settings, we obtain lower bounds that are tight up to logarithmically smaller terms. In the latter setting, we furthermore present a variant of the coding scheme of Juba et al. with an overhead of logα+log1/ϵ+1\log\alpha+\log 1/\epsilon+1 bits, thus also establishing the nearly tight upper bound.Comment: To appear in IEEE Transactions on Information Theor

    Compression via Matroids: A Randomized Polynomial Kernel for Odd Cycle Transversal

    Full text link
    The Odd Cycle Transversal problem (OCT) asks whether a given graph can be made bipartite by deleting at most kk of its vertices. In a breakthrough result Reed, Smith, and Vetta (Operations Research Letters, 2004) gave a \BigOh(4^kkmn) time algorithm for it, the first algorithm with polynomial runtime of uniform degree for every fixed kk. It is known that this implies a polynomial-time compression algorithm that turns OCT instances into equivalent instances of size at most \BigOh(4^k), a so-called kernelization. Since then the existence of a polynomial kernel for OCT, i.e., a kernelization with size bounded polynomially in kk, has turned into one of the main open questions in the study of kernelization. This work provides the first (randomized) polynomial kernelization for OCT. We introduce a novel kernelization approach based on matroid theory, where we encode all relevant information about a problem instance into a matroid with a representation of size polynomial in kk. For OCT, the matroid is built to allow us to simulate the computation of the iterative compression step of the algorithm of Reed, Smith, and Vetta, applied (for only one round) to an approximate odd cycle transversal which it is aiming to shrink to size kk. The process is randomized with one-sided error exponentially small in kk, where the result can contain false positives but no false negatives, and the size guarantee is cubic in the size of the approximate solution. Combined with an \BigOh(\sqrt{\log n})-approximation (Agarwal et al., STOC 2005), we get a reduction of the instance to size \BigOh(k^{4.5}), implying a randomized polynomial kernelization.Comment: Minor changes to agree with SODA 2012 version of the pape

    On optimal language compression for sets in PSPACE/poly

    Full text link
    We show that if DTIME[2^O(n)] is not included in DSPACE[2^o(n)], then, for every set B in PSPACE/poly, all strings x in B of length n can be represented by a string compressed(x) of length at most log(|B^{=n}|)+O(log n), such that a polynomial-time algorithm, given compressed(x), can distinguish x from all the other strings in B^{=n}. Modulo the O(log n) additive term, this achieves the information-theoretic optimum for string compression. We also observe that optimal compression is not possible for sets more complex than PSPACE/poly because for any time-constructible superpolynomial function t, there is a set A computable in space t(n) such that at least one string x of length n requires compressed(x) to be of length 2 log(|A^=n|).Comment: submitted to Theory of Computing System

    Lossy Kernelization

    Get PDF
    In this paper we propose a new framework for analyzing the performance of preprocessing algorithms. Our framework builds on the notion of kernelization from parameterized complexity. However, as opposed to the original notion of kernelization, our definitions combine well with approximation algorithms and heuristics. The key new definition is that of a polynomial size α\alpha-approximate kernel. Loosely speaking, a polynomial size α\alpha-approximate kernel is a polynomial time pre-processing algorithm that takes as input an instance (I,k)(I,k) to a parameterized problem, and outputs another instance (I,k)(I',k') to the same problem, such that I+kkO(1)|I'|+k' \leq k^{O(1)}. Additionally, for every c1c \geq 1, a cc-approximate solution ss' to the pre-processed instance (I,k)(I',k') can be turned in polynomial time into a (cα)(c \cdot \alpha)-approximate solution ss to the original instance (I,k)(I,k). Our main technical contribution are α\alpha-approximate kernels of polynomial size for three problems, namely Connected Vertex Cover, Disjoint Cycle Packing and Disjoint Factors. These problems are known not to admit any polynomial size kernels unless NPcoNP/polyNP \subseteq coNP/poly. Our approximate kernels simultaneously beat both the lower bounds on the (normal) kernel size, and the hardness of approximation lower bounds for all three problems. On the negative side we prove that Longest Path parameterized by the length of the path and Set Cover parameterized by the universe size do not admit even an α\alpha-approximate kernel of polynomial size, for any α1\alpha \geq 1, unless NPcoNP/polyNP \subseteq coNP/poly. In order to prove this lower bound we need to combine in a non-trivial way the techniques used for showing kernelization lower bounds with the methods for showing hardness of approximationComment: 58 pages. Version 2 contain new results: PSAKS for Cycle Packing and approximate kernel lower bounds for Set Cover and Hitting Set parameterized by universe siz

    Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

    Full text link
    We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck
    corecore