27,843 research outputs found
Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation
Given a static reference string and a source string , a relative
compression of with respect to is an encoding of as a sequence of
references to substrings of . Relative compression schemes are a classic
model of compression and have recently proved very successful for compressing
highly-repetitive massive data sets such as genomes and web-data. We initiate
the study of relative compression in a dynamic setting where the compressed
source string is subject to edit operations. The goal is to maintain the
compressed representation compactly, while supporting edits and allowing
efficient random access to the (uncompressed) source string. We present new
data structures that achieve optimal time for updates and queries while using
space linear in the size of the optimal relative compression, for nearly all
combinations of parameters. We also present solutions for restricted and
extended sets of updates. To achieve these results, we revisit the dynamic
partial sums problem and the substring concatenation problem. We present new
optimal or near optimal bounds for these problems. Plugging in our new results
we also immediately obtain new bounds for the string indexing for patterns with
wildcards problem and the dynamic text and static pattern matching problem
The Price of Uncertain Priors in Source Coding
We consider the problem of one-way communication when the recipient does not
know exactly the distribution that the messages are drawn from, but has a
"prior" distribution that is known to be close to the source distribution, a
problem first considered by Juba et al. We consider the question of how much
longer the messages need to be in order to cope with the uncertainty about the
receiver's prior and the source distribution, respectively, as compared to the
standard source coding problem. We consider two variants of this uncertain
priors problem: the original setting of Juba et al. in which the receiver is
required to correctly recover the message with probability 1, and a setting
introduced by Haramaty and Sudan, in which the receiver is permitted to fail
with some probability . In both settings, we obtain lower bounds that
are tight up to logarithmically smaller terms. In the latter setting, we
furthermore present a variant of the coding scheme of Juba et al. with an
overhead of bits, thus also establishing the
nearly tight upper bound.Comment: To appear in IEEE Transactions on Information Theor
Compression via Matroids: A Randomized Polynomial Kernel for Odd Cycle Transversal
The Odd Cycle Transversal problem (OCT) asks whether a given graph can be
made bipartite by deleting at most of its vertices. In a breakthrough
result Reed, Smith, and Vetta (Operations Research Letters, 2004) gave a
\BigOh(4^kkmn) time algorithm for it, the first algorithm with polynomial
runtime of uniform degree for every fixed . It is known that this implies a
polynomial-time compression algorithm that turns OCT instances into equivalent
instances of size at most \BigOh(4^k), a so-called kernelization. Since then
the existence of a polynomial kernel for OCT, i.e., a kernelization with size
bounded polynomially in , has turned into one of the main open questions in
the study of kernelization.
This work provides the first (randomized) polynomial kernelization for OCT.
We introduce a novel kernelization approach based on matroid theory, where we
encode all relevant information about a problem instance into a matroid with a
representation of size polynomial in . For OCT, the matroid is built to
allow us to simulate the computation of the iterative compression step of the
algorithm of Reed, Smith, and Vetta, applied (for only one round) to an
approximate odd cycle transversal which it is aiming to shrink to size . The
process is randomized with one-sided error exponentially small in , where
the result can contain false positives but no false negatives, and the size
guarantee is cubic in the size of the approximate solution. Combined with an
\BigOh(\sqrt{\log n})-approximation (Agarwal et al., STOC 2005), we get a
reduction of the instance to size \BigOh(k^{4.5}), implying a randomized
polynomial kernelization.Comment: Minor changes to agree with SODA 2012 version of the pape
On optimal language compression for sets in PSPACE/poly
We show that if DTIME[2^O(n)] is not included in DSPACE[2^o(n)], then, for
every set B in PSPACE/poly, all strings x in B of length n can be represented
by a string compressed(x) of length at most log(|B^{=n}|)+O(log n), such that a
polynomial-time algorithm, given compressed(x), can distinguish x from all the
other strings in B^{=n}. Modulo the O(log n) additive term, this achieves the
information-theoretic optimum for string compression. We also observe that
optimal compression is not possible for sets more complex than PSPACE/poly
because for any time-constructible superpolynomial function t, there is a set A
computable in space t(n) such that at least one string x of length n requires
compressed(x) to be of length 2 log(|A^=n|).Comment: submitted to Theory of Computing System
Lossy Kernelization
In this paper we propose a new framework for analyzing the performance of
preprocessing algorithms. Our framework builds on the notion of kernelization
from parameterized complexity. However, as opposed to the original notion of
kernelization, our definitions combine well with approximation algorithms and
heuristics. The key new definition is that of a polynomial size
-approximate kernel. Loosely speaking, a polynomial size
-approximate kernel is a polynomial time pre-processing algorithm that
takes as input an instance to a parameterized problem, and outputs
another instance to the same problem, such that . Additionally, for every , a -approximate solution
to the pre-processed instance can be turned in polynomial time into a
-approximate solution to the original instance .
Our main technical contribution are -approximate kernels of
polynomial size for three problems, namely Connected Vertex Cover, Disjoint
Cycle Packing and Disjoint Factors. These problems are known not to admit any
polynomial size kernels unless . Our approximate
kernels simultaneously beat both the lower bounds on the (normal) kernel size,
and the hardness of approximation lower bounds for all three problems. On the
negative side we prove that Longest Path parameterized by the length of the
path and Set Cover parameterized by the universe size do not admit even an
-approximate kernel of polynomial size, for any , unless
. In order to prove this lower bound we need to combine
in a non-trivial way the techniques used for showing kernelization lower bounds
with the methods for showing hardness of approximationComment: 58 pages. Version 2 contain new results: PSAKS for Cycle Packing and
approximate kernel lower bounds for Set Cover and Hitting Set parameterized
by universe siz
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck
- …