Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Authors: Philip Bille, Patrick Hagge Cording, Inge Li Gørtz, Frederik Rye Skjoldjensen, Hjalte Wedel Vildhøj, Søren Vind
Publication date: 01/01/2016

Given a static reference string $R$ and a source string $S$, a relative compression of $S$ with respect to $R$ is an encoding of $S$ as a sequence of references to substrings of $R$. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly repetitive massive data sets such as genomes and web data. We initiate the study of relative compression in a dynamic setting where the compressed source string $S$ is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near-optimal bounds for these problems. Plugging in our new results, we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.
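The definition of relative compression in this abstract can be illustrated with a minimal static sketch. It is not the dynamic data structure from the paper: the greedy factorization, the function names `relative_compress` and `access`, and the sample strings are all assumptions made for illustration, and the search is deliberately naive. Random access uses prefix sums over the factor lengths, which is the quantity a dynamic partial sums structure would have to maintain under edits.

```python
from bisect import bisect_right
from itertools import accumulate


def relative_compress(reference: str, source: str) -> list[tuple[int, int]]:
    """Greedily encode `source` as (start, length) references into `reference`.

    Hypothetical helper: naive search, for illustration only. Assumes every
    character of `source` occurs somewhere in `reference`.
    """
    factors = []
    i = 0
    while i < len(source):
        start = reference.find(source[i])
        if start < 0:
            raise ValueError(f"character {source[i]!r} does not occur in reference")
        length = 1
        # Extend the match while the longer substring still occurs in the reference.
        while i + length < len(source):
            pos = reference.find(source[i:i + length + 1])
            if pos < 0:
                break
            start, length = pos, length + 1
        factors.append((start, length))
        i += length
    return factors


def access(reference: str, factors: list[tuple[int, int]], i: int) -> str:
    """Return the i-th character of the source without decompressing it."""
    # Prefix sums of factor lengths, recomputed on every call for brevity.
    ends = list(accumulate(length for _, length in factors))
    if not ends or not 0 <= i < ends[-1]:
        raise IndexError(i)
    j = bisect_right(ends, i)              # index of the factor covering position i
    offset = i - (ends[j - 1] if j else 0)  # position of i inside that factor
    start, _ = factors[j]
    return reference[start + offset]


if __name__ == "__main__":
    R, S = "ACGTACGGT", "ACGGTACG"
    factors = relative_compress(R, S)
    print(factors)
    print("".join(access(R, factors, i) for i in range(len(S))))
```

Recomputing the prefix sums on each query keeps the sketch short; a practical static version would store them once, and the dynamic setting studied in the paper needs them to stay correct as factors are inserted, deleted, or split.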

arXiv.org e-Print Archive

Online Research Database In Technology

The Price of Uncertain Priors in Source Coding

Authors: Mark Braverman, Brendan Juba
Publication date: 01/01/2018

We consider the problem of one-way communication when the recipient does not know exactly the distribution that the messages are drawn from, but has a "prior" distribution that is known to be close to the source distribution, a problem first considered by Juba et al. We consider the question of how much longer the messages need to be in order to cope with the uncertainty about the receiver's prior and the source distribution, respectively, as compared to the standard source coding problem. We consider two variants of this uncertain priors problem: the original setting of Juba et al., in which the receiver is required to correctly recover the message with probability 1, and a setting introduced by Haramaty and Sudan, in which the receiver is permitted to fail with some probability $\epsilon$. In both settings, we obtain lower bounds that are tight up to logarithmically smaller terms. In the latter setting, we furthermore present a variant of the coding scheme of Juba et al. with an overhead of $\log\alpha + \log(1/\epsilon) + 1$ bits, thus also establishing the nearly tight upper bound.

Comment: To appear in IEEE Transactions on Information Theory.
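To give a feel for the size of the overhead term quoted in this abstract, the expression can be evaluated for sample values. This is only arithmetic on the stated formula: the function name `overhead_bits` is hypothetical, base-2 logarithms are an assumption (the overhead is stated in bits), and $\alpha$ is treated as a given closeness parameter because the abstract does not define it.

```python
import math


def overhead_bits(alpha: float, epsilon: float) -> float:
    """Evaluate log(alpha) + log(1/epsilon) + 1 with base-2 logs (assumed)."""
    if alpha <= 0 or not 0 < epsilon <= 1:
        raise ValueError("expected alpha > 0 and 0 < epsilon <= 1")
    return math.log2(alpha) + math.log2(1 / epsilon) + 1


if __name__ == "__main__":
    # Sample values chosen for illustration: 2 + 3 + 1 bits.
    print(overhead_bits(4, 0.125))  # 6.0
```

Halving the permitted failure probability $\epsilon$ adds one bit to this term, and doubling $\alpha$ does the same.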

arXiv.org e-Print Archive

Princeton University Open Access Repository

Compression via Matroids: A Randomized Polynomial Kernel for Odd Cycle Transversal

Authors: Stefan Kratsch, Magnus Wahlström
Publication date: 06/10/2011

The Odd Cycle Transversal problem (OCT) asks whether a given graph can be made bipartite by deleting at most $k$ of its vertices. In a breakthrough result, Reed, Smith, and Vetta (Operations Research Letters, 2004) gave an $O(4^k kmn)$ time algorithm for it, the first algorithm with polynomial runtime of uniform degree for every fixed $k$. It is known that this implies a polynomial-time compression algorithm that turns OCT instances into equivalent instances of size at most $O(4^k)$, a so-called kernelization. Since then the existence of a polynomial kernel for OCT, i.e., a kernelization with size bounded polynomially in $k$, has turned into one of the main open questions in the study of kernelization. This work provides the first (randomized) polynomial kernelization for OCT. We introduce a novel kernelization approach based on matroid theory, where we encode all relevant information about a problem instance into a matroid with a representation of size polynomial in $k$. For OCT, the matroid is built to allow us to simulate the computation of the iterative compression step of the algorithm of Reed, Smith, and Vetta, applied (for only one round) to an approximate odd cycle transversal which it is aiming to shrink to size $k$. The process is randomized with one-sided error exponentially small in $k$, where the result can contain false positives but no false negatives, and the size guarantee is cubic in the size of the approximate solution. Combined with an $O(\sqrt{\log n})$-approximation (Agarwal et al., STOC 2005), we get a reduction of the instance to size $O(k^{4.5})$, implying a randomized polynomial kernelization.

Comment: Minor changes to agree with SODA 2012 version of the paper.
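The problem statement of OCT can be made concrete with a brute-force check. The sketch below is neither the matroid-based kernelization of this paper nor the iterative compression algorithm of Reed, Smith, and Vetta: it simply tries every vertex set of size at most $k$ and tests bipartiteness by 2-coloring, so it runs in exponential time. The function names and the vertex-numbering convention (vertices `0..n-1`) are assumptions.

```python
from collections import deque
from itertools import combinations


def is_bipartite(n, edges, removed):
    """2-color the graph on vertices 0..n-1 minus `removed` using BFS."""
    adj = {v: [] for v in range(n) if v not in removed}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].append(v)
            adj[v].append(u)
    color = {}
    for source in adj:
        if source in color:
            continue
        color[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False  # an odd cycle survives
    return True


def odd_cycle_transversal(n, edges, k):
    """Return a set of at most k vertices whose deletion leaves a bipartite
    graph, or None if no such set exists. Exhaustive search, small inputs only."""
    for size in range(k + 1):
        for candidate in combinations(range(n), size):
            if is_bipartite(n, edges, set(candidate)):
                return set(candidate)
    return None


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant edge 2-3.
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    print(odd_cycle_transversal(4, edges, 0))
    print(odd_cycle_transversal(4, edges, 1))
```

Because candidate sets are tried in increasing size, the first set returned is also a minimum odd cycle transversal for the input.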

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe

On optimal language compression for sets in PSPACE/poly

Authors: N. V. Vinodchandran, Marius Zimand
Publication date: 03/04/2013

We show that if $\mathrm{DTIME}[2^{O(n)}]$ is not included in $\mathrm{DSPACE}[2^{o(n)}]$, then, for every set $B$ in PSPACE/poly, all strings $x$ in $B$ of length $n$ can be represented by a string compressed($x$) of length at most $\log(|B^{=n}|) + O(\log n)$, such that a polynomial-time algorithm, given compressed($x$), can distinguish $x$ from all the other strings in $B^{=n}$. Modulo the $O(\log n)$ additive term, this achieves the information-theoretic optimum for string compression. We also observe that optimal compression is not possible for sets more complex than PSPACE/poly because for any time-constructible superpolynomial function $t$, there is a set $A$ computable in space $t(n)$ such that at least one string $x$ of length $n$ requires compressed($x$) to be of length $2 \log(|A^{=n}|)$.

Comment: submitted to Theory of Computing Systems.

arXiv.org e-Print Archive

CiteSeerX

Lossy Kernelization

Authors: Daniel Lokshtanov, Fahad Panolan, M. S. Ramanujan, Saket Saurabh
Publication date: 04/11/2016

In this paper we propose a new framework for analyzing the performance of preprocessing algorithms. Our framework builds on the notion of kernelization from parameterized complexity. However, as opposed to the original notion of kernelization, our definitions combine well with approximation algorithms and heuristics. The key new definition is that of a polynomial size $\alpha$-approximate kernel. Loosely speaking, a polynomial size $\alpha$-approximate kernel is a polynomial time pre-processing algorithm that takes as input an instance $(I,k)$ to a parameterized problem, and outputs another instance $(I',k')$ to the same problem, such that $|I'|+k' \leq k^{O(1)}$. Additionally, for every $c \geq 1$, a $c$-approximate solution $s'$ to the pre-processed instance $(I',k')$ can be turned in polynomial time into a $(c \cdot \alpha)$-approximate solution $s$ to the original instance $(I,k)$. Our main technical contributions are $\alpha$-approximate kernels of polynomial size for three problems, namely Connected Vertex Cover, Disjoint Cycle Packing and Disjoint Factors. These problems are known not to admit any polynomial size kernels unless $NP \subseteq coNP/poly$. Our approximate kernels simultaneously beat both the lower bounds on the (normal) kernel size, and the hardness of approximation lower bounds for all three problems. On the negative side we prove that Longest Path parameterized by the length of the path and Set Cover parameterized by the universe size do not admit even an $\alpha$-approximate kernel of polynomial size, for any $\alpha \geq 1$, unless $NP \subseteq coNP/poly$. In order to prove this lower bound we need to combine in a non-trivial way the techniques used for showing kernelization lower bounds with the methods for showing hardness of approximation.

Comment: 58 pages. Version 2 contains new results: PSAKS for Cycle Packing and approximate kernel lower bounds for Set Cover and Hitting Set parameterized by universe size.

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Authors: A. Amir, E.W. Myers, G. Navarro, G.M. Landau, J. Kärkkäinen, J. Ziv, K. Thompson, M. Dietzfelbinger, M. Farach, P. Sellers, R. Cole, T.A. Welch, V. Mäkinen
Publication date: 01/01/2007

We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.
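For readers unfamiliar with the underlying problem, approximate string matching on plain (uncompressed) text can be sketched with the classical dynamic-programming recurrence usually attributed to Sellers. This is not the compressed-text algorithm of the paper: it scans the full text in O(mn) time and keeps a single column of the table. The function name `approximate_match_ends` and the convention of reporting 1-based end positions are assumptions.

```python
def approximate_match_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return the 1-based end positions j in `text` such that some substring
    ending at j is within edit distance k of `pattern`.

    Classical column-wise dynamic programming on uncompressed text.
    """
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and a substring of
    # the text ending at the current position; initially the text is empty.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position at no cost
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                           # skip the text character
                col[i - 1] + 1,                       # skip the pattern character
                prev_diag + (pattern[i - 1] != ch),   # match or substitute
            )
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approximate_match_ends("abc", "xxabcxx", 0))
    print(approximate_match_ends("abc", "xxabdxx", 1))
```

Only one column of length m + 1 is kept, so the working space is O(m) regardless of the text length; the paper's setting is harder because the text is available only in Ziv-Lempel compressed form.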

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Online Research Database In Technology