
186 research outputs found

XML Compression via DAGs

Author: Bousquet-Melou Mireille
Lohrey Markus
Maneth Sebastian
Noeth Eric
Publication date: 01/01/2013

Unranked trees can be represented using their minimal dag (directed acyclic graph). For XML this achieves high compression ratios due to its repetitive markup. Unranked trees are often represented through first child/next sibling (fcns) encoded binary trees. We study the difference in size (= number of edges) of the minimal dag versus the minimal dag of the fcns encoded binary tree. One main finding is that the size of the dag of the binary tree can never be smaller than the square root of the size of the minimal dag, and that there are examples that match this bound. We introduce a new combined structure, the hybrid dag, which is guaranteed to be smaller than (or equal in size to) both dags. Interestingly, we find through experiments that last child/previous sibling encodings are much better for XML compression via dags than fcns encodings. We determine the average sizes of unranked and binary dags over a given set of labels (under uniform distribution) in terms of their exact generating functions, and in terms of their asymptotic behavior. Comment: A short version of this paper appeared in the Proceedings of ICDT 201
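The two size measures compared in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tuple encodings, the function names `dag_edges`, `fcns` and `binary_dag_edges`, and the conventions that repeated edges are counted with multiplicity and that missing children in the binary tree are not counted as edges are all assumptions made for illustration.

```python
# An unranked tree is (label, [children]); an fcns binary tree is
# (label, left, right), where left/right may be None. Both encodings are
# assumptions of this sketch, not notation taken from the paper.

def dag_edges(tree):
    """Edge count of the minimal dag of an unranked tree (via hash-consing)."""
    ids = {}      # (label, child ids) -> id of the unique dag node
    edges = 0

    def visit(node):
        nonlocal edges
        label, children = node
        key = (label, tuple(visit(c) for c in children))
        if key not in ids:            # first occurrence of this subtree
            ids[key] = len(ids)
            edges += len(children)    # edges counted with multiplicity
        return ids[key]

    visit(tree)
    return edges


def fcns(tree):
    """First child/next sibling encoding of an unranked tree."""
    def encode(siblings, i):
        if i == len(siblings):
            return None
        label, children = siblings[i]
        # left = first child, right = next sibling
        return (label, encode(children, 0), encode(siblings, i + 1))

    return encode([tree], 0)


def binary_dag_edges(btree):
    """Edge count of the minimal dag of a binary tree; None is not an edge."""
    ids = {}
    edges = 0

    def visit(node):
        nonlocal edges
        if node is None:
            return None
        label, left, right = node
        key = (label, visit(left), visit(right))
        if key not in ids:
            ids[key] = len(ids)
            edges += (left is not None) + (right is not None)
        return ids[key]

    visit(btree)
    return edges


if __name__ == "__main__":
    # Root r with four identical children a(b, b); the tree has 12 edges.
    example = ("r", [("a", [("b", []), ("b", [])])] * 4)
    print(dag_edges(example))               # 6
    print(binary_dag_edges(fcns(example)))  # 9
```

In this particular example the unranked dag is smaller, because the four identical children collapse into one node, whereas in the fcns encoding each `a` carries a different chain of remaining siblings. Other inputs can favor the binary dag.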

arXiv.org e-Print Archive

CiteSeerX

Unification and Matching on Compressed Terms

Author: Gascón Adrià
Godoy Guillem
Schmidt-Schauß Manfred
Publication date: 08/03/2010

Term unification plays an important role in many areas of computer science, especially in those related to logic. The universal mechanism of grammar-based compression for terms, in particular the so-called Singleton Tree Grammars (STG), has recently drawn considerable attention. Using STGs, terms of exponential size and height can be represented in linear space. Furthermore, the term representation by directed acyclic graphs (dags) can be efficiently simulated. The present paper is the result of an investigation on term unification and matching when the terms given as input are represented using different compression mechanisms for terms such as dags and Singleton Tree Grammars. We describe a polynomial time algorithm for context matching with dags, when the number of different context variables is fixed for the problem. For the same problem, NP-completeness is obtained when the terms are represented using the more general formalism of Singleton Tree Grammars. For first-order unification and matching, polynomial time algorithms are presented, each of them improving previous results for those problems. Comment: This paper is posted at the Computing Research Repository (CoRR) as part of the process of submission to the journal ACM Transactions on Computational Logic (TOCL)
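The claim that a grammar of linear size can represent a term of exponential size and height can be checked with a small Python sketch. The rule encoding below (`"app"`, `"ctx"`, `"comp"`, `"plug"`) and the helpers `measure` and `tower` are hypothetical simplifications of Singleton Tree Grammars chosen for illustration; they are not the paper's notation, and context rules are restricted to a hole directly below the root symbol.

```python
# Hypothetical rule encoding (an assumption of this sketch):
#   ("app",  f, [A1, ..., An])     term     A -> f(A1, ..., An)
#   ("ctx",  f, [before], [after]) context  C -> f(before..., [.], after...)
#   ("comp", C1, C2)               context  C -> C1[C2]
#   ("plug", C, A)                 term     A -> C[A]
# Rules must be listed so that every rule appears after the rules it uses.

def measure(rules):
    """Compute sizes and heights of the generated terms without expanding them.

    Term rules map to (size, height); context rules map to
    (size without the hole, depth of the hole, height).
    """
    m = {}
    for name, rule in rules.items():
        kind = rule[0]
        if kind == "app":
            args = [m[a] for a in rule[2]]
            size = 1 + sum(s for s, _ in args)
            height = 1 + max(h for _, h in args) if args else 0
            m[name] = (size, height)
        elif kind == "ctx":
            args = [m[a] for a in rule[2] + rule[3]]
            size = 1 + sum(s for s, _ in args)
            height = 1 + max([h for _, h in args], default=0)
            m[name] = (size, 1, height)   # the hole sits at depth 1
        elif kind == "comp":
            (s1, d1, h1), (s2, d2, h2) = m[rule[1]], m[rule[2]]
            m[name] = (s1 + s2, d1 + d2, max(h1, d1 + h2))
        elif kind == "plug":
            (s1, d1, h1), (s2, h2) = m[rule[1]], m[rule[2]]
            m[name] = (s1 + s2, max(h1, d1 + h2))
    return m


def tower(n):
    """n + 3 rules generating f(f(...f(a)...)) with 2**n nested f symbols."""
    rules = {"a": ("app", "a", []), "C0": ("ctx", "f", [], [])}
    for i in range(1, n + 1):
        # Composing a context with itself doubles its hole depth.
        rules[f"C{i}"] = ("comp", f"C{i-1}", f"C{i-1}")
    rules["T"] = ("plug", f"C{n}", "a")
    return rules


if __name__ == "__main__":
    size, height = measure(tower(20))["T"]
    print(size, height)   # 1048577 1048576
```

Plain dags share repeated subterms and can therefore reach exponential size, but the height of the represented term stays bounded by the number of dag nodes. The context composition rule is what allows exponential height in this sketch.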

arXiv.org e-Print Archive

CiteSeerX

Traversing Grammar-Compressed Trees with Constant Delay

Author: Lohrey Markus
Maneth Sebastian
Reh Carl Philipp
Publication date: 10/11/2015

A grammar-compressed ranked tree is represented with a linear space overhead so that a single traversal step, i.e., the move to the parent or the i-th child, can be carried out in constant time. Moreover, we extend our data structure such that equality of subtrees can be checked in constant time.

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Fingerprints in Compressed Strings

Author: A. Amir
D. Harel
D. Willard
F. Claude
G. Cormode
J. Ziv
K. Mehlhorn
L. Gąsieniec
M. Bender
M. Charikar
M. Farach
O. Berkman
P. Bille
P. Emde Boas van
P.F. Dietz
R. Cole
R.M. Karp
S. Alstrup
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time O(log N log l) and O(log l log log l + log log N) for SLPs and Linear SLPs, respectively. Here, l denotes the length of the LCE.
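The property that makes fingerprint queries on a grammar possible is that the fingerprint of a concatenation XY follows from the fingerprints and lengths of X and Y. The Python sketch below illustrates only that property. It is not the O(log N) data structure of the paper: a query walks one root-to-leaf path, so it costs time proportional to the grammar height, and it uses a modular inverse. The class `SLP`, the rule format, and the constants `P` and `C` are assumptions for illustration, and `pow(C, -i, P)` requires Python 3.8 or later.

```python
P = (1 << 61) - 1   # prime modulus (assumed choice)
C = 1_000_003       # base (assumed; usually drawn at random from [1, P))


def direct_fp(s):
    """Fingerprint sum(s[k] * C**k) mod P, computed on the plain string."""
    return sum(ord(ch) * pow(C, k, P) for k, ch in enumerate(s)) % P


class SLP:
    """Straight-line program: rules[name] is either one character or a
    pair (left, right) of rule names. Children must be listed first."""

    def __init__(self, rules):
        self.rules = rules
        self.length, self.fp, self.cpow = {}, {}, {}
        for name, rhs in rules.items():
            if isinstance(rhs, str):                  # terminal rule
                self.length[name] = 1
                self.fp[name] = ord(rhs) % P
                self.cpow[name] = C % P               # C ** length mod P
            else:                                     # name -> left right
                l, r = rhs
                self.length[name] = self.length[l] + self.length[r]
                # fp(XY) = fp(X) + C**|X| * fp(Y)  (mod P)
                self.fp[name] = (self.fp[l] + self.cpow[l] * self.fp[r]) % P
                self.cpow[name] = self.cpow[l] * self.cpow[r] % P

    def prefix_fp(self, name, k):
        """Fingerprint of the first k characters of the expansion of name."""
        acc, shift = 0, 1
        while k > 0:
            if k == self.length[name]:                # whole expansion needed
                acc = (acc + shift * self.fp[name]) % P
                break
            l, r = self.rules[name]
            if k <= self.length[l]:
                name = l
            else:                                     # take all of l, go right
                acc = (acc + shift * self.fp[l]) % P
                shift = shift * self.cpow[l] % P
                k -= self.length[l]
                name = r
        return acc

    def substring_fp(self, name, i, j):
        """Fingerprint of S[i..j], 0-based and inclusive on both ends."""
        diff = (self.prefix_fp(name, j + 1) - self.prefix_fp(name, i)) % P
        return diff * pow(C, -i, P) % P               # divide by C**i mod P


if __name__ == "__main__":
    slp = SLP({
        "A": "a", "B": "b",
        "X1": ("A", "B"),     # ab
        "X2": ("X1", "A"),    # aba
        "X3": ("X2", "X1"),   # abaab
        "X4": ("X3", "X2"),   # abaababa
    })
    print(slp.fp["X4"] == direct_fp("abaababa"))            # True
    print(slp.substring_fp("X4", 2, 5) == direct_fp("aaba"))  # True
```

No character of the expansion is ever materialized: each rule stores only a length, a fingerprint and a power of the base, which keeps the space linear in the grammar size.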

arXiv.org e-Print Archive

CiteSeerX

Crossref

Warwick Research Archives Portal Repository

Online Research Database In Technology

Algorithms and data structures for grammar-compressed strings

Author: Cording Patrick Hagge
Publication venue: Technical University of Denmark
Publication date: 01/01/2015

Online Research Database In Technology

Fingerprints in compressed strings

Author: Bille Philip
Cording Patrick Hagge
Gørtz Inge Li
Sach Benjamin
Vildhøj Hjalte Wedel
Vind Søren
Publication venue: 'Elsevier BV'
Publication date: 01/06/2017

The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i, j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time O(log N log ℓ) and O(log ℓ log log ℓ + log log N) for SLPs and Linear SLPs, respectively. Here, ℓ denotes the length of the LCE.

CiteSeerX

Crossref

Online Research Database In Technology

Explore Bristol Research