3 research outputs found
Compressed Tree Canonization
Straight-line (linear) context-free tree (SLT) grammars have been used to
compactly represent ordered trees. It is well known that equivalence of SLT
grammars is decidable in polynomial time. Here we extend this result and show
that isomorphism of unordered trees given as SLT grammars is decidable in
polynomial time. The proof constructs a compressed version of the canonical
form of the tree represented by the input SLT grammar. The result is
generalized to unrooted trees by "re-rooting" the compressed trees in
polynomial time. We further show that bisimulation equivalence of unrooted
unordered trees represented by SLT grammars is decidable in polynomial time.
For non-linear SLT grammars, which can have double-exponential compression
ratios, we prove that unordered isomorphism is PSPACE-hard and in EXPTIME. The
same complexity bounds are shown for bisimulation equivalence.
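As an illustration of the canonical-form idea behind this abstract, the following is a minimal sketch for ordinary, uncompressed trees only. It is not the compressed construction from the paper. The tree representation (a pair of a label and a list of children) and the function names `canon` and `isomorphic` are assumptions made for this example, and labels are assumed to contain no parentheses.

```python
# Minimal sketch: canonical form of an UNCOMPRESSED rooted unordered tree.
# Assumption: a tree is a pair (label, children), children being a list of trees.
# Assumption: labels are plain symbols that contain no parentheses.

def canon(tree):
    """Return a string that is identical for all reorderings of children."""
    label, children = tree
    # Canonize every subtree first, then sort the results so that the
    # order in which the children were written down no longer matters.
    parts = sorted(canon(child) for child in children)
    return "(" + label + "".join(parts) + ")"


def isomorphic(t1, t2):
    """Two unordered trees are isomorphic iff their canonical forms agree."""
    return canon(t1) == canon(t2)


t1 = ("f", [("a", []), ("g", [("b", []), ("a", [])])])
t2 = ("f", [("g", [("a", []), ("b", [])]), ("a", [])])
t3 = ("f", [("a", []), ("g", [("b", []), ("b", [])])])

print(isomorphic(t1, t2))  # same tree up to child order
print(isomorphic(t1, t3))  # different multiset of leaves under g
```

On a tree given by a grammar this direct approach would require decompression first, which can be exponentially large; the abstract's result is that the canonical form can instead be built in compressed form.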
Graph compression using graph grammars
This thesis presents work done on compressed graph representations via hyperedge replacement
grammars. It comprises two main parts. Firstly the RePair compression scheme, known for
strings and trees, is generalized to graphs using graph grammars. Given an object, the scheme
produces a small context-free grammar generating the object (called a "straight-line grammar").
The theoretical foundations of this generalization are presented, followed by a description of a
prototype implementation. This implementation is then evaluated on real-world and synthetic
graphs. The experiments show that several graphs can be compressed more strongly by the new
method than by current state-of-the-art approaches.
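To make the RePair scheme mentioned above concrete, here is a minimal sketch of its well-known string variant, not the graph generalization developed in the thesis. It repeatedly replaces a most frequent pair of adjacent symbols by a fresh nonterminal. The names `repair` and `expand` and the nonterminal naming `N0`, `N1`, ... are assumptions for this example; the pair counting is naive (overlapping occurrences are counted), so the result is valid but not necessarily the smallest possible grammar.

```python
from collections import Counter

# Minimal sketch of RePair for STRINGS (the thesis generalizes it to graphs).
# Assumption: input symbols are single characters, so names like "N0" are fresh.

def repair(text):
    """Return (start sequence, rules); each rule maps a nonterminal to a pair."""
    seq = list(text)
    rules = {}
    while True:
        # Count adjacent pairs (naively: overlapping occurrences are included).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, nothing left to gain
        nonterminal = f"N{len(rules)}"
        rules[nonterminal] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Decompress one symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


text = "abababababababab"
start, rules = repair(text)
print(start, rules)
print("".join(expand(s, rules) for s in start) == text)
```

The start sequence together with the rules forms a straight-line grammar: every nonterminal has exactly one rule, and the grammar derives exactly one object.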
The second part considers algorithmic questions of straight-line graph grammars. Two algorithms
are presented to traverse the graph represented by such a grammar. Both algorithms have
advantages and disadvantages: the first one works with any grammar but its runtime per traversal
step is dependent on the input grammar. The second algorithm only needs constant time per
traversal step, but works for a restricted class of grammars and requires quadratic preprocessing
time and space. Finally, speed-up algorithms are considered. These are algorithms that can
decide specific problems in time depending only on the size of the compressed representation,
and might thus be faster than a traditional algorithm running on the decompressed structure. The
idea of such algorithms is to reuse computation already done for the rules of the grammar. The
possible speed-up achieved this way is proportional to the compression ratio of the grammar.
The main results here are a method to answer "regular path queries" and a method to decide whether
two grammars generate isomorphic trees.
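The idea of reusing computation per grammar rule can be sketched on a much simpler problem than those treated in the thesis: computing the size of the object derived by a straight-line grammar without decompressing it. The example uses a string grammar rather than a graph grammar, and the function name `derived_length` and the rule format (a dictionary from nonterminal to a tuple of symbols) are assumptions for this illustration.

```python
# Minimal sketch of a "speed-up" computation on a straight-line STRING grammar.
# Assumption: rules maps each nonterminal to a tuple of symbols; any symbol
# that is not a key of rules is a terminal of length 1.

def derived_length(start, rules):
    """Length of the derived string, computed once per rule, never expanding."""
    memo = {}

    def length(symbol):
        if symbol not in rules:
            return 1
        if symbol not in memo:
            # The result for a rule is stored and reused at every later
            # occurrence of its nonterminal.
            memo[symbol] = sum(length(s) for s in rules[symbol])
        return memo[symbol]

    return length(start)


# A grammar with 31 rules deriving a string of 2**31 characters:
# A0 -> a b, and Ai -> A(i-1) A(i-1) for i = 1..30.
rules = {"A0": ("a", "b")}
for i in range(1, 31):
    rules[f"A{i}"] = (f"A{i-1}", f"A{i-1}")

print(derived_length("A30", rules) == 2 ** 31)
```

The running time depends on the number of rules, not on the length of the derived string, which is the sense in which the gain grows with the compression ratio.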
Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding
The flexibility and self-describing nature of XML has made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, where the structural tree information can be encoded into labels via an XML labelling scheme in order to permit answers to queries without the need to access the original XML files. As the transmission of XML data over the Internet has become widespread, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in terms of their label size, which can result in overflow problems and/or ambiguous data/query retrievals.
This thesis considers the compression of XML labels. A novel XML labelling scheme, named "Base-9", has been developed to generate labels that are as compact as possible and yet provide efficient support for queries to both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9's XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named "Elias-Fibonacci of order 3", which has achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage.
Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions. The advantages of the short labels generated by combining the Base-9 scheme with Fibonacci encoding, in terms of storing, updating, retrieving and querying XML data, are supported by the experimental results reported herein.
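For readers unfamiliar with Fibonacci prefix encoding, the following is a minimal sketch of the standard Fibonacci code for positive integers. How the thesis maps Base-9 labels to integers is not stated in the abstract, so the example encodes plain integers only; the function names `fib_encode` and `fib_decode_stream` are assumptions for this illustration.

```python
# Minimal sketch of the standard Fibonacci code for positive integers.
# Each codeword is the Zeckendorf representation (least significant
# Fibonacci number first) followed by an extra "1". Because Zeckendorf
# representations never contain two adjacent 1s, the pair "11" marks
# the end of a codeword, which makes the code prefix-free.

def fib_encode(n):
    if n < 1:
        raise ValueError("the Fibonacci code is defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs = [f for f in fibs if f <= n]
    bits, remainder = [], n
    for f in reversed(fibs):  # greedy choice, largest Fibonacci number first
        if f <= remainder:
            bits.append("1")
            remainder -= f
        else:
            bits.append("0")
    return "".join(reversed(bits)) + "1"


def fib_decode_stream(bits):
    """Decode a concatenation of codewords back into a list of integers."""
    values, fibs = [], [1, 2]
    total, index, previous = 0, 0, "0"
    for bit in bits:
        if bit == "1" and previous == "1":
            values.append(total)  # "11" ends the current codeword
            total, index, previous = 0, 0, "0"
            continue
        while index >= len(fibs):
            fibs.append(fibs[-1] + fibs[-2])
        if bit == "1":
            total += fibs[index]
        index += 1
        previous = bit
    return values


stream = "".join(fib_encode(n) for n in [1, 4, 11, 2])
print(stream)
print(fib_decode_stream(stream))
```

Small values receive short codewords and no length field or separator is needed between consecutive codewords.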