3 research outputs found

    Compressed Tree Canonization

    Get PDF
    Straight-line (linear) context-free tree (SLT) grammars have been used to compactly represent ordered trees. It is well known that equivalence of SLT grammars is decidable in polynomial time. Here we extend this result and show that isomorphism of unordered trees given as SLT grammars is decidable in polynomial time. The proof constructs a compressed version of the canonical form of the tree represented by the input SLT grammar. The result is generalized to unrooted trees by "re-rooting" the compressed trees in polynomial time. We further show that bisimulation equivalence of unrooted unordered trees represented by SLT grammars is decidable in polynomial time. For non-linear SLT grammars which can have double-exponential compression ratios, we prove that unordered isomorphism is PSPACE-hard and in EXPTIME. The same complexity bounds are shown for bisimulation equivalence

    Graph compression using graph grammars

    Get PDF
    This thesis presents work done on compressed graph representations via hyperedge replacement grammars. It comprises two main parts. Firstly the RePair compression scheme, known for strings and trees, is generalized to graphs using graph grammars. Given an object, the scheme produces a small context-free grammar generating the object (called a “straight-line grammar”). The theoretical foundations of this generalization are presented, followed by a description of a prototype implementation. This implementation is then evaluated on real-world and synthetic graphs. The experiments show that several graphs can be compressed stronger by the new method, than by current state-of-the-art approaches. The second part considers algorithmic questions of straight-line graph grammars. Two algorithms are presented to traverse the graph represented by such a grammar. Both algorithms have advantages and disadvantages: the first one works with any grammar but its runtime per traversal step is dependent on the input grammar. The second algorithm only needs constant time per traversal step, but works for a restricted class of grammars and requires quadratic preprocessing time and space. Finally speed-up algorithms are considered. These are algorithms that can decide specific problems in time depending only on the size of the compressed representation, and might thus be faster than a traditional algorithm would on the decompressed structure. The idea of such algorithms is to reuse computation already done for the rules of the grammar. The possible speed-ups achieved this way is proportional to the compression ratio of the grammar. The main results here are a method to answer “regular path queries”, and to decide whether two grammars generate isomorphic trees

    Compressing Labels of Dynamic XML Data using Base-9 Scheme and Fibonacci Encoding

    Get PDF
    The flexibility and self-describing nature of XML has made it the most common mark-up language used for data representation over the Web. XML data is naturally modelled as a tree, where the structural tree information can be encoded into labels via XML labelling scheme in order to permit answers to queries without the need to access original XML files. As the transmission of XML data over the Internet has become vibrant, it has also become necessary to have an XML labelling scheme that supports dynamic XML data. For a large-scale and frequently updated XML document, existing dynamic XML labelling schemes still suffer from high growth rates in terms of their label size, which can result in overflow problems and/or ambiguous data/query retrievals. This thesis considers the compression of XML labels. A novel XML labelling scheme, named “Base-9”, has been developed to generate labels that are as compact as possible and yet provide efficient support for queries to both static and dynamic XML data. A Fibonacci prefix-encoding method has been used for the first time to store Base-9’s XML labels in a compressed format, with the intention of minimising the storage space without degrading XML querying performance. The thesis also investigates the compression of XML labels using various existing prefix-encoding methods. This investigation has resulted in the proposal of a novel prefix-encoding method named “Elias-Fibonacci of order 3”, which has achieved the fastest encoding time of all prefix-encoding methods studied in this thesis, whereas Fibonacci encoding was found to require the minimum storage. Unlike current XML labelling schemes, the new Base-9 labelling scheme ensures the generation of short labels even after large, frequent, skewed insertions. The advantages of such short labels as those generated by the combination of applying the Base-9 scheme and the use of Fibonacci encoding in terms of storing, updating, retrieving and querying XML data are supported by the experimental results reported herein
    corecore