10,656 research outputs found

    Tight and simple Web graph compression

    Analysing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. This study is, however, hampered by the necessity of storing a major part of such huge graphs in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have thus been presented to represent Web graphs succinctly while still providing random access. These techniques are usually based on differential encodings of the adjacency lists, finding repeating nodes or node regions in successive lists, more general grammar-based transformations, or 2-dimensional representations of the binary matrix of the graph. In this paper we present two Web graph compression algorithms. The first can be seen as an engineering refinement of the Boldi and Vigna (2004) method. We extend the notion of similarity between link lists and use a more compact encoding of residuals. The algorithm works on blocks of varying size (in the number of input lines) and sacrifices access time for a better compression ratio, achieving a more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size (in the number of input lines), and its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space-time tradeoffs.
    Comment: 15 pages
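
    The core trick behind this family of methods is that a page's successor list is sorted and its entries tend to be numerically close to the source node and to each other, so storing gaps rather than absolute identifiers yields small integers that compress well. The sketch below shows plain differential (gap) encoding of one adjacency list; it is an illustration under assumed representations (plain Python lists, no universal codes, no reference-list copying), not the paper's algorithm or the Boldi-Vigna implementation.

    def gap_encode(node_id, successors):
        """Encode a sorted successor list as gaps, the first taken w.r.t. the node itself."""
        gaps, prev = [], node_id
        for s in sorted(successors):
            gaps.append(s - prev)
            prev = s
        return gaps

    def gap_decode(node_id, gaps):
        """Invert gap_encode by cumulative summation."""
        successors, prev = [], node_id
        for g in gaps:
            prev += g
            successors.append(prev)
        return successors

    if __name__ == "__main__":
        adj = [15, 16, 17, 22, 23, 24, 315]      # hypothetical hyperlinks of node 15
        enc = gap_encode(15, adj)
        print(enc)                               # [0, 1, 1, 5, 1, 1, 291]: mostly small integers
        assert gap_decode(15, enc) == adj

    In the actual schemes such gaps are then fed to variable-length integer codes, and lists similar to a nearby reference list are additionally represented by copying rather than stored in full.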

    Structure induction by lossless graph compression

    This work is motivated by the necessity to automate the discovery of structure in vast and ever-growing collections of relational data commonly represented as graphs, for example genomic networks. A novel algorithm, dubbed Graphitour, for structure induction by lossless graph compression is presented and illustrated by a clear and broadly known case of nested structure in a DNA molecule. This work extends to graphs some well-established approaches to grammatical inference previously applied only to strings. The bottom-up graph compression problem is related to the maximum cardinality (non-bipartite) matching problem. The algorithm accepts a variety of graph types, including directed graphs and graphs with labeled nodes and arcs. The resulting structure could be used for representation and classification of graphs.
    Comment: 10 pages, 7 figures, 2 tables; published in Proceedings of the Data Compression Conference, 200
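
    As a rough illustration of what one bottom-up compression step can look like, the sketch below finds the most frequent edge type (pair of endpoint labels), picks a set of disjoint edges of that type, and contracts each of them into a fresh nonterminal node. This is only a sketch under assumed data structures: it uses a greedy disjoint-edge selection as a stand-in for the maximum cardinality matching the paper relies on, and compress_step and the graph representation are illustrative, not Graphitour's.

    from collections import Counter

    def compress_step(nodes, edges):
        """One bottom-up compression step.
        nodes: dict node_id -> label; edges: set of frozenset({u, v})."""
        def etype(e):
            u, v = tuple(e)
            return tuple(sorted((nodes[u], nodes[v])))

        # 1. Most frequent edge type, i.e. pair of endpoint labels.
        best = Counter(etype(e) for e in edges).most_common(1)[0][0]

        # 2. Disjoint edges of that type (greedy stand-in for maximum matching).
        used, matched = set(), []
        for e in edges:
            if etype(e) == best and not (e & used):
                matched.append(e)
                used |= e

        # 3. Contract each matched edge into a fresh nonterminal node.
        rule = "N[" + "+".join(best) + "]"
        new_id, remap = max(nodes), {}
        for e in matched:
            new_id += 1
            nodes[new_id] = rule
            for x in e:
                remap[x] = new_id
        new_edges = set()
        for e in edges:
            if e in matched:
                continue
            u, v = tuple(e)
            u, v = remap.get(u, u), remap.get(v, v)
            if u != v:
                new_edges.add(frozenset({u, v}))
        for x in used:                     # drop the contracted endpoints
            del nodes[x]
        return nodes, new_edges, (rule, best)

    if __name__ == "__main__":
        # Toy chain with alternating labels, loosely mimicking a repetitive molecule.
        nodes = {i: ("a" if i % 2 == 0 else "b") for i in range(6)}
        edges = {frozenset({i, i + 1}) for i in range(5)}
        print(compress_step(nodes, edges))

    Iterating such steps until no edge type repeats yields a hierarchy of rules, which is the grammar-like structure the abstract refers to.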

    On Unification Modulo One-Sided Distributivity: Algorithms, Variants and Asymmetry

    An algorithm for unification modulo one-sided distributivity is an early result by Tidén and Arnborg. More recently, this theory has been of interest in cryptographic protocol analysis due to the fact that many cryptographic operators satisfy this property. Unfortunately, the algorithm presented in that paper, although correct, has recently been shown not to be polynomial-time bounded as claimed. In addition, for some instances there exist most general unifiers that are exponentially large with respect to the input size. In this paper we first present a new polynomial-time algorithm that solves the decision problem for a non-trivial subcase of unification modulo one-sided distributivity, based on a typed theory. Next we present a new polynomial-time algorithm that solves the decision problem for unification modulo one-sided distributivity; a construction employing string compression is used to achieve the polynomial bound. Lastly, we examine the one-sided distributivity problem in the new asymmetric unification paradigm. We give the first asymmetric unification algorithm for one-sided distributivity.
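
    For readers unfamiliar with the theory, one-sided distributivity is the single axiom x * (y + z) = (x * y) + (x * z), i.e. distributivity from the left only. The sketch below merely applies that axiom left-to-right as a rewrite on terms represented as nested tuples; it illustrates the equational theory the unification problems are posed modulo, and is not the paper's algorithms or its string-compression construction.

    def distribute(t):
        """Push '*' over '+' in the right argument (one-sided distributivity)."""
        if not isinstance(t, tuple):                 # variable or constant
            return t
        op, a, b = t
        a, b = distribute(a), distribute(b)
        if op == "*" and isinstance(b, tuple) and b[0] == "+":
            _, y, z = b
            return ("+", distribute(("*", a, y)), distribute(("*", a, z)))
        return (op, a, b)

    if __name__ == "__main__":
        t = ("*", "x", ("+", "y", ("+", "z", "w")))
        print(distribute(t))
        # ('+', ('*', 'x', 'y'), ('+', ('*', 'x', 'z'), ('*', 'x', 'w')))

    Repeated application over nested sums duplicates the left factor, which gives some intuition for why unifiers can blow up in size and why a compressed (shared) term representation is helpful for keeping the decision procedure polynomial.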

    XML Compression via DAGs

    Unranked trees can be represented using their minimal dag (directed acyclic graph). For XML documents this achieves high compression ratios due to their repetitive markup. Unranked trees are often represented through first child/next sibling (fcns) encoded binary trees. We study the difference in size (measured in the number of edges) between the minimal dag of an unranked tree and the minimal dag of its fcns-encoded binary tree. One main finding is that the size of the dag of the binary tree can never be smaller than the square root of the size of the minimal dag, and that there are examples that match this bound. We introduce a new combined structure, the hybrid dag, which is guaranteed to be smaller than (or equal in size to) both dags. Interestingly, we find through experiments that last child/previous sibling encodings are much better for XML compression via dags than fcns encodings. We determine the average sizes of unranked and binary dags over a given set of labels (under uniform distribution) in terms of their exact generating functions and in terms of their asymptotic behavior.
    Comment: A short version of this paper appeared in the Proceedings of ICDT 201
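
    The minimal dag itself can be obtained by hash-consing: identical subtrees are detected bottom-up and represented by a single shared node. The sketch below shows this standard construction for labelled ordered trees given as (label, children) pairs; it is a minimal illustration under assumed representations, not the data structures or experiments of the paper.

    def minimal_dag(tree, table=None):
        """Return (node_id, table) for the minimal dag of an ordered labelled tree.
        tree: (label, [children]); table maps (label, child_ids) to a dag node id."""
        if table is None:
            table = {}
        label, children = tree
        child_ids = tuple(minimal_dag(c, table)[0] for c in children)
        key = (label, child_ids)
        if key not in table:
            table[key] = len(table)        # fresh dag node for an unseen subtree
        return table[key], table

    if __name__ == "__main__":
        item = ("item", [])
        lst = ("list", [item, item, item])     # three identical children share one node
        _, table = minimal_dag(("doc", [lst, lst]))
        print(len(table))                      # 3 dag nodes versus 9 tree nodes

    The encodings compared in the paper (fcns versus last child/previous sibling) and the hybrid dag differ in which tree this kind of sharing is applied to, which is what drives the size differences reported above.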