Tight and simple Web graph compression
Analysing Web graphs has applications in determining page ranks, fighting Web
spam, detecting communities and mirror sites, and more. Such analysis is, however,
hampered by the necessity of storing a major part of huge graphs in
external memory, which prevents efficient random access to edge (hyperlink)
lists. A number of algorithms involving compression techniques have thus been
presented, to represent Web graphs succinctly while still providing random access.
Those techniques are usually based on differential encodings of the adjacency
lists, finding repeating nodes or node regions in the successive lists, more
general grammar-based transformations or 2-dimensional representations of the
binary matrix of the graph. In this paper we present two Web graph compression
algorithms. The first can be seen as engineering of the Boldi and Vigna (2004)
method. We extend the notion of similarity between link lists, and use a more
compact encoding of residuals. The algorithm works on blocks of varying size
(in the number of input lines) and sacrifices access time for better
compression ratio, achieving more succinct graph representation than other
algorithms reported in the literature. The second algorithm works on blocks of
the same size, in the number of input lines, and its key mechanism is merging
the block into a single ordered list. This method achieves much more attractive
space-time tradeoffs.
Comment: 15 pages
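The two ideas named in this abstract, differential encoding of adjacency lists and merging a block of lists into a single ordered list, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, and the per-target bitmask used to record which list of the block contains each target is an assumption made only for illustration.

```python
from typing import Dict, List, Tuple


def gap_encode(neighbours: List[int]) -> List[int]:
    """Differentially encode a sorted adjacency list: first id, then gaps."""
    gaps = []
    prev = 0
    for v in neighbours:
        gaps.append(v - prev)
        prev = v
    return gaps


def gap_decode(gaps: List[int]) -> List[int]:
    """Invert gap_encode by taking running sums."""
    out = []
    total = 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def merge_block(block: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Merge the lists of a block into one sorted list of distinct targets.

    Assumption (not from the paper): membership is stored as one bitmask
    per target, where bit i is set if list i of the block has that target.
    """
    membership: Dict[int, int] = {}
    for i, lst in enumerate(block):
        for v in lst:
            membership[v] = membership.get(v, 0) | (1 << i)
    targets = sorted(membership)
    flags = [membership[v] for v in targets]
    return targets, flags


def extract_list(targets: List[int], flags: List[int], i: int) -> List[int]:
    """Recover list i of the block from the merged representation."""
    return [v for v, f in zip(targets, flags) if (f >> i) & 1]
```

In this sketch the merged target list could itself be passed through `gap_encode`, since it is sorted; how the paper actually encodes the merged list and the membership information is not stated in the abstract.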
Structure induction by lossless graph compression
This work is motivated by the necessity to automate the discovery of
structure in vast and ever-growing collections of relational data commonly
represented as graphs, for example genomic networks. A novel algorithm, dubbed
Graphitour, for structure induction by lossless graph compression is presented
and illustrated by a clear and broadly known case of nested structure in a DNA
molecule. This work extends to graphs some well established approaches to
grammatical inference previously applied only to strings. The bottom-up graph
compression problem is related to the (non-bipartite) maximum cardinality
matching problem. The algorithm accepts a variety of graph
types including directed graphs and graphs with labeled nodes and arcs. The
resulting structure could be used for representation and classification of
graphs.
Comment: 10 pages, 7 figures, 2 tables; published in Proceedings of the Data
Compression Conference, 200
On Unification Modulo One-Sided Distributivity: Algorithms, Variants and Asymmetry
An algorithm for unification modulo one-sided distributivity is an early
result by Tidén and Arnborg. More recently this theory has been of interest
in cryptographic protocol analysis due to the fact that many cryptographic
operators satisfy this property. Unfortunately, the algorithm presented in their
paper, although correct, has recently been shown not to be polynomial time
bounded as claimed. In addition, for some instances, there exist most general
unifiers that are exponentially large with respect to the input size. In this
paper we first present a new polynomial time algorithm that solves the decision
problem for a non-trivial subcase, based on a typed theory, of unification
modulo one-sided distributivity. Next we present a new polynomial algorithm
that solves the decision problem for unification modulo one-sided
distributivity. A construction, employing string compression, is used to
achieve the polynomial bound. Lastly, we examine the one-sided distributivity
problem in the new asymmetric unification paradigm. We give the first
asymmetric unification algorithm for one-sided distributivity.
XML Compression via DAGs
Unranked trees can be represented using their minimal dag (directed acyclic
graph). For XML this achieves high compression ratios due to the repetitive
markup of XML documents. Unranked trees are often represented through first child/next sibling
(fcns) encoded binary trees. We study the difference in size (= number of
edges) of minimal dag versus minimal dag of the fcns encoded binary tree. One
main finding is that the size of the dag of the binary tree can never be
smaller than the square root of the size of the minimal dag, and that there are
examples that match this bound. We introduce a new combined structure, the
hybrid dag, which is guaranteed to be smaller than (or equal in size to) both
dags. Interestingly, we find through experiments that last child/previous
sibling encodings are much better for XML compression via dags than fcns
encodings. We determine the average sizes of unranked and binary dags over a
given set of labels (under uniform distribution) in terms of their exact
generating functions, and in terms of their asymptotic behavior.
Comment: A short version of this paper appeared in the Proceedings of ICDT
201
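The size comparison described in this abstract can be made concrete with a small Python sketch that builds a minimal dag by sharing identical subtrees, applies the first child/next sibling (fcns) encoding, and counts edges in both dags. This is an illustrative sketch, not the paper's code: the tuple-based tree representation and all function names are hypothetical, and counting parallel edges separately is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representations chosen for this sketch:
#   unranked tree node: (label, [children])
#   binary tree node:   (label, left, right), with None for a missing child


def dag_edges(tree: Tuple[str, list]) -> int:
    """Count edges of the minimal dag of an unranked tree.

    Identical subtrees get the same id, so each distinct subtree
    contributes its outgoing edges only once.
    """
    ids: Dict[tuple, int] = {}
    edges = 0

    def visit(node) -> int:
        nonlocal edges
        label, children = node
        key = (label, tuple(visit(c) for c in children))
        if key not in ids:
            ids[key] = len(ids)
            edges += len(children)  # assumption: parallel edges count separately
        return ids[key]

    visit(tree)
    return edges


def fcns(siblings: List[Tuple[str, list]]) -> Optional[tuple]:
    """Encode a sibling sequence: left = first child, right = next sibling."""
    if not siblings:
        return None
    label, children = siblings[0]
    return (label, fcns(children), fcns(siblings[1:]))


def binary_dag_edges(root: Optional[tuple]) -> int:
    """Count edges of the minimal dag of an fcns-encoded binary tree."""
    ids: Dict[tuple, int] = {}
    edges = 0

    def visit(node) -> int:
        nonlocal edges
        if node is None:
            return -1  # sentinel id for a missing child
        label, left, right = node
        key = (label, visit(left), visit(right))
        if key not in ids:
            ids[key] = len(ids)
            edges += (left is not None) + (right is not None)
        return ids[key]

    visit(root)
    return edges


# Example: a root with two identical children, each holding one leaf.
example = ("r", [("a", [("b", [])]), ("a", [("b", [])])])
```

For `example`, the tree has 4 edges, `dag_edges(example)` gives 3, and `binary_dag_edges(fcns([example]))` gives 4: the two `a` subtrees are identical in the unranked tree, but after fcns encoding the first `a` also carries a pointer to its sibling, so the two are no longer shared.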