Tight and simple Web graph compression
Analysing Web graphs has applications in determining page ranks, fighting Web
spam, detecting communities and mirror sites, and more. This study is however
hampered by the necessity of storing a major part of huge graphs in the
external memory, which prevents efficient random access to edge (hyperlink)
lists. A number of algorithms involving compression techniques have thus been
presented to represent Web graphs succinctly while still providing random access.
Those techniques are usually based on differential encodings of the adjacency
lists, finding repeating nodes or node regions in the successive lists, more
general grammar-based transformations or 2-dimensional representations of the
binary matrix of the graph. In this paper we present two Web graph compression
algorithms. The first can be seen as engineering of the Boldi and Vigna (2004)
method. We extend the notion of similarity between link lists, and use a more
compact encoding of residuals. The algorithm works on blocks of varying size
(in the number of input lines) and sacrifices access time for better
compression ratio, achieving more succinct graph representation than other
algorithms reported in the literature. The second algorithm works on blocks of
the same size, in the number of input lines, and its key mechanism is merging
the block into a single ordered list. This method achieves much more attractive
space-time tradeoffs.
Comment: 15 pages
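The differential encoding of adjacency lists mentioned in this abstract can be illustrated with a minimal sketch. It is not either of the paper's two algorithms: it assumes sorted neighbour IDs and a generic 7-bit variable-length byte code, and the names `gap_encode`, `gap_decode` and `varint_size` are hypothetical.

```python
def gap_encode(neighbors):
    """Turn a sorted list of node IDs into the first ID followed by successive gaps."""
    gaps = []
    prev = 0
    for v in neighbors:
        gaps.append(v - prev)
        prev = v
    return gaps


def gap_decode(gaps):
    """Invert gap_encode by accumulating the gaps."""
    out = []
    total = 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def varint_size(n):
    """Bytes needed to store a non-negative integer with 7 payload bits per byte."""
    size = 1
    while n >= 128:
        n >>= 7
        size += 1
    return size


neighbors = [1000, 1001, 1004, 1010]   # illustrative adjacency list
gaps = gap_encode(neighbors)
raw_bytes = sum(varint_size(v) for v in neighbors)
gap_bytes = sum(varint_size(g) for g in gaps)
```

Because hyperlinks tend to point to nearby pages in URL order, most gaps are small and fit in a single byte, which is where the saving in this toy example comes from.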
Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks
We continue the line of research on graph compression started with WebGraph,
but we move our focus to the compression of social networks in a proper sense
(e.g., LiveJournal): the approaches that have been used for a long time to
compress web graphs rely on a specific ordering of the nodes (lexicographical
URL ordering) whose extension to general social networks is not trivial. In
this paper, we propose a solution that mixes clusterings and orders, and devise
a new algorithm, called Layered Label Propagation, that builds on previous work
on scalable clustering and can be used to reorder very large graphs (billions
of nodes). Our implementation uses overdecomposition to perform aggressively on
multi-core architectures, making it possible to reorder graphs of more than 600
million nodes in a few hours. Experiments performed on a wide array of web
graphs and social networks show that combining the order produced by the
proposed algorithm with the WebGraph compression framework provides a major
increase in compression with respect to all currently known techniques, both on
web graphs and on social networks. These improvements make it possible to
analyse in main memory significantly larger graphs.
PReaCH: A Fast Lightweight Reachability Index using Pruning and Contraction Hierarchies
We develop the data structure PReaCH (for Pruned Reachability Contraction
Hierarchies) which supports reachability queries in a directed graph, i.e., it
supports queries that ask whether two nodes in the graph are connected by a
directed path. PReaCH adapts the contraction hierarchy speedup techniques for
shortest path queries to the reachability setting. The resulting approach is
surprisingly simple and guarantees linear space and near linear preprocessing
time. Orthogonally to that, we improve existing pruning techniques for the
search by gathering more information from a single DFS-traversal of the graph.
PReaCH-indices significantly outperform previous data structures with
comparable preprocessing cost. Methods with faster queries need significantly
more preprocessing time, in particular for the most difficult instances.
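The idea of pruning a reachability search with information gathered in one DFS traversal can be sketched as follows. This is not the PReaCH data structure itself (it has no contraction hierarchy), only an illustration under stated assumptions: the input is a DAG given as a dict of successor lists, and the names `build_index` and `reachable` are hypothetical. The DFS records a preorder interval for each node's DFS subtree (a target inside the interval is certainly reachable) and the node's height, i.e. the longest path to a sink (a node can only reach targets of strictly smaller height).

```python
def build_index(adj):
    """One DFS over a DAG: preorder number, subtree interval end, and height per node."""
    pre, last, height = {}, {}, {}
    counter = 0

    def dfs(v):
        nonlocal counter
        pre[v] = counter
        counter += 1
        h = 0
        for w in adj[v]:
            if w not in pre:
                dfs(w)
            # In a DAG, an already visited successor is already finished,
            # so height[w] is defined here.
            h = max(h, height[w] + 1)
        last[v] = counter - 1   # largest preorder number in v's DFS subtree
        height[v] = h

    for v in adj:
        if v not in pre:
            dfs(v)
    return pre, last, height


def reachable(adj, index, s, t):
    """Pruned graph search answering: is there a directed path from s to t?"""
    pre, last, height = index
    stack, seen = [s], {s}
    while stack:
        v = stack.pop()
        if v == t or pre[v] <= pre[t] <= last[v]:
            return True          # t is v itself or lies in v's DFS subtree
        if height[v] <= height[t]:
            continue             # v cannot reach a node of equal or greater height
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


# Small illustrative DAG.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["c"]}
index = build_index(adj)
```

The recursive DFS keeps the sketch short; a graph of realistic size would need an iterative traversal to avoid Python's recursion limit.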
VoG: Summarizing and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple
sentences? How can we measure the "importance" of a set of discovered subgraphs
in a large graph? These are exactly the problems we focus on. Our main ideas
are to construct a "vocabulary" of subgraph-types that often occur in real
graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the
most succinct description of a graph in terms of this vocabulary. We measure
success in a well-founded way by means of the Minimum Description Length (MDL)
principle: a subgraph is included in the summary if it decreases the total
description length of the graph.
Our contributions are three-fold: (a) formulation: we provide a principled
encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop
VoG, an efficient method to minimize the description cost, and (c)
applicability: we report experimental results on multi-million-edge real
graphs, including Flickr and the Notre Dame web graph.
Comment: SIAM International Conference on Data Mining (SDM) 201
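The MDL inclusion rule described in this abstract (keep a subgraph only if it lowers the total description length) can be sketched with a toy cost model. The costs below are assumptions made for illustration and are not the paper's encoding scheme: an explicitly listed or corrected edge costs 2·log2(n) bits, a clique costs log2(n) bits per member node, and only cliques are in the vocabulary. The names `description_length` and `greedy_summary` are hypothetical.

```python
from itertools import combinations
from math import log2


def description_length(n, edges, cliques):
    """Bits for the clique node lists plus bits for every edge the model gets wrong."""
    model_bits = 0.0
    model_edges = set()
    for nodes in cliques:
        model_bits += len(nodes) * log2(n)
        model_edges |= {frozenset(pair) for pair in combinations(nodes, 2)}
    errors = model_edges ^ edges            # missing and superfluous edges
    return model_bits + len(errors) * 2 * log2(n)


def greedy_summary(n, edges, candidates):
    """Add each candidate in turn, keeping it only if the total cost decreases."""
    chosen = []
    best = description_length(n, edges, chosen)
    for cand in candidates:
        cost = description_length(n, edges, chosen + [cand])
        if cost < best:
            chosen.append(cand)
            best = cost
    return chosen, best


# 8 nodes: a 4-clique on {0, 1, 2, 3} plus the single edge {4, 5}.
n = 8
edges = {frozenset(p) for p in combinations((0, 1, 2, 3), 2)} | {frozenset((4, 5))}
candidates = [(0, 1, 2, 3), (4, 5, 6, 7)]
summary, bits = greedy_summary(n, edges, candidates)
```

In this example the true clique is kept because it replaces six listed edges with one node list, while the second candidate is rejected because correcting its five missing edges costs more than it saves.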
Compressing Binary Decision Diagrams
The paper introduces a new technique for compressing Binary Decision Diagrams
in those cases where random access is not required. Using this technique,
compression and decompression can be done in linear time in the size of the BDD
and compression will in many cases reduce the size of the BDD to 1-2 bits per
node. Empirical results for our compression technique are presented, including
comparisons with previously introduced techniques, showing that the new
technique dominates on all tested instances.
Comment: Full (tech-report) version of ECAI 2008 short paper