2,287 research outputs found

    Tight and simple Web graph compression

    Analysing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. This study is, however, hampered by the necessity of storing a major part of huge graphs in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of compression-based algorithms have thus been proposed to represent Web graphs succinctly while still providing random access. These techniques are usually based on differential encodings of the adjacency lists, finding repeating nodes or node regions in successive lists, more general grammar-based transformations, or 2-dimensional representations of the binary matrix of the graph. In this paper we present two Web graph compression algorithms. The first can be seen as an engineering of the Boldi and Vigna (2004) method: we extend the notion of similarity between link lists and use a more compact encoding of residuals. The algorithm works on blocks of varying size (in the number of input lines) and sacrifices access time for a better compression ratio, achieving a more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size (in the number of input lines), and its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space-time tradeoffs. Comment: 15 pages
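
    The differential (gap) encoding mentioned above can be illustrated with a short sketch. The snippet below is a generic illustration of gap encoding plus a simple variable-length byte code, not the paper's encoder (which adds reference lists, residual codes and block merging); the helper names are made up for the example.

    # Generic sketch of differential (gap) encoding of a sorted adjacency list,
    # followed by a simple LEB128-style variable-length byte code.

    def gaps(adjacency):
        """Turn a sorted list of neighbour ids into the first id plus successive gaps."""
        prev, out = 0, []
        for v in adjacency:
            out.append(v - prev)   # consecutive ids become small numbers
            prev = v
        return out

    def varint(n):
        """Encode a non-negative integer in 7-bit groups with a continuation bit."""
        buf = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                buf.append(byte | 0x80)
            else:
                buf.append(byte)
                return bytes(buf)

    def encode_list(adjacency):
        return b"".join(varint(g) for g in gaps(sorted(adjacency)))

    # Example: the out-neighbours of one node become a handful of small gaps.
    print(encode_list([1002, 1005, 1006, 1093]).hex())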

    Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks

    We continue the line of research on graph compression started with WebGraph, but we move our focus to the compression of social networks in a proper sense (e.g., LiveJournal): the approaches that have long been used to compress web graphs rely on a specific ordering of the nodes (lexicographical URL ordering) whose extension to general social networks is not trivial. In this paper, we propose a solution that mixes clusterings and orders, and devise a new algorithm, called Layered Label Propagation, that builds on previous work on scalable clustering and can be used to reorder very large graphs (billions of nodes). Our implementation uses overdecomposition to perform aggressively on multi-core architectures, making it possible to reorder graphs of more than 600 million nodes in a few hours. Experiments performed on a wide array of web graphs and social networks show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse significantly larger graphs in main memory.
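
    The practical effect of a clustering-aware ordering can be seen in a toy example: relabelling nodes so that members of the same cluster receive consecutive ids shrinks the gaps that WebGraph-style codes have to store. The sketch below is not Layered Label Propagation itself; it only shows the reordering step, with a crude bit-length proxy for compressed size and made-up helper names.

    # Toy illustration of why node ordering matters for gap-based graph
    # compression: giving cluster members consecutive ids shrinks adjacency gaps.

    def reorder_by_cluster(cluster_of):
        """cluster_of: dict node -> cluster label. Returns an old-id -> new-id map."""
        order = sorted(cluster_of, key=lambda v: (cluster_of[v], v))
        return {old: new for new, old in enumerate(order)}

    def total_gap_bits(adjacency):
        """Rough cost proxy: sum of bit lengths of gaps over all sorted lists."""
        bits = 0
        for neigh in adjacency.values():
            prev = 0
            for v in sorted(neigh):
                bits += max((v - prev).bit_length(), 1)
                prev = v
        return bits

    # Tiny example: two "communities" whose ids are interleaved in the input.
    adj = {0: [2, 4], 2: [0, 4], 4: [0, 2], 1: [3, 5], 3: [1, 5], 5: [1, 3]}
    clusters = {0: "A", 2: "A", 4: "A", 1: "B", 3: "B", 5: "B"}

    perm = reorder_by_cluster(clusters)
    adj_new = {perm[u]: [perm[v] for v in neigh] for u, neigh in adj.items()}

    # The reordered labelling needs fewer gap bits than the original one.
    print(total_gap_bits(adj), "->", total_gap_bits(adj_new))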

    PReaCH: A Fast Lightweight Reachability Index using Pruning and Contraction Hierarchies

    We develop the data structure PReaCH (for Pruned Reachability Contraction Hierarchies), which supports reachability queries in a directed graph, i.e., queries that ask whether two nodes in the graph are connected by a directed path. PReaCH adapts the contraction-hierarchy speedup techniques for shortest-path queries to the reachability setting. The resulting approach is surprisingly simple and guarantees linear space and near-linear preprocessing time. Orthogonally to that, we improve existing pruning techniques for the search by gathering more information from a single DFS traversal of the graph. PReaCH indices significantly outperform previous data structures with comparable preprocessing cost, while methods with faster queries need significantly more preprocessing time, in particular for the most difficult instances.
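
    The DFS-based pruning idea can be illustrated in isolation: one cheap test answers a query positively whenever the target lies inside the source's DFS subtree, and a plain search is the fallback. This is a simplified sketch of only one ingredient (PReaCH combines several positive and negative pruning tests with contraction hierarchies), and the function names are invented for the example.

    # Simplified DFS-interval pruning for reachability: if t is a descendant of s
    # in a DFS forest of the directed graph, then s reaches t along tree edges.

    def dfs_numbers(graph):
        """Compute pre/post discovery numbers of an iterative DFS forest."""
        pre, post, clock = {}, {}, 0
        for root in graph:
            if root in pre:
                continue
            pre[root] = clock; clock += 1
            stack = [(root, iter(graph[root]))]
            while stack:
                node, it = stack[-1]
                advanced = False
                for nxt in it:
                    if nxt not in pre:
                        pre[nxt] = clock; clock += 1
                        stack.append((nxt, iter(graph[nxt])))
                        advanced = True
                        break
                if not advanced:
                    post[node] = clock; clock += 1
                    stack.pop()
        return pre, post

    def reachable(graph, pre, post, s, t):
        # Positive pruning: t is a DFS-tree descendant of s => a path exists.
        if pre[s] <= pre[t] and post[t] <= post[s]:
            return True
        # Fallback: ordinary DFS from s.
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            if u == t:
                return True
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False

    graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
    pre, post = dfs_numbers(graph)
    print(reachable(graph, pre, post, 0, 3), reachable(graph, pre, post, 3, 0))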

    VoG: Summarizing and Understanding Large Graphs

    How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the "importance" of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a "vocabulary" of subgraph types that often occur in real graphs (e.g., stars, cliques, chains), and, from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary. We measure success in a well-founded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph. Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop VoG, an efficient method to minimize the description cost; and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph. Comment: SIAM International Conference on Data Mining (SDM) 201
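
    The MDL decision rule can be made concrete with a toy cost model: a candidate structure (here, a clique) is admitted to the summary only if naming it costs fewer bits than listing its edges individually. The bit costs below are deliberately simplistic stand-ins, not VoG's actual encoding, and the helper names are made up for the example.

    # Toy MDL-style selection: keep a vocabulary word only if it shortens the
    # total description of the graph.
    from math import log2
    from itertools import combinations

    def bits_for_edges(num_edges, n):
        """Naive cost of listing edges one by one: two node ids per edge."""
        return num_edges * 2 * log2(n)

    def bits_for_clique(k, n):
        """Naive cost of naming a clique: its size plus k node ids."""
        return log2(n) + k * log2(n)

    def keep_clique(nodes, edges, n):
        """MDL test: does encoding these nodes as a clique shrink the description?"""
        covered = {frozenset(e) for e in combinations(nodes, 2)} & edges
        cost_without = bits_for_edges(len(covered), n)
        cost_with = bits_for_clique(len(nodes), n)
        return cost_with < cost_without

    n = 1000
    edges = {frozenset(e) for e in combinations(range(10), 2)}  # a 10-node clique
    print(keep_clique(range(10), edges, n))      # True: one "clique" word is cheaper
    print(keep_clique([0, 500, 900], edges, n))  # False: it covers no edges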

    Compressing Binary Decision Diagrams

    The paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. Using this technique, compression and decompression can be done in linear time in the size of the BDD, and compression will in many cases reduce the size of the BDD to 1-2 bits per node. Empirical results for our compression technique are presented, including comparisons with previously introduced techniques, showing that the new technique dominates on all tested instances. Comment: Full (tech-report) version of ECAI 2008 short paper
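
    To make the "no random access, one linear pass" setting concrete, here is a toy serialization of a reduced BDD stored as (variable, low, high) triples, writing child pointers as small back-offsets so that a downstream general-purpose coder can shrink them. This is only an illustration under those assumptions, not the compression scheme of the paper, and the node-table layout is made up for the example.

    # Toy linear-time, streaming-friendly serialization of a BDD node table.
    # nodes[i] = (variable, low, high); children always have smaller indices,
    # and indices 0 and 1 are the terminals.

    def serialize(nodes):
        out = []
        for i, (var, low, high) in enumerate(nodes):
            if i < 2:                                  # terminal nodes 0 and 1
                out.append((var, 0, 0))
            else:
                out.append((var, i - low, i - high))   # small back-offsets
        return out

    def deserialize(stream):
        nodes = []
        for i, (var, dlow, dhigh) in enumerate(stream):
            if i < 2:
                nodes.append((var, i, i))
            else:
                nodes.append((var, i - dlow, i - dhigh))
        return nodes

    # Tiny BDD encoding x1 AND x2: terminals first, then internal nodes.
    bdd = [("F", 0, 0), ("T", 1, 1), ("x2", 0, 1), ("x1", 0, 2)]
    assert deserialize(serialize(bdd)) == bdd
    print("round-trip ok")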