8 research outputs found

    Tight and simple Web graph compression

    Full text link
    Analysing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. This study is however hampered by the necessity of storing a major part of huge graphs in the external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have thus been presented, to represent Web graphs succinctly but also providing random access. Those techniques are usually based on differential encodings of the adjacency lists, finding repeating nodes or node regions in the successive lists, more general grammar-based transformations or 2-dimensional representations of the binary matrix of the graph. In this paper we present two Web graph compression algorithms. The first can be seen as engineering of the Boldi and Vigna (2004) method. We extend the notion of similarity between link lists, and use a more compact encoding of residuals. The algorithm works on blocks of varying size (in the number of input lines) and sacrifices access time for better compression ratio, achieving more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size, in the number of input lines, and its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space-time tradeoffs.Comment: 15 page

    Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks

    Full text link
    We continue the line of research on graph compression started with WebGraph, but we move our focus to the compression of social networks in a proper sense (e.g., LiveJournal): the approaches that have been used for a long time to compress web graphs rely on a specific ordering of the nodes (lexicographical URL ordering) whose extension to general social networks is not trivial. In this paper, we propose a solution that mixes clusterings and orders, and devise a new algorithm, called Layered Label Propagation, that builds on previous work on scalable clustering and can be used to reorder very large graphs (billions of nodes). Our implementation uses overdecomposition to perform aggressively on multi-core architecture, making it possible to reorder graphs of more than 600 millions nodes in a few hours. Experiments performed on a wide array of web graphs and social networks show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse in main memory significantly larger graphs

    Hierarchical Graph Generation with K2K^2-trees

    Full text link
    Generating graphs from a target distribution is a significant challenge across many domains, including drug discovery and social network analysis. In this work, we introduce a novel graph generation method leveraging K2K^2-tree representation which was originally designed for lossless graph compression. Our motivation stems from the ability of the K2K^2-trees to enable compact generation while concurrently capturing the inherent hierarchical structure of a graph. In addition, we make further contributions by (1) presenting a sequential K2K^2-tree representation that incorporates pruning, flattening, and tokenization processes and (2) introducing a Transformer-based architecture designed to generate the sequence by incorporating a specialized tree positional encoding scheme. Finally, we extensively evaluate our algorithm on four general and two molecular graph datasets to confirm its superiority for graph generation.Comment: 22 pages (10 appendices

    Compressed Indexes for String Searching in Labeled Graphs

    Get PDF
    Storing and searching large labeled graphs is indeed becoming a key issue in the design of space/time efficient online platforms indexing modern social networks or knowledge graphs. But, as far as we know, all these results are limited to design compressed graph indexes which support basic access operations onto the link structure of the input graph, such as: given a node u, return the adjacency list of u. This paper takes inspiration from the Facebook Unicorn's platform and proposes some compressed-indexing schemes for large graphs whose nodes are labeled with strings of variable length - i.e., node's attributes such as user's (nick-)name - that support sophisticated search operations which involve both the linked structure of the graph and the string content of its nodes. An extensive experimental evaluation over real social networks will show the time and space efficiency of the proposed indexing schemes and their query processing algorithms

    Permuting Web and Social Graphs

    No full text
    corecore