5 research outputs found

    Tight and simple Web graph compression

    Analysing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. Such analysis, however, is hampered by the need to store a major part of these huge graphs in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have therefore been presented that represent Web graphs succinctly while still providing random access. These techniques are usually based on differential encodings of the adjacency lists, on finding repeating nodes or node regions in successive lists, on more general grammar-based transformations, or on 2-dimensional representations of the graph's binary adjacency matrix. In this paper we present two Web graph compression algorithms. The first can be seen as an engineered variant of the Boldi and Vigna (2004) method: we extend the notion of similarity between link lists and use a more compact encoding of residuals. This algorithm works on blocks of varying size (in the number of input lines) and sacrifices access time for a better compression ratio, achieving a more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of a fixed size (in the number of input lines), and its key mechanism is merging each block into a single ordered list. This method achieves much more attractive space-time tradeoffs.
    Comment: 15 pages
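    As a rough illustration of two ideas named in this abstract, the Python sketch below (a) encodes a sorted successor list against a reference list, in the spirit of Boldi and Vigna, as one copy bit per reference entry plus gap-encoded residuals, and (b) merges a block of lists into one ordered list with per-list membership flags. It is not the paper's encoding: the function names, the plain bit mask, storing the first residual as an absolute value, and the flag-matrix layout are all hypothetical simplifications, and a real coder would further compress copy blocks and gaps with variable-length codes.

```python
from itertools import accumulate
from typing import List, Tuple

# Hypothetical helpers for illustration only; not the paper's actual format.


def encode_with_reference(succ: List[int], ref: List[int]) -> Tuple[List[int], List[int]]:
    """Encode sorted, duplicate-free `succ` against sorted reference list `ref`.

    Returns (copy_mask, gaps): copy_mask[i] == 1 iff ref[i] also occurs in succ;
    gaps gap-encodes the residuals (successors absent from ref). The first
    residual is stored as an absolute value (a simplifying assumption).
    """
    succ_set, ref_set = set(succ), set(ref)
    copy_mask = [1 if v in succ_set else 0 for v in ref]
    residuals = [v for v in succ if v not in ref_set]
    gaps = residuals[:1] + [b - a for a, b in zip(residuals, residuals[1:])]
    return copy_mask, gaps


def decode_with_reference(copy_mask: List[int], gaps: List[int], ref: List[int]) -> List[int]:
    """Invert encode_with_reference: copied entries plus prefix-summed gaps."""
    copied = [v for v, bit in zip(ref, copy_mask) if bit]
    residuals = list(accumulate(gaps))  # running sums restore absolute node IDs
    return sorted(copied + residuals)


def merge_block(lists: List[List[int]]) -> Tuple[List[int], List[List[int]]]:
    """Merge a block of adjacency lists into one ordered list of distinct IDs.

    Each input list is recorded as a row of 0/1 flags over the merged list
    (a hypothetical layout; the abstract does not specify the encoding).
    """
    merged = sorted(set().union(*lists))
    flags = []
    for lst in lists:
        members = set(lst)
        flags.append([1 if v in members else 0 for v in merged])
    return merged, flags


def split_block(merged: List[int], flags: List[List[int]]) -> List[List[int]]:
    """Recover the original lists from the merged list and its flag rows."""
    return [[v for v, f in zip(merged, row) if f] for row in flags]
```

    For example, encoding [3, 5, 7, 12, 13] against the reference [3, 4, 7, 13, 20] yields the mask [1, 0, 1, 1, 0] and residual gaps [5, 7]; similar neighbouring lists produce mostly 1 bits and small gaps, which is what makes them cheap to store.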

    Graph Compression for Adjacency-Matrix Multiplication

    19 April 2022: A correction to this paper has been published: https://doi.org/10.1007/s42979-022-01141-w
    [Abstract] Computing the product of the (binary) adjacency matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper, we show that some well-known Web graph and social graph compression formats are computation-friendly, in the sense that they allow this computation to be sped up. We focus on the compressed representations of (a) Boldi and Vigna and (b) Hernández and Navarro, and show that the product can be computed in time proportional to the size of the compressed graph. Our experimental results show speedups of at least 2 on graphs that were compressed at least 5 times with respect to the original.
    We thank Cecilia Hernández for providing us with her software for extracting the bicliques, and for a helpful description of how to run it. This research has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie [grant agreement No 690941], specifically while the first author was visiting the University of Chile, and while the second author was affiliated with the University of Helsinki and visiting the University of A Coruña.
    The first author was funded by Fundação para a Ciência e a Tecnologia (FCT) [grant numbers UIDB/50021/2020 and PTDC/CCI-BIO/29676/2017]; the second author was funded by the Academy of Finland [grant number 268324], Fondecyt [grant number 1171058] and NSERC [grant number RGPIN-07185-2020]; the third author was funded by JSPS KAKENHI [grant numbers JP21K17701 and JP21H05847]; the fourth author was funded by AEI and Ministerio de Ciencia e Innovación (PGE and FEDER) [grant number PID2019-105221RB-C41] and Xunta de Galicia (co-funded with FEDER) [grant numbers ED431C 2021/53 and ED431G 2019/01]; and the fifth author was funded by ANID – Millennium Science Initiative Program – Code ICN17_002.
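    The abstract's claim that the product can be computed in time proportional to the compressed size is easiest to see for bicliques. The Python sketch below assumes, hypothetically, that the graph is given as a list of bicliques (S, C), meaning every node in S links to every node in C, plus leftover edges, all edge-disjoint; the actual Hernández and Navarro representation stores these structures differently, and the function names are illustrative. For y = Ax, where y[u] is the sum of x[v] over the successors v of u, every source in S receives the same sum of x over C, so a biclique costs |S| + |C| operations instead of |S| * |C|. The transposed product used by PageRank works the same way with the roles of S and C swapped.

```python
from typing import List, Sequence, Tuple

Biclique = Tuple[Sequence[int], Sequence[int]]  # (sources S, targets C)


def multiply_compressed(n: int, bicliques: List[Biclique],
                        residual_edges: List[Tuple[int, int]],
                        x: Sequence[float]) -> List[float]:
    """Compute y = A x, where A[u][v] = 1 iff there is an edge u -> v.

    Assumes the edge set is the disjoint union of all biclique edges
    (every s in S links to every c in C) and `residual_edges`.
    """
    y = [0.0] * n
    for sources, targets in bicliques:
        shared = sum(x[c] for c in targets)  # O(|C|), computed once per biclique
        for s in sources:                    # O(|S|) instead of O(|S| * |C|)
            y[s] += shared
    for u, v in residual_edges:              # leftover edges handled one by one
        y[u] += x[v]
    return y


def multiply_plain(n: int, edges: List[Tuple[int, int]],
                   x: Sequence[float]) -> List[float]:
    """Reference product over an explicit (uncompressed) edge list."""
    y = [0.0] * n
    for u, v in edges:
        y[u] += x[v]
    return y
```

    On a toy graph with the biclique ([0, 1], [2, 3, 4]), residual edges (2, 0) and (4, 1), and x = [1.0, 2.0, 3.0, 4.0, 5.0], both functions return [12.0, 12.0, 1.0, 0.0, 2.0], but the compressed version touches 5 biclique entries rather than the 6 edges they stand for; the saving grows with the size of each biclique.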