
    Exploiting Computation-Friendly Graph Compression Methods for Adjacency-Matrix Multiplication

    [Abstract] Computing the product of the (binary) adjacency matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper we show that some well-known Web and social graph compression formats are computation-friendly, in the sense that they allow boosting the computation. In particular, we show that the format of Boldi and Vigna allows computing the product in time proportional to the compressed graph size. Our experimental results show speedups of at least 2 on graphs that were compressed at least 5 times with respect to the original. We show that other successful graph compression formats enjoy this property as well.
    Funding: Fundação para a Ciência e a Tecnologia (Portugal), UID/CEC/50021/2013; Academy of Finland, 268324; Fondo Nacional de Desarrollo Científico y Tecnológico (Chile), 1171058; Ministerio de Economía y Competitividad, TIN2016-77158-C4-3-R; Xunta de Galicia, ED431C 2017/58 and ED431G/01; Agencia Nacional de Investigación y Desarrollo (Chile), ICM/FIC RC13000.
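    The reuse that makes such formats computation-friendly can be illustrated compactly. The following Python sketch assumes a simplified reference-based encoding (each node either lists its neighbors explicitly or copies an earlier node's full list plus added/removed residuals); this is a hypothetical stand-in for the actual Boldi-Vigna block format, meant only to show how already-computed row sums can be reused when computing y = Ax.

        # Sketch: adjacency-matrix/vector product over a reference-compressed
        # graph. Simplified model (hypothetical, not the exact Boldi-Vigna
        # format): nodes[u] is either ('explicit', [v, ...]) or
        # ('ref', r, added, removed) with r < u.
        def compressed_matvec(nodes, x):
            """Return y = A @ x, reusing the row sums of referenced nodes."""
            y = [0.0] * len(nodes)
            for u, entry in enumerate(nodes):
                if entry[0] == 'explicit':
                    y[u] = sum(x[v] for v in entry[1])
                else:  # cost proportional to the residuals, not the full list
                    _, r, added, removed = entry
                    y[u] = y[r] + sum(x[v] for v in added) - sum(x[v] for v in removed)
            return y

        # Node 2 copies node 0's neighbors {1, 3} and adds neighbor 4.
        nodes = [('explicit', [1, 3]), ('explicit', [0]), ('ref', 0, [4], []),
                 ('explicit', []), ('explicit', [2])]
        print(compressed_matvec(nodes, [1.0, 2.0, 3.0, 4.0, 5.0]))
        # [6.0, 1.0, 11.0, 0.0, 3.0]

    When copied runs dominate, the work per node tracks the size of its compressed representation rather than its degree, which is the effect the paper quantifies.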

    More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks

    Similarity assessment is one of the core tasks in hyperlink analysis. Recently, with the proliferation of applications, e.g., web search and collaborative filtering, SimRank has become a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that "two nodes are similar if they are referenced (have incoming edges) from similar nodes", which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, i.e., "zero-similarity": it only accommodates paths of equal length from a common "center" node, so a large portion of other paths is fully ignored. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counter-intuitive "zero-similarity" issues while inheriting the merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without suffering from increased computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Since optimal edge concentration is NP-hard, we devise an efficient and effective heuristic to speed up SimRank* computation to O(Knm̃) time, where m̃ is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank* and demonstrate its high computational efficiency.
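    To make the iterative paradigm concrete, here is a small NumPy sketch of a fixed-point iteration of the form S_{k+1} = (c/2)(Q S_k + S_k Q^T) + (1 - c)I, which is one common way to write SimRank*'s recursive form; the in-neighbor normalization of Q, the decay factor c = 0.8 and the iteration count are assumptions here rather than details taken from the paper.

        import numpy as np

        def simrank_star(A, c=0.8, K=20):
            """A[i, j] = 1 for an edge i -> j; returns the similarity matrix."""
            n = A.shape[0]
            indeg = A.sum(axis=0)
            with np.errstate(divide='ignore', invalid='ignore'):
                # Q[a, i] = 1/indeg(a) if i is an in-neighbor of a, else 0.
                Q = np.where(indeg > 0, A / indeg, 0.0).T
            S = (1 - c) * np.eye(n)
            for _ in range(K):
                S = (c / 2) * (Q @ S + S @ Q.T) + (1 - c) * np.eye(n)
            return S

        # Toy graph with paths of unequal length from the center node 2:
        # 2 -> 0, 2 -> 3, 3 -> 1. Plain SimRank scores the pair (0, 1) as zero.
        A = np.array([[0, 0, 0, 0],
                      [0, 0, 0, 0],
                      [1, 0, 0, 1],
                      [0, 1, 0, 0]])
        print(simrank_star(A)[0, 1] > 0)  # True: SimRank* sees the similarity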

    Graph Compression for Adjacency-Matrix Multiplication

    Correction (19 April 2022): a correction to this paper has been published at https://doi.org/10.1007/s42979-022-01141-w.
    [Abstract] Computing the product of the (binary) adjacency matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper, we show that some well-known webgraph and social graph compression formats are computation-friendly, in the sense that they allow boosting the computation. We focus on the compressed representations of (a) Boldi and Vigna and (b) Hernández and Navarro, and show that the product computation can be conducted in time proportional to the compressed graph size. Our experimental results show speedups of at least 2 on graphs that were compressed at least 5 times with respect to the original.
    Acknowledgements: We thank Cecilia Hernández for providing us with her software for extracting the bicliques, and a helpful description of how to run it. This research has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690941, namely while the first author was visiting the University of Chile, and while the second author was affiliated with the University of Helsinki and visiting the University of A Coruña. The first author was funded by Fundação para a Ciência e a Tecnologia (FCT) [grant numbers UIDB/50021/2020 and PTDC/CCI-BIO/29676/2017]; the second author by the Academy of Finland [grant number 268324], Fondecyt [grant number 1171058] and NSERC [grant number RGPIN-07185-2020]; the third author by JSPS KAKENHI [grant numbers JP21K17701 and JP21H05847]; the fourth author by AEI and Ministerio de Ciencia e Innovación (PGE and FEDER) [grant number PID2019-105221RB-C41] and Xunta de Galicia (co-funded with FEDER) [grant numbers ED431C 2021/53 and ED431G 2019/01]; and the fifth author by ANID – Millennium Science Initiative Program – Code ICN17_002.
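    The biclique-based representation of Hernández and Navarro also admits a compact intuition for why the product can run in compressed-size time: every source of a biclique receives the same partial sum, so that sum can be computed once and reused. The following Python sketch uses a hypothetical, simplified data layout (lists of bicliques plus leftover edges), not the authors' actual format.

        # A biclique (S, C) stands for all edges s -> c with s in S, c in C;
        # it costs |S| + |C| operations instead of |S| * |C|.
        def biclique_matvec(n, bicliques, leftover_edges, x):
            y = [0.0] * n
            for S, C in bicliques:
                t = sum(x[c] for c in C)   # shared partial sum, computed once
                for s in S:
                    y[s] += t
            for u, v in leftover_edges:    # edges not covered by any biclique
                y[u] += x[v]
            return y

        # Nodes {0, 1, 2} all point to {3, 4} (a 3x2 biclique), plus edge 4 -> 0.
        y = biclique_matvec(5, [([0, 1, 2], [3, 4])], [(4, 0)],
                            [1.0, 2.0, 3.0, 4.0, 5.0])
        print(y)  # [9.0, 9.0, 9.0, 0.0, 1.0]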

    A Neighborhood-preserving Graph Summarization

    We introduce in this paper a new summarization method for large graphs. Our summarization approach retains only a user-specified proportion of the neighbors of each node in the graph. Our main aim is to simplify large graphs so that they can be analyzed and processed effectively while preserving as many of the node neighborhood properties as possible. Since many graph algorithms are based on the neighborhood information available for each node, the idea is to produce a smaller graph that allows these algorithms to handle large graphs and run faster while providing good approximations. Moreover, our compression allows users to control the size of the compressed graph by adjusting the amount of information loss that can be tolerated. The experiments conducted on various real and synthetic graphs show that our compression considerably reduces the size of the graphs. We also conducted several experiments on the obtained summaries using various graph algorithms and applications, such as node embedding, graph classification and shortest-path approximation; the results show interesting trade-offs between the algorithms' runtime speed-up and the precision loss.
    Comment: 17 pages, 10 figures
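    The abstract does not specify how the retained neighbors are chosen, so the following Python sketch uses uniform random sampling purely as a placeholder selection rule; the function and parameter names are hypothetical. It only illustrates the interface the abstract implies: keep a proportion alpha of each node's neighbors.

        import random

        def summarize(adj, alpha, seed=0):
            """adj: dict node -> neighbor list; alpha in (0, 1] is the
            user-specified proportion of neighbors to keep per node."""
            rng = random.Random(seed)
            summary = {}
            for u, nbrs in adj.items():
                k = max(1, round(alpha * len(nbrs))) if nbrs else 0
                summary[u] = rng.sample(nbrs, k)  # placeholder criterion
            return summary

        adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0], 3: [0], 4: [0]}
        print(summarize(adj, alpha=0.5))  # each node keeps about half its neighbors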

    Opportunistic linked data querying through approximate membership metadata

    Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can, among other measures, be achieved through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments involve a significant portion of membership subqueries, which check the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions allow clients to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests, with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
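    The client-side mechanism is easy to sketch: if a fragment's response carries an approximate membership function over its triples, a membership subquery whose triple the filter reports as absent can be answered locally, and the corresponding HTTP request skipped; a "present" answer may be a false positive and still needs confirmation. The minimal Bloom filter below is a generic illustration, not the interface used in the paper.

        import hashlib

        class BloomFilter:
            def __init__(self, m_bits=1024, k_hashes=4):
                self.m, self.k = m_bits, k_hashes
                self.bits = bytearray(m_bits // 8)

            def _positions(self, item):
                for i in range(self.k):  # k independent hash positions
                    h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                    yield int.from_bytes(h[:8], "big") % self.m

            def add(self, item):
                for p in self._positions(item):
                    self.bits[p // 8] |= 1 << (p % 8)

            def might_contain(self, item):
                # False means "definitely absent"; True may be a false positive.
                return all(self.bits[p // 8] >> (p % 8) & 1
                           for p in self._positions(item))

        bf = BloomFilter()                     # built server-side per fragment
        bf.add("<s1> <p> <o1>")
        if not bf.might_contain("<s2> <p> <o9>"):
            print("skip HTTP request: triple definitely absent")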