17 research outputs found

    Graph compression using heuristic-based reordering

    Inverted indexes have been used extensively in information retrieval systems for document-related queries. We consider the generic case of storing graphs in an inverted-index format and compressing them in this representation. Graph compression by reordering has previously been done using traversal- and clustering-based techniques. In generic methods, the graph is reordered to assign new identifiers to the vertices; the reordered graph is then encoded using an encoding format. Reordering to achieve maximal compression is a well-known NP-complete problem, the Optimal Linear Arrangement. Our work focuses on the inverted-index format, where each node has a corresponding list of neighbours. We propose a heuristic-based graph reordering, using the property that the cost of each vertex is bounded by its neighbour with the largest vertex id. Consider two vertices x and y whose largest neighbours are a and b respectively. If x > y and a > b, the cost of the graph comes down if the vertex ids of x and y are interchanged. Further, experiments show that this heuristic achieves compression rates on par with distributed methods but with reduced utilization of computation resources.
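    As a rough illustration, the pairwise-swap heuristic above can be sketched as follows. The cost model (each list's encoding cost bounded by the bit-length of its largest neighbour id) and the names `relabel`, `total_cost`, and `greedy_reorder` are hypothetical simplifications, not the paper's implementation; a swap is kept only when it lowers the modelled cost.

```python
def relabel(adj, x, y):
    # Exchange vertex ids x and y everywhere in the adjacency structure.
    swap = {x: y, y: x}
    return {swap.get(v, v): sorted(swap.get(n, n) for n in ns)
            for v, ns in adj.items()}

def total_cost(adj):
    # Hypothetical cost model: each list's encoding cost is bounded by
    # the bit-length of its largest neighbour id.
    return sum(max(ns).bit_length() for ns in adj.values() if ns)

def greedy_reorder(adj):
    # One greedy pass over vertex pairs: for x > y whose largest
    # neighbours satisfy a > b, try interchanging the ids of x and y,
    # keeping the swap only when it lowers the modelled cost.
    verts = sorted(adj)
    for i, y in enumerate(verts):
        for x in verts[i + 1:]:
            if adj[x] and adj[y] and max(adj[x]) > max(adj[y]):
                cand = relabel(adj, x, y)
                if total_cost(cand) < total_cost(adj):
                    adj = cand
    return adj
```

    Because only cost-improving swaps are accepted, the pass can never make the modelled encoding larger.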

    Direction Matters: On Influence-Preserving Graph Summarization and Max-Cut Principle for Directed Graphs

    Summarizing large-scale directed graphs into small-scale representations is a useful but less-studied problem setting. Conventional clustering approaches, based on Min-Cut-style criteria, compress both the vertices and edges of the graph into the communities, which leads to a loss of directed-edge information. On the other hand, compressing the vertices while preserving the directed-edge information provides a way to learn the small-scale representation of a directed graph. The reconstruction error, which measures the edge information preserved by the summarized graph, can be used to learn such a representation. Compared to the original graphs, the summarized graphs are easier to analyze and are capable of extracting group-level features, useful for efficient interventions of population behavior. In this letter, we present a model, based on minimizing reconstruction error with nonnegative constraints, which relates to a Max-Cut criterion that simultaneously identifies the compressed nodes and the directed compressed relations between these nodes. A multiplicative update algorithm with column-wise normalization is proposed. We further provide theoretical results on the identifiability of the model and the convergence of the proposed algorithms. Experiments are conducted to demonstrate the accuracy and robustness of the proposed method.
    Peer reviewed
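    The reconstruction-error objective can be illustrated with a minimal sketch. Here A is assumed to be the directed adjacency matrix, S a nonnegative node-to-supernode assignment, and W the directed compressed relations; the paper's actual model and multiplicative-update algorithm are more involved, so this only shows why an asymmetric W preserves edge direction.

```python
import numpy as np

def reconstruction_error(A, S, W):
    # Frobenius-norm error of the summarized directed graph:
    # ||A - S W S^T||_F, where S assigns nodes to supernodes and
    # W holds the directed relations between supernodes.
    return np.linalg.norm(A - S @ W @ S.T)

# Toy graph: every edge points from group {0, 1} to group {2, 3}.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]], dtype=float)
S = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
W = np.array([[0, 1], [0, 0]], dtype=float)  # asymmetric: direction kept
```

    With this assignment the summary reconstructs A exactly, while transposing W (reversing all directions between the supernodes) leaves a nonzero error, which is the sense in which direction matters.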

    Approximating the Minimum Logarithmic Arrangement Problem

    We study a graph reordering problem motivated by compressing massive graphs such as social networks and inverted indexes. Given a graph, G = (V, E), the Minimum Logarithmic Arrangement problem is to find a permutation, π, of the vertices that minimizes Σ_{(u, v) ∈ E} (1 + ⌊lg |π(u) − π(v)|⌋). This objective has been shown to be a good measure of how many bits are needed to encode the graph if the adjacency list of each vertex is encoded using relative positions of two consecutive neighbors under the π order in the list rather than using absolute indices or node identifiers, which requires at least lg n bits per edge. We show the first non-trivial approximation factor for this problem by giving a polynomial-time O(log k)-approximation algorithm for graphs with treewidth k.
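    The objective is straightforward to evaluate for a given permutation; a small sketch (the helper name `mloga_cost` is ours, and ⌊lg d⌋ is computed via `int.bit_length` to avoid floating-point log):

```python
def mloga_cost(edges, pi):
    # MLogA objective: sum over edges of 1 + floor(lg |pi(u) - pi(v)|).
    # For a positive integer d, floor(lg d) == d.bit_length() - 1.
    return sum(1 + (abs(pi[u] - pi[v]).bit_length() - 1) for u, v in edges)

# A path 1-2-3-4 laid out in order: every gap is 1, so each edge costs 1 bit.
path = [(1, 2), (2, 3), (3, 4)]
identity = {1: 1, 2: 2, 3: 3, 4: 4}
```

    Here `mloga_cost(path, identity)` is 3, while stretching an edge to a gap of 4 raises that edge's cost to 1 + lg 4 = 3 bits, matching the intuition that placing neighbors nearby yields short gap codes.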

    Incremental Lossless Graph Summarization

    Given a fully dynamic graph, represented as a stream of edge insertions and deletions, how can we obtain and incrementally update a lossless summary of its current snapshot? As large-scale graphs are prevalent, concisely representing them is inevitable for efficient storage and analysis. Lossless graph summarization is an effective graph-compression technique with many desirable properties. It aims to compactly represent the input graph as (a) a summary graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e., edges between supernodes), which provide a rough description, and (b) edge corrections which fix errors induced by the rough description. While a number of batch algorithms, suited for static graphs, have been developed for rapid and compact graph summarization, they are highly inefficient in terms of time and space for dynamic graphs, which are common in practice. In this work, we propose MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. In response to each change in the input graph, MoSSo updates the output representation by repeatedly moving nodes among supernodes. MoSSo decides which nodes to move, and their destinations, carefully but rapidly based on several novel ideas. Through extensive experiments on 10 real graphs, we show MoSSo is (a) Fast and 'any time': processing each change in near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude faster than running state-of-the-art batch methods, (b) Scalable: summarizing graphs with hundreds of millions of edges, requiring sub-linear memory during the process, and (c) Effective: achieving comparable compression ratios even to state-of-the-art batch methods. Comment: to appear at the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '20).
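    The lossless representation (summary graph plus edge corrections) can be sketched for a fixed, static node-to-supernode assignment. The encoder below keeps a superedge when that saves corrections; the function names and the simple majority rule are illustrative assumptions, and MoSSo's actual contribution, incrementally moving nodes among supernodes on each update, is not shown here.

```python
from itertools import combinations

def summarize(edges, assign):
    # Encode an undirected graph as superedges P plus corrections:
    # C_plus (edges missing from the summary) and C_minus (edges the
    # summary wrongly implies). assign maps node -> supernode.
    groups = {}
    for v, s in assign.items():
        groups.setdefault(s, set()).add(v)
    E = {frozenset(e) for e in edges}
    P, C_plus, C_minus = set(), set(), set()
    supers = sorted(groups)
    for i, a in enumerate(supers):
        for b in supers[i:]:
            if a == b:
                pairs = {frozenset(p) for p in combinations(groups[a], 2)}
            else:
                pairs = {frozenset((u, v)) for u in groups[a] for v in groups[b]}
            present = pairs & E
            # Keep the superedge when most possible edges are present.
            if len(present) > len(pairs) - len(present):
                P.add((a, b))
                C_minus |= pairs - present
            else:
                C_plus |= present
    return P, C_plus, C_minus

def decode(P, C_plus, C_minus, assign):
    # Expand superedges, subtract C_minus, add C_plus: exact recovery.
    groups = {}
    for v, s in assign.items():
        groups.setdefault(s, set()).add(v)
    E = set()
    for a, b in P:
        if a == b:
            E |= {frozenset(p) for p in combinations(groups[a], 2)}
        else:
            E |= {frozenset((u, v)) for u in groups[a] for v in groups[b]}
    return (E - C_minus) | C_plus
```

    Decoding always reproduces the original edge set exactly, regardless of which superedges the encoder keeps; the choice only affects how compact the representation is.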