17 research outputs found
Graph compression using heuristic-based reordering
Inverted indexes have been used extensively in information retrieval systems for document-related queries. We consider the generic case of storing graphs as inverted indexes and compressing them in this format. Graph compression by reordering has been done using traversal- and clustering-based techniques. In generic methods, the graph is reordered to arrive at new identifiers for the vertices. The reordered graph is then encoded using an encoding format. Finding the reordering that achieves maximal compression is a well-known NP-complete problem, the Optimal Linear Arrangement.
Our work focuses on the inverted index format, where each node has a corresponding list of neighbours. We propose a heuristic-based graph reordering, using the property that the cost of each vertex is bounded by its neighbour with the largest vertex id. Consider two vertices x and y whose largest neighbour ids are a and b respectively. If x > y and a > b, the cost of the graph comes down if the vertex ids of x and y are interchanged. Further, experiments show that using this heuristic achieves compression rates on par with distributed methods, but with reduced utilization of computation resources.
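The interchange rule above can be sketched as a greedy local search. The `vertex_cost` function here is an illustrative stand-in for the abstract's cost bound (bits needed to encode a list's largest neighbour id), not the paper's exact cost model; `improve_once` is likewise a hypothetical driver, not the authors' algorithm.

```python
import math

def vertex_cost(neighbours):
    """Illustrative cost bound: bits needed to encode the largest neighbour id."""
    return math.ceil(math.log2(max(neighbours) + 2))

def total_cost(adj):
    """Total cost of the graph under the current vertex-id assignment."""
    return sum(vertex_cost(ns) for ns in adj.values() if ns)

def swap_ids(adj, x, y):
    """Relabel the graph with vertex ids x and y interchanged."""
    rename = lambda v: y if v == x else x if v == y else v
    return {rename(u): [rename(v) for v in ns] for u, ns in adj.items()}

def improve_once(adj):
    """Accept the first id interchange (x > y) that lowers total cost, if any."""
    base = total_cost(adj)
    for x in adj:
        for y in adj:
            if x > y:
                cand = swap_ids(adj, x, y)
                if total_cost(cand) < base:
                    return cand
    return adj
```

Iterating `improve_once` to a fixed point gives a cheap local reordering; the heuristic's value, per the abstract, is that such local moves approach the quality of distributed reordering at far lower cost.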
Direction Matters: On Influence-Preserving Graph Summarization and Max-Cut Principle for Directed Graphs
Summarizing large-scale directed graphs into small-scale representations is a useful but less-studied problem setting. Conventional clustering approaches, based on Min-Cut-style criteria, compress both the vertices and edges of the graph into the communities, which leads to a loss of directed edge information. On the other hand, compressing the vertices while preserving the directed-edge information provides a way to learn the small-scale representation of a directed graph. The reconstruction error, which measures the edge information preserved by the summarized graph, can be used to learn such a representation. Compared to the original graphs, the summarized graphs are easier to analyze and are capable of extracting group-level features, useful for efficient interventions of population behavior. In this letter, we present a model, based on minimizing reconstruction error with nonnegative constraints, which relates to a Max-Cut criterion that simultaneously identifies the compressed nodes and the directed compressed relations between these nodes. A multiplicative update algorithm with column-wise normalization is proposed. We further provide theoretical results on the identifiability of the model and the convergence of the proposed algorithms. Experiments are conducted to demonstrate the accuracy and robustness of the proposed method.
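A minimal sketch of the kind of model the abstract describes, assuming a standard nonnegative tri-factorization A ≈ U B Uᵀ, where U assigns nodes to supernodes and B holds the directed relations between supernodes. The multiplicative update formulas and the norm compensation are generic textbook choices, not the letter's exact algorithm.

```python
import numpy as np

def summarize_directed(A, k, iters=200, eps=1e-9):
    """Fit A ≈ U B Uᵀ with U, B >= 0 by multiplicative updates.
    U (n x k): soft assignment of nodes to k supernodes.
    B (k x k): directed compressed relations between supernodes."""
    rng = np.random.default_rng(0)
    n = A.shape[0]
    U = rng.random((n, k))
    B = rng.random((k, k))
    for _ in range(iters):
        # Multiplicative update for B (gradient ratio keeps B nonnegative).
        B *= (U.T @ A @ U) / (U.T @ U @ B @ U.T @ U + eps)
        # Multiplicative update for U; A enters via both row and column roles
        # because U appears on both sides of the factorization.
        num = A @ U @ B.T + A.T @ U @ B
        den = U @ (B @ U.T @ U @ B.T + B.T @ U.T @ U @ B) + eps
        U *= num / den
        # Column-wise normalization of U, compensated in B so the
        # reconstruction U B Uᵀ is left unchanged.
        d = np.linalg.norm(U, axis=0) + eps
        U /= d
        B = (d[:, None] * B) * d[None, :]
    return U, B
```

The asymmetry of B is what preserves edge direction: B[i, j] and B[j, i] can differ, unlike in Min-Cut-style symmetric clustering.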
Approximating the Minimum Logarithmic Arrangement Problem
We study a graph reordering problem motivated by compressing massive graphs such as social networks and inverted indexes. Given a graph, G = (V, E), the Minimum Logarithmic Arrangement problem is to find a permutation, π, of the vertices that minimizes Σ_{(u, v) ∈ E} (1 + ⌈lg |π(u) − π(v)|⌉).
This objective has been shown to be a good measure of how many bits are needed to encode the graph if the adjacency list of each vertex is encoded using relative positions of two consecutive neighbors under the π order in the list rather than using absolute indices or node identifiers, which requires at least lg n bits per edge.
We show the first non-trivial approximation factor for this problem by giving a polynomial time O(log k)-approximation algorithm for graphs with treewidth k.
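The objective above is easy to state in code. This is a direct sketch of the MLogA cost of a given permutation, with the cost for an edge being one sign-ish bit plus the bits of the gap between the two endpoints' positions:

```python
import math

def mloga_cost(edges, perm):
    """MLogA objective: sum over edges of 1 + ceil(lg |pi(u) - pi(v)|).
    `perm` maps each vertex to its position in the linear arrangement."""
    return sum(1 + math.ceil(math.log2(abs(perm[u] - perm[v])))
               for u, v in edges)
```

For example, on the path 0-1-2-3 the identity order places every edge's endpoints at distance 1, so each edge costs 1 bit and the total is 3; any order that stretches a gap to 2 pays an extra bit for that edge.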
Incremental Lossless Graph Summarization
Given a fully dynamic graph, represented as a stream of edge insertions and
deletions, how can we obtain and incrementally update a lossless summary of its
current snapshot? As large-scale graphs are prevalent, concisely representing
them compactly is essential for efficient storage and analysis. Lossless graph
summarization is an effective graph-compression technique with many desirable
properties. It aims to compactly represent the input graph as (a) a summary
graph consisting of supernodes (i.e., sets of nodes) and superedges (i.e.,
edges between supernodes), which provide a rough description, and (b) edge
corrections which fix errors induced by the rough description. While a number
of batch algorithms, suited for static graphs, have been developed for rapid
and compact graph summarization, they are highly inefficient in terms of time
and space for dynamic graphs, which are common in practice. In this work, we
propose MoSSo, the first incremental algorithm for lossless summarization of
fully dynamic graphs. In response to each change in the input graph, MoSSo
updates the output representation by repeatedly moving nodes among supernodes.
MoSSo decides nodes to be moved and their destinations carefully but rapidly
based on several novel ideas. Through extensive experiments on 10 real graphs,
we show MoSSo is (a) Fast and 'any time': processing each change in
near-constant time (less than 0.1 millisecond), up to 7 orders of magnitude
faster than running state-of-the-art batch methods, (b) Scalable: summarizing
graphs with hundreds of millions of edges, requiring sub-linear memory during
the process, and (c) Effective: achieving comparable compression ratios even to
state-of-the-art batch methods. (To appear at the 26th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, KDD '20.)
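The summary-plus-corrections representation described above can be sketched directly. Given a fixed node-to-supernode assignment, the encoder below chooses, per supernode pair, whether a superedge plus deletion corrections (C−) is cheaper than listing the edges as addition corrections (C+); the decoder recovers the input exactly. The assignment is taken as given here, whereas finding and incrementally revising it is the actual job of MoSSo.

```python
from itertools import combinations, combinations_with_replacement, product

def summarize_lossless(edges, assignment):
    """Encode an undirected edge set as (superedges, C+, C-).
    `assignment` maps each node to a supernode label."""
    edges = {frozenset(e) for e in edges}
    groups = {}
    for node, sn in assignment.items():
        groups.setdefault(sn, set()).add(node)
    superedges, c_plus, c_minus = set(), set(), set()
    for a, b in combinations_with_replacement(sorted(groups), 2):
        if a == b:
            cand = {frozenset(p) for p in combinations(groups[a], 2)}
        else:
            cand = {frozenset(p) for p in product(groups[a], groups[b])}
        present = edges & cand
        # One superedge plus |C-| deletions, versus |C+| = |present| additions.
        if 1 + len(cand - present) < len(present):
            superedges.add((a, b))
            c_minus |= cand - present
        else:
            c_plus |= present
    return superedges, c_plus, c_minus

def restore(superedges, c_plus, c_minus, assignment):
    """Exactly reconstruct the original edge set from the summary."""
    groups = {}
    for node, sn in assignment.items():
        groups.setdefault(sn, set()).add(node)
    edges = set()
    for a, b in superedges:
        if a == b:
            edges |= {frozenset(p) for p in combinations(groups[a], 2)}
        else:
            edges |= {frozenset(p) for p in product(groups[a], groups[b])}
    return (edges - c_minus) | c_plus
```

The representation is lossless by construction: every candidate pair falls under exactly one supernode pair, so each edge is either implied by a superedge (and possibly cancelled by C−) or listed in C+.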