Tight and simple Web graph compression
Analysing Web graphs has applications in determining page ranks, fighting Web
spam, detecting communities and mirror sites, and more. This study is however
hampered by the necessity of storing a major part of huge graphs in the
external memory, which prevents efficient random access to edge (hyperlink)
lists. A number of algorithms involving compression techniques have thus been
presented, to represent Web graphs succinctly while still providing random access.
Those techniques are usually based on differential encodings of the adjacency
lists, finding repeating nodes or node regions in the successive lists, more
general grammar-based transformations or 2-dimensional representations of the
binary matrix of the graph. In this paper we present two Web graph compression
algorithms. The first can be seen as engineering of the Boldi and Vigna (2004)
method. We extend the notion of similarity between link lists, and use a more
compact encoding of residuals. The algorithm works on blocks of varying size
(in the number of input lines) and sacrifices access time for better
compression ratio, achieving more succinct graph representation than other
algorithms reported in the literature. The second algorithm works on blocks of
the same size, in the number of input lines, and its key mechanism is merging
the block into a single ordered list. This method achieves much more attractive
space-time tradeoffs. Comment: 15 pages
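The differential encoding of adjacency lists mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows the generic idea of storing a sorted successor list as gaps and writing each gap with a variable-length byte code. The function names (`encode_gaps`, `decode_gaps`, `varint`) and the sample list are assumptions made for illustration.

```python
def encode_gaps(neighbours):
    """Turn a strictly increasing adjacency list into first value + gaps."""
    gaps = []
    prev = 0
    for v in neighbours:
        gaps.append(v - prev)  # first entry is stored as-is (gap from 0)
        prev = v
    return gaps


def decode_gaps(gaps):
    """Invert encode_gaps by taking a running sum."""
    out = []
    total = 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def varint(n):
    """Encode a non-negative integer in 7-bit groups, low group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


# Hypothetical successor list of one page; neighbours are close in id space.
adjacency = [1000, 1003, 1004, 1010]
gaps = encode_gaps(adjacency)
encoded = b"".join(varint(g) for g in gaps)
```

Small gaps fit in one byte each, which is why orderings that place linked pages close together (such as lexicographical URL order) tend to help this family of methods.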
Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks
We continue the line of research on graph compression started with WebGraph,
but we move our focus to the compression of social networks in a proper sense
(e.g., LiveJournal): the approaches that have been used for a long time to
compress web graphs rely on a specific ordering of the nodes (lexicographical
URL ordering) whose extension to general social networks is not trivial. In
this paper, we propose a solution that mixes clusterings and orders, and devise
a new algorithm, called Layered Label Propagation, that builds on previous work
on scalable clustering and can be used to reorder very large graphs (billions
of nodes). Our implementation uses overdecomposition to perform aggressively on
multi-core architectures, making it possible to reorder graphs of more than 600
million nodes in a few hours. Experiments performed on a wide array of web
graphs and social networks show that combining the order produced by the
proposed algorithm with the WebGraph compression framework provides a major
increase in compression with respect to all currently known techniques, both on
web graphs and on social networks. These improvements make it possible to
analyse in main memory significantly larger graphs.
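As a rough illustration of the clustering step this abstract builds on, the sketch below runs plain label propagation and then orders nodes by their final label. It is not Layered Label Propagation itself, which combines several resolutions and is not reproduced here. The function names, the fixed visiting order, and the tie-breaking rule (largest label wins, chosen only to keep the example deterministic; practical implementations usually break ties at random) are all assumptions.

```python
from collections import Counter


def label_propagation(adj, rounds=10):
    """Plain label propagation: each node adopts its neighbours' majority label."""
    labels = {v: v for v in adj}  # every node starts in its own cluster
    for _ in range(rounds):
        changed = False
        for v in sorted(adj):  # fixed visiting order, for reproducibility
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            # Assumption: deterministic tie-break on the largest label.
            new = max(l for l, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels


def order_by_label(labels):
    """Place nodes of the same cluster next to each other."""
    return sorted(labels, key=lambda v: (labels[v], v))


# Two 4-cliques with interleaved ids, joined by the single edge 6-1.
adj = {
    0: [2, 4, 6], 2: [0, 4, 6], 4: [0, 2, 6], 6: [0, 2, 4, 1],
    1: [3, 5, 7, 6], 3: [1, 5, 7], 5: [1, 3, 7], 7: [1, 3, 5],
}
order = order_by_label(label_propagation(adj))
```

On this toy graph the resulting order groups each clique into a contiguous run of positions, so after renumbering, most successors of a node have nearby ids.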
Hierarchical Graph Generation with K²-trees
Generating graphs from a target distribution is a significant challenge
across many domains, including drug discovery and social network analysis. In
this work, we introduce a novel graph generation method leveraging the K²-tree
representation, which was originally designed for lossless graph compression.
Our motivation stems from the ability of K²-trees to enable compact
generation while concurrently capturing the inherent hierarchical structure of
a graph. In addition, we make further contributions by (1) presenting a
sequential K²-tree representation that incorporates pruning, flattening, and
tokenization processes and (2) introducing a Transformer-based architecture
designed to generate the sequence by incorporating a specialized tree
positional encoding scheme. Finally, we extensively evaluate our algorithm on
four general and two molecular graph datasets to confirm its superiority for
graph generation. Comment: 22 pages (10 appendices)
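For readers unfamiliar with the underlying compression structure, here is a minimal sketch of how a K²-tree summarises an adjacency matrix, assuming the usual construction: the matrix is split recursively into k × k blocks, each block emits 1 if it contains any edge and 0 otherwise, and only non-empty blocks are expanded further. The paper's sequential representation (pruning, flattening, tokenization) and its Transformer model are not reproduced; `k2_tree_bits` and the example matrix are hypothetical.

```python
def k2_tree_bits(matrix, k=2):
    """Return the per-level bit lists of a k^2-tree for a square 0/1 matrix."""
    n = len(matrix)
    size = 1
    while size < n:
        size *= k  # virtually pad the side length to a power of k

    def has_edge(r, c, s):
        # True if the s-by-s block at (r, c) contains at least one 1.
        return any(
            matrix[i][j]
            for i in range(r, min(r + s, n))
            for j in range(c, min(c + s, n))
        )

    levels = []
    blocks = [(0, 0, size)]  # (top row, left column, side length)
    while blocks and blocks[0][2] > 1:
        bits, nxt = [], []
        for r, c, s in blocks:
            child = s // k
            for dr in range(k):
                for dc in range(k):
                    cr, cc = r + dr * child, c + dc * child
                    bit = 1 if has_edge(cr, cc, child) else 0
                    bits.append(bit)
                    if bit and child > 1:
                        nxt.append((cr, cc, child))  # expand non-empty blocks only
        levels.append(bits)
        blocks = nxt
    return levels


# Hypothetical 4-node graph with edges 0->1, 2->3 and 3->2.
matrix = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
levels = k2_tree_bits(matrix)
```

Here the two empty quadrants are represented by a single 0 bit each, so the matrix takes 12 bits instead of 16; the saving grows with sparsity and with how clustered the edges are.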
Compressed Indexes for String Searching in Labeled Graphs
Storing and searching large labeled graphs is becoming a key issue in the design of space- and time-efficient online platforms that index modern social networks or knowledge graphs. As far as we know, however, existing results are limited to compressed graph indexes that support only basic access operations on the link structure of the input graph, such as: given a node u, return the adjacency list of u. This paper takes inspiration from Facebook's Unicorn platform and proposes compressed-indexing schemes for large graphs whose nodes are labeled with variable-length strings, i.e., node attributes such as a user's (nick-)name, and that support sophisticated search operations involving both the link structure of the graph and the string content of its nodes.
An extensive experimental evaluation over real social networks shows the time and space efficiency of the proposed indexing schemes and their query processing algorithms.