Tight and simple Web graph compression
Analysing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. Such analyses are hampered, however, by the necessity of storing a major part of huge graphs in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have thus been presented that represent Web graphs succinctly while still providing random access. These techniques are usually based on differential encodings of the adjacency lists, finding repeating nodes or node regions in successive lists, more general grammar-based transformations, or 2-dimensional representations of the binary matrix of the graph. In this paper we present two Web graph compression algorithms. The first can be seen as an engineering refinement of the Boldi and Vigna (2004) method: we extend the notion of similarity between link lists and use a more compact encoding of residuals. The algorithm works on blocks of varying size (in the number of input lines) and sacrifices access time for a better compression ratio, achieving a more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size (in the number of input lines), and its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space-time tradeoffs.
Comment: 15 pages
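
Differential (gap) encoding, the common ingredient of the adjacency-list techniques mentioned above, is easy to illustrate. Below is a minimal Python sketch, not the paper's implementation; the zigzag mapping of the first gap follows the WebGraph-style convention, and all function names are ours.

    def gap_encode(adjacency, node):
        # Differential (gap) encoding of one sorted adjacency list: the first
        # gap is relative to the source node id and may be negative, so it is
        # zigzag-mapped to a non-negative integer; later gaps are differences
        # between consecutive successors, which stay small under the
        # locality-rich node orderings typical of Web graphs.
        succ = sorted(adjacency[node])
        if not succ:
            return []
        first = succ[0] - node
        gaps = [2 * first if first >= 0 else 2 * (-first) - 1]
        gaps += [succ[i] - succ[i - 1] for i in range(1, len(succ))]
        return gaps

    def gap_decode(gaps, node):
        # Exact inverse of gap_encode.
        if not gaps:
            return []
        g0 = gaps[0]
        first = g0 // 2 if g0 % 2 == 0 else -((g0 + 1) // 2)
        succ = [node + first]
        for g in gaps[1:]:
            succ.append(succ[-1] + g)
        return succ

    adj = {10: [7, 11, 12, 13, 40]}
    gaps = gap_encode(adj, 10)          # [5, 4, 1, 1, 27]
    assert gap_decode(gaps, 10) == adj[10]

The small gaps are then fed to a variable-length integer code, which is where the actual space saving comes from.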
Scene Graph Lossless Compression with Adaptive Prediction for Objects and Relations
The scene graph is a new data structure describing objects and their pairwise relationships within image scenes. As scene graphs in vision applications grow in size, how to store such data losslessly and efficiently on disk, or transmit it over a network, becomes an unavoidable problem. Scene graph compression has seldom been studied, however, because of the complicated data structures and distributions involved. Existing solutions usually rely on general-purpose compressors or graph structure compression methods, which are weak at reducing the redundancy in scene graph data. This paper introduces a new lossless compression framework with adaptive predictors for the joint compression of objects and relations in scene graph data. The proposed framework consists of a unified prior extractor and specialized element predictors that adapt to the different data elements. Furthermore, to exploit the context information within and between graph elements, Graph Context Convolution is proposed to support different graph context modeling schemes for different graph elements. Finally, a learned distribution model is devised to predict numerical data under complicated conditional constraints. Experiments conducted on labeled and generated scene graphs prove the effectiveness of the proposed framework for lossless scene graph compression.
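
To make the setting concrete, the following Python sketch shows the kind of data a scene graph holds and the generic predict-then-encode pattern such a framework builds on, with a trivial hand-written predictor standing in for the paper's learned prior extractor and element predictors; all names here are illustrative assumptions, not the paper's API.

    from dataclasses import dataclass

    @dataclass
    class SceneGraph:
        objects: list    # (category_id, (x, y, w, h)) per object
        relations: list  # (subject_idx, predicate_id, object_idx) per relation

    def predict_box(coded):
        # Toy predictor: mean of the boxes already coded; the paper uses
        # learned graph-context predictors instead.
        if not coded:
            return (0, 0, 0, 0)
        n = len(coded)
        return tuple(sum(b[i] for b in coded) // n for i in range(4))

    def encode_boxes(graph):
        # Predict-then-encode: keep only residuals (true minus predicted);
        # small residuals are cheap for a downstream entropy coder.
        residuals, coded = [], []
        for _, box in graph.objects:
            pred = predict_box(coded)
            residuals.append(tuple(b - p for b, p in zip(box, pred)))
            coded.append(box)
        return residuals

    def decode_boxes(residuals):
        # Lossless inverse: rerun the same predictor on decoded boxes.
        coded = []
        for res in residuals:
            pred = predict_box(coded)
            coded.append(tuple(p + r for p, r in zip(pred, res)))
        return coded

    g = SceneGraph(objects=[(3, (10, 10, 40, 60)), (7, (12, 14, 38, 58))],
                   relations=[(0, 1, 1)])
    assert decode_boxes(encode_boxes(g)) == [box for _, box in g.objects]

Because the decoder reruns the identical predictor on already-decoded values, the scheme is exactly lossless: the better the predictor adapts to the element being coded, the smaller the residuals.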
VoG: Summarizing and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple
sentences? How can we measure the "importance" of a set of discovered subgraphs
in a large graph? These are exactly the problems we focus on. Our main ideas
are to construct a "vocabulary" of subgraph-types that often occur in real
graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the
most succinct description of a graph in terms of this vocabulary. We measure
success in a well-founded way by means of the Minimum Description Length (MDL)
principle: a subgraph is included in the summary if it decreases the total
description length of the graph.
Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop VoG, an efficient method to minimize the description cost; and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph.
Comment: SIAM International Conference on Data Mining (SDM) 201
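
The MDL selection rule is simple to state in code. The Python sketch below uses crude placeholder cost functions in place of VoG's actual encoding scheme; the part being illustrated is the decision rule itself: keep a structure only if it shortens the total description.

    import math

    def raw_edge_bits(num_edges, num_nodes):
        # Crude stand-in: cost of listing unexplained edges as node-id pairs.
        # VoG's real encoding scheme is considerably richer.
        return 2 * num_edges * math.ceil(math.log2(max(num_nodes, 2)))

    def structure_bits(structure):
        # Placeholder: a type tag plus the member node ids.
        return 8 + 20 * len(structure["nodes"])

    def mdl_summary(candidates, num_nodes):
        # MDL rule: a structure enters the summary only if describing it is
        # cheaper than describing the raw edges it explains.
        summary = []
        for s in candidates:
            if structure_bits(s) < raw_edge_bits(s["edges_explained"],
                                                 num_nodes):
                summary.append(s)
        return summary

    stars = [{"nodes": list(range(50)), "edges_explained": 49},
             {"nodes": [0, 1], "edges_explained": 1}]
    print(mdl_summary(stars, num_nodes=1_000_000))  # keeps only the big star

Under this toy cost model the 50-node star costs 1008 bits against 1960 bits of raw edges and is kept, while the 2-node candidate costs more than the single edge it explains and is dropped.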
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
In this paper we introduce a mathematical model that captures some of the salient features of recommender systems that are based on popularity and that try to exploit social ties among users. We show that, under very general conditions, the market always converges to a steady state, for which we are able to give an explicit form. Thanks to this we can tell rather precisely how much a market is altered by a recommendation system, and determine the power of users to influence others. Our theoretical results are complemented by experiments with real-world social networks, showing that social graphs prevent large market distortions in spite of the presence of highly influential users.
Comment: 10 pages, 9 figures, KDD 201
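
As a rough illustration of this kind of convergence, the toy Python iteration below mixes an item's intrinsic quality with its current popularity and settles to a fixed point; it is our own much-simplified stand-in, not the paper's market model.

    def steady_state(quality, alpha=0.5, tol=1e-10, max_iter=10_000):
        # Toy popularity-feedback map: each item's chance of being chosen
        # mixes its intrinsic quality with its current market share, where
        # alpha sets the strength of the recommender's popularity bias.
        # Iterated until the shares stop moving (an illustrative
        # contraction, not the paper's model).
        n = len(quality)
        shares = [1.0 / n] * n
        for _ in range(max_iter):
            raw = [q * (1.0 + alpha * s) for q, s in zip(quality, shares)]
            total = sum(raw)
            new = [r / total for r in raw]
            if sum(abs(a - b) for a, b in zip(new, shares)) < tol:
                return new
            shares = new
        return shares

    print(steady_state([0.5, 0.3, 0.2]))  # converges to one fixed point

Starting from any initial shares, repeated application of the update drives the market to the same steady state, which is the qualitative behaviour the paper establishes (with an explicit closed form) for its far more general model.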