5,127 research outputs found
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, but especially to focus on the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.Comment: To appear in the Encyclopedia of Big Data Technologie
Flow-based Influence Graph Visual Summarization
Visually mining a large influence graph is appealing yet challenging. People
are amazed by pictures of newscasting graph on Twitter, engaged by hidden
citation networks in academics, nevertheless often troubled by the unpleasant
readability of the underlying visualization. Existing summarization methods
enhance the graph visualization with blocked views, but have adverse effect on
the latent influence structure. How can we visually summarize a large graph to
maximize influence flows? In particular, how can we illustrate the impact of an
individual node through the summarization? Can we maintain the appealing graph
metaphor while preserving both the overall influence pattern and fine
readability?
To answer these questions, we first formally define the influence graph
summarization problem. Second, we propose an end-to-end framework to solve the
new problem. Our method can not only highlight the flow-based influence
patterns in the visual summarization, but also inherently support rich graph
attributes. Last, we present a theoretic analysis and report our experiment
results. Both evidences demonstrate that our framework can effectively
approximate the proposed influence graph summarization objective while
outperforming previous methods in a typical scenario of visually mining
academic citation networks.Comment: to appear in IEEE International Conference on Data Mining (ICDM),
Shen Zhen, China, December 201
VoG: Summarizing and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple
sentences? How can we measure the "importance" of a set of discovered subgraphs
in a large graph? These are exactly the problems we focus on. Our main ideas
are to construct a "vocabulary" of subgraph-types that often occur in real
graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the
most succinct description of a graph in terms of this vocabulary. We measure
success in a well-founded way by means of the Minimum Description Length (MDL)
principle: a subgraph is included in the summary if it decreases the total
description length of the graph.
Our contributions are three-fold: (a) formulation: we provide a principled
encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop
\method, an efficient method to minimize the description cost, and (c)
applicability: we report experimental results on multi-million-edge real
graphs, including Flickr and the Notre Dame web graph.Comment: SIAM International Conference on Data Mining (SDM) 201
{VoG}: {Summarizing} and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the "importance" of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a "vocabulary" of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary. We measure success in a well-founded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph. Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop \method, an efficient method to minimize the description cost, and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph
- …