VoG: Summarizing and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple
sentences? How can we measure the "importance" of a set of discovered subgraphs
in a large graph? These are exactly the problems we focus on. Our main ideas
are to construct a "vocabulary" of subgraph-types that often occur in real
graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the
most succinct description of a graph in terms of this vocabulary. We measure
success in a well-founded way by means of the Minimum Description Length (MDL)
principle: a subgraph is included in the summary if it decreases the total
description length of the graph.
Our contributions are three-fold: (a) formulation: we provide a principled
encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop
VoG, an efficient method to minimize the description cost; and (c)
applicability: we report experimental results on multi-million-edge real
graphs, including Flickr and the Notre Dame web graph.
Comment: SIAM International Conference on Data Mining (SDM) 201
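The MDL inclusion rule described in the abstract (keep a subgraph only if it lowers the total description length) can be illustrated with a minimal sketch. The cost function below is a deliberately simplified stand-in and is not the paper's encoding scheme: it assumes each structure costs one node id per member and each mismatching edge costs two node ids. All function names (`clique_edges`, `star_edges`, `total_bits`, `greedy_summary`) are hypothetical.

```python
from itertools import combinations
from math import log2


def clique_edges(nodes):
    """All edges of a full clique on `nodes`."""
    return {frozenset(pair) for pair in combinations(sorted(nodes), 2)}


def star_edges(hub, spokes):
    """All edges of a star with centre `hub`."""
    return {frozenset((hub, s)) for s in spokes}


def total_bits(graph_edges, covered, model_bits, n_nodes):
    """Toy description length (assumption): model cost plus the cost of
    listing every edge on which the model and the graph disagree."""
    bits_per_edge = 2 * log2(n_nodes)      # two node ids per edge
    errors = graph_edges ^ covered         # symmetric difference
    return model_bits + len(errors) * bits_per_edge


def greedy_summary(graph_edges, candidates, n_nodes):
    """Keep a candidate only if it lowers the total description length."""
    covered, model_bits, summary = set(), 0.0, []
    best = total_bits(graph_edges, covered, model_bits, n_nodes)
    for name, edges, size in candidates:
        bits = size * log2(n_nodes)        # toy cost: one id per member node
        cost = total_bits(graph_edges, covered | edges,
                          model_bits + bits, n_nodes)
        if cost < best:
            covered, model_bits, best = covered | edges, model_bits + bits, cost
            summary.append(name)
    return summary, best


# Toy graph on 8 nodes: a 4-clique, a 3-spoke star, and one bridging edge.
graph = (clique_edges({0, 1, 2, 3})
         | star_edges(4, {5, 6, 7})
         | {frozenset((3, 4))})
candidates = [
    ("clique{0,1,2,3}", clique_edges({0, 1, 2, 3}), 4),
    ("star(4)", star_edges(4, {5, 6, 7}), 4),
    ("clique{4,5,6,7}", clique_edges({4, 5, 6, 7}), 4),  # mostly wrong
]
summary, bits = greedy_summary(graph, candidates, n_nodes=8)
```

On this toy graph the first two candidates are kept and the third is rejected, because the edges it wrongly claims cost more to correct than the structure saves.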
DEMON: a Local-First Discovery Method for Overlapping Communities
Community discovery in complex networks is an interesting problem with a
number of applications, especially in the knowledge extraction task in social
and information networks. However, many large networks often lack a particular
community organization at a global level. In these cases, traditional graph
partitioning algorithms fail to let the latent knowledge embedded in modular
structure emerge, because they impose a top-down global view of a network. We
propose here a simple local-first approach to community discovery, able to
unveil the modular organization of real complex networks. This is achieved by
democratically letting each node vote for the communities it sees surrounding
it in its limited view of the global system, i.e. its ego neighborhood, using a
label propagation algorithm; finally, the local communities are merged into a
global collection. We tested this intuition against the state-of-the-art
overlapping and non-overlapping community discovery methods, and found that our
new method clearly outperforms the others in the quality of the obtained
communities, evaluated by using the extracted communities to predict the
metadata about the nodes of several real-world networks. We also show that our
method is deterministic, fully incremental, and has a limited time complexity,
so that it can be used on web-scale real networks.
Comment: 9 pages; Proceedings of the 18th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, Beijing, China, August 12-16, 201
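The local-first idea in this abstract (take each node's ego neighborhood, run label propagation on it, then merge the local communities) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the tie-breaking rule, the iteration cap, the merge test, and the `epsilon` threshold are all choices made here for the example, and the function names are hypothetical.

```python
from collections import Counter


def label_propagation(nodes, adj):
    """Deterministic label propagation on the subgraph induced by `nodes`.
    Ties are broken by the smallest label (an assumption of this sketch)."""
    labels = {v: v for v in nodes}
    for _ in range(100):                       # iteration cap (assumption)
        changed = False
        for v in sorted(nodes):
            nbrs = [u for u in adj[v] if u in nodes]
            if not nbrs:
                continue
            counts = Counter(labels[u] for u in nbrs)
            top = max(counts.values())
            best = min(l for l, c in counts.items() if c == top)
            if labels[v] != best:
                labels[v] = best
                changed = True
        if not changed:
            break
    groups = {}
    for v, l in labels.items():
        groups.setdefault(l, set()).add(v)
    return list(groups.values())


def merge(communities, new, epsilon):
    """Merge `new` into the first community that nearly contains it
    (or that it nearly contains); otherwise keep it as a new community."""
    for i, c in enumerate(communities):
        small, big = (new, c) if len(new) <= len(c) else (c, new)
        if len(small - big) <= epsilon * len(small):
            communities[i] = c | new
            return
    communities.append(new)


def local_first_communities(adj, epsilon=0.25):
    communities = []
    for ego in sorted(adj):
        ego_net = set(adj[ego])                # neighbors only: ego removed
        for local in label_propagation(ego_net, adj):
            merge(communities, local | {ego}, epsilon)  # put the ego back
    return communities


# Two triangles sharing node 2, which should belong to both communities.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
communities = local_first_communities(adj)
```

On this small graph the sketch returns the two triangles as separate communities, with node 2 in both, which is the overlapping behavior the abstract describes.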
Compression-based inference of network motif sets
Physical and functional constraints on biological networks lead to complex
topological patterns across multiple scales in their organization. A particular
type of higher-order network feature that has received considerable interest is
network motifs, defined as statistically regular subgraphs. These may implement
fundamental logical and computational circuits and are referred to as "building
blocks of complex networks". Their well-defined structures and small sizes
also enable the testing of their functions in synthetic and natural biological
experiments. The statistical inference of network motifs is, however, fraught
with difficulties, from defining and sampling the right null model to
accounting for the large number of possible motifs and their potential
correlations in statistical testing. Here we develop a framework for motif
mining based on lossless network compression using subgraph contractions. The
minimum description length principle allows us to select the most significant
set of motifs as well as other prominent network features in terms of their
combined compression of the network. The approach inherently accounts for
multiple testing and correlations between subgraphs and does not rely on a
priori specification of an appropriate null model. This provides an alternative
definition of motif significance which guarantees more robust statistical
inference. Our approach overcomes the common problems in classic testing-based
motif analysis. We apply our methodology to perform comparative connectomics by
evaluating the compressibility and the circuit motifs of a range of
synaptic-resolution neural connectomes.
Temporal Networks
A great variety of systems in nature, society and technology -- from the web
of sexual contacts to the Internet, from the nervous system to power grids --
can be modeled as graphs of vertices coupled by edges. The network structure,
describing how the graph is wired, helps us understand, predict and optimize
the behavior of dynamical systems. In many cases, however, the edges are not
continuously active. As an example, in networks of communication via email,
text messages, or phone calls, edges represent sequences of instantaneous or
practically instantaneous contacts. In some cases, edges are active for
non-negligible periods of time: e.g., the proximity patterns of inpatients at
hospitals can be represented by a graph where an edge between two individuals
is on throughout the time they are at the same ward. Like network topology, the
temporal structure of edge activations can affect dynamics of systems
interacting through the network, from disease contagion on the network of
patients to information diffusion over an e-mail network. In this review, we
present the emergent field of temporal networks, and discuss methods for
analyzing topological and temporal structure and models for elucidating their
relation to the behavior of dynamical systems. In the light of traditional
network theory, one can see this framework as moving the information of when
things happen from the dynamical system on the network, to the network itself.
Since fundamental properties, such as the transitivity of edges, do not
necessarily hold in temporal networks, many of these methods need to be quite
different from those for static networks.
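The remark that transitivity does not necessarily hold can be made concrete with a small sketch of time-respecting reachability. It assumes contacts are undirected triples `(u, v, t)` and that a contact can only be used by something that arrived strictly before `t`; the function name `earliest_arrival` and the contact list are hypothetical.

```python
def earliest_arrival(contacts, source, t_start=0):
    """Earliest arrival times from `source` along time-respecting paths.

    A single pass over the contacts in time order is enough, because a
    path may only use contacts with increasing timestamps."""
    arrival = {source: t_start}
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        for a, b in ((u, v), (v, u)):          # contacts are undirected
            if a in arrival and arrival[a] < t < arrival.get(b, float("inf")):
                arrival[b] = t
    return arrival


# A meets B at time 2, but B met C earlier, at time 1.
contacts = [("A", "B", 2), ("B", "C", 1)]
from_a = earliest_arrival(contacts, "A")   # A reaches B only
from_c = earliest_arrival(contacts, "C")   # C reaches B, then A
```

In the static graph A, B, and C form a connected chain, yet here nothing starting at A can reach C, because the B-C contact happened before A met B. The reverse direction, from C to A, does work.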
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
We propose GraphMineSuite (GMS): the first benchmarking suite for graph
mining that facilitates evaluating and constructing high-performance graph
mining algorithms. First, GMS comes with a benchmark specification based on
extensive literature review, prescribing representative problems, algorithms,
and datasets. Second, GMS offers a carefully designed software platform for
seamless testing of different fine-grained elements of graph mining algorithms,
such as graph representations or algorithm subroutines. The platform includes
parallel implementations of more than 40 considered baselines, and it
facilitates developing complex and fast mining algorithms. High modularity is
possible by harnessing set algebra operations such as set intersection and
difference, which enables breaking complex graph mining algorithms into simple
building blocks that can be separately experimented with. GMS is supported with
a broad concurrency analysis for portability in performance insights, and a
novel performance metric to assess the throughput of graph mining algorithms,
enabling more insightful evaluation. As use cases, we harness GMS to rapidly
redesign and accelerate state-of-the-art baselines of core graph mining
problems: degeneracy reordering (by up to >2x), maximal clique listing (by up
to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x),
also obtaining better theoretical performance bounds.
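To illustrate how a graph mining algorithm decomposes into set intersection and set difference, here is the textbook Bron-Kerbosch algorithm with pivoting for maximal clique listing. It is a generic sketch and does not use the GMS platform or its API; the function name and the adjacency representation (a dict of frozensets) are assumptions made for the example.

```python
def maximal_cliques(adj, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of the graph given by `adj`.

    r: current clique, p: candidates that can extend it,
    x: vertices already explored (used to avoid non-maximal output)."""
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r                                  # nothing can extend r
        return
    # Pivot on the vertex with the most neighbors among the candidates.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in p - adj[pivot]:                     # set difference
        yield from maximal_cliques(
            adj,
            r | {v},
            p & adj[v],                          # set intersection
            x & adj[v],
        )
        p = p - {v}
        x = x | {v}


# Two triangles sharing node 2.
adj = {
    0: frozenset({1, 2}),
    1: frozenset({0, 2}),
    2: frozenset({0, 1, 3, 4}),
    3: frozenset({2, 4}),
    4: frozenset({2, 3}),
}
cliques = sorted(sorted(c) for c in maximal_cliques(adj))
```

Every step that narrows the search is an intersection or a difference on vertex sets, so swapping the set representation (for example sorted arrays or bitmaps instead of hash sets) changes performance without touching the algorithm's logic.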