167,153 research outputs found
Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
We study the problem of approximating the -profile of a large graph.
-profiles are generalizations of triangle counts that specify the number of
times a small graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of -profile sparsifiers: sparse graphs that
can be used to approximate the full -profile counts for a given large graph.
Further, we study the problem of estimating local and ego -profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting which
allows us to collect -hop information without maintaining an explicit
-hop neighborhood list at each vertex. This enables the computation of all
the local -profiles in parallel with minimal communication.
We test out implementation in several experiments scaling up to cores
on Amazon EC2. We find that our algorithm can estimate the -profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego -profiles, we introduce an algorithm that can estimate
profiles of hundreds of thousands of vertices in parallel, in the timescale of
minutes.Comment: To appear in part at KDD'1
Distributed Estimation of Graph 4-Profiles
We present a novel distributed algorithm for counting all four-node induced
subgraphs in a big graph. These counts, called the -profile, describe a
graph's connectivity properties and have found several uses ranging from
bioinformatics to spam detection. We also study the more complicated problem of
estimating the local -profiles centered at each vertex of the graph. The
local -profile embeds every vertex in an -dimensional space that
characterizes the local geometry of its neighborhood: vertices that connect
different clusters will have different local -profiles compared to those
that are only part of one dense cluster.
Our algorithm is a local, distributed message-passing scheme on the graph and
computes all the local -profiles in parallel. We rely on two novel
theoretical contributions: we show that local -profiles can be calculated
using compressed two-hop information and also establish novel concentration
results that show that graphs can be substantially sparsified and still retain
good approximation quality for the global -profile.
We empirically evaluate our algorithm using a distributed GraphLab
implementation that we scaled up to cores. We show that our algorithm can
compute global and local -profiles of graphs with millions of edges in a few
minutes, significantly improving upon the previous state of the art.Comment: To appear in part at WWW'1
Notes on Randomized Algorithms
Lecture notes for the Yale Computer Science course CPSC 469/569 Randomized
Algorithms. Suitable for use as a supplementary text for an introductory
graduate or advanced undergraduate course on randomized algorithms. Discusses
tools from probability theory, including random variables and expectations,
union bound arguments, concentration bounds, applications of martingales and
Markov chains, and the Lov\'asz Local Lemma. Algorithmic topics include
analysis of classic randomized algorithms such as Quicksort and Hoare's FIND,
randomized tree data structures, hashing, Markov chain Monte Carlo sampling,
randomized approximate counting, derandomization, quantum computing, and some
examples of randomized distributed algorithms
FLEET: Butterfly Estimation from a Bipartite Graph Stream
We consider space-efficient single-pass estimation of the number of
butterflies, a fundamental bipartite graph motif, from a massive bipartite
graph stream where each edge represents a connection between entities in two
different partitions. We present a space lower bound for any streaming
algorithm that can estimate the number of butterflies accurately, as well as
FLEET, a suite of algorithms for accurately estimating the number of
butterflies in the graph stream. Estimates returned by the algorithms come with
provable guarantees on the approximation error, and experiments show good
tradeoffs between the space used and the accuracy of approximation. We also
present space-efficient algorithms for estimating the number of butterflies
within a sliding window of the most recent elements in the stream. While there
is a significant body of work on counting subgraphs such as triangles in a
unipartite graph stream, our work seems to be one of the few to tackle the case
of bipartite graph streams.Comment: This is the author's version of the work. It is posted here by
permission of ACM for your personal use. Not for redistribution. The
definitive version was published in Seyed-Vahid Sanei-Mehri, Yu Zhang, Ahmet
Erdem Sariyuce and Srikanta Tirthapura. "FLEET: Butterfly Estimation from a
Bipartite Graph Stream". The 28th ACM International Conference on Information
and Knowledge Managemen
- …