Distributed Computing on Core-Periphery Networks: Axiom-based Design
Inspired by social networks and complex systems, we propose a core-periphery
network architecture that supports fast computation for many distributed
algorithms and is robust and efficient in number of links. Rather than
providing a concrete network model, we take an axiom-based design approach. We
provide three intuitive (and independent) algorithmic axioms and prove that any
network that satisfies all axioms enjoys an efficient algorithm for a range of
tasks (e.g., MST, sparse matrix multiplication, etc.). We also show the
minimality of our axiom set: for networks that satisfy any subset of the
axioms, the same efficiency cannot be guaranteed for any deterministic
algorithm.
FrogWild! -- Fast PageRank Approximations on Graph Engines
We propose FrogWild, a novel algorithm for fast approximation of high
PageRank vertices, geared towards reducing network costs of running traditional
PageRank algorithms. Our algorithm can be seen as a quantized version of power
iteration that performs multiple parallel random walks over a directed graph.
One important innovation is that we introduce a modification to the GraphLab
framework that only partially synchronizes mirror vertices. This partial
synchronization vastly reduces the network traffic generated by traditional
PageRank algorithms, thus greatly reducing the per-iteration cost of PageRank.
On the other hand, this partial synchronization also creates dependencies
between the random walks used to estimate PageRank. Our main theoretical
innovation is the analysis of the correlations introduced by this partial
synchronization process and a bound establishing that our approximation is
close to the true PageRank vector.
We implement our algorithm in GraphLab and compare it against the default
PageRank implementation. We show that our algorithm is very fast: it performs
each iteration in less than one second on the Twitter graph and can be up to 7x
faster than the standard GraphLab PageRank implementation.
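The random-walk view of PageRank described in this abstract can be illustrated with a small single-machine sketch. This is not FrogWild itself: it has no GraphLab integration and no partial synchronization of mirror vertices, and the function name, parameters, and the teleport-on-dangling-vertex rule are assumptions made for illustration. It only shows how the end points of many short walks approximate the PageRank distribution.

```python
import random
from collections import Counter


def random_walk_pagerank(out_edges, num_walkers=10000, steps=10,
                         teleport=0.15, seed=0):
    """Estimate PageRank from the end points of many short random walks.

    out_edges maps every vertex to a list of its out-neighbours (every
    vertex must appear as a key). Hypothetical helper, not the paper's code.
    """
    rng = random.Random(seed)
    nodes = sorted(out_edges)
    counts = Counter()
    for _ in range(num_walkers):
        v = rng.choice(nodes)  # each walker starts at a uniform vertex
        for _ in range(steps):
            nbrs = out_edges[v]
            # Teleport with fixed probability, or when stuck at a vertex
            # with no out-edges (an assumed convention for dangling nodes).
            if not nbrs or rng.random() < teleport:
                v = rng.choice(nodes)
            else:
                v = rng.choice(nbrs)
        counts[v] += 1
    # The fraction of walkers ending at v estimates v's PageRank.
    return {v: counts[v] / num_walkers for v in nodes}
```

Calling `random_walk_pagerank({"h": [], "a": ["h"], "b": ["h"], "c": ["h"]})` returns a dictionary of estimated scores in which the vertex with the most in-links, `"h"`, receives the largest share.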
Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs
We study the problem of approximating the 3-profile of a large graph.
3-profiles are generalizations of triangle counts that specify the number of
times a small graph appears as an induced subgraph of a large graph. Our
algorithm uses the novel concept of 3-profile sparsifiers: sparse graphs that
can be used to approximate the full 3-profile counts for a given large graph.
Further, we study the problem of estimating local and ego 3-profiles, two
graph quantities that characterize the local neighborhood of each vertex of a
graph.
Our algorithm is distributed and operates as a vertex program over the
GraphLab PowerGraph framework. We introduce the concept of edge pivoting which
allows us to collect 2-hop information without maintaining an explicit
2-hop neighborhood list at each vertex. This enables the computation of all
the local 3-profiles in parallel with minimal communication.
We test our implementation in several experiments scaling up to cores
on Amazon EC2. We find that our algorithm can estimate the 3-profile of a
graph in approximately the same time as triangle counting. For the harder
problem of ego 3-profiles, we introduce an algorithm that can estimate
profiles of hundreds of thousands of vertices in parallel, in the timescale of
minutes. Comment: To appear in part at KDD'1
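To make the definition of a global 3-profile concrete, the following brute-force sketch counts, for every triple of vertices, how many edges the induced subgraph has. On three vertices the edge count (0, 1, 2, or 3) determines the subgraph type: empty, single edge, wedge, or triangle. The function name and input format are assumptions, and this exhaustive enumeration is only a reference for small graphs, not the distributed, sparsifier-based algorithm the abstract describes.

```python
from itertools import combinations


def three_profile(num_vertices, edges):
    """Return [n0, n1, n2, n3]: counts of vertex triples inducing 0..3 edges.

    Vertices are 0..num_vertices-1; edges is an iterable of undirected
    pairs without self-loops. Runs in O(n^3), for illustration only.
    """
    edge_set = {frozenset(e) for e in edges}
    profile = [0, 0, 0, 0]
    for u, v, w in combinations(range(num_vertices), 3):
        # Number of edges present among the three possible pairs.
        k = sum(frozenset(pair) in edge_set
                for pair in ((u, v), (u, w), (v, w)))
        profile[k] += 1
    return profile
```

The last entry of the result is the triangle count, which is why the abstract calls 3-profiles a generalization of triangle counting.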
Distributed Estimation of Graph 4-Profiles
We present a novel distributed algorithm for counting all four-node induced
subgraphs in a big graph. These counts, called the 4-profile, describe a
graph's connectivity properties and have found several uses ranging from
bioinformatics to spam detection. We also study the more complicated problem of
estimating the local 4-profiles centered at each vertex of the graph. The
local 4-profile embeds every vertex in an 11-dimensional space that
characterizes the local geometry of its neighborhood: vertices that connect
different clusters will have different local 4-profiles compared to those
that are only part of one dense cluster.
Our algorithm is a local, distributed message-passing scheme on the graph and
computes all the local 4-profiles in parallel. We rely on two novel
theoretical contributions: we show that local 4-profiles can be calculated
using compressed two-hop information and also establish novel concentration
results that show that graphs can be substantially sparsified and still retain
good approximation quality for the global 4-profile.
We empirically evaluate our algorithm using a distributed GraphLab
implementation that we scaled up to cores. We show that our algorithm can
compute global and local 4-profiles of graphs with millions of edges in a few
minutes, significantly improving upon the previous state of the art. Comment: To appear in part at WWW'1
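The idea that a sparsified graph can still yield good subgraph-count estimates can be sketched in its simplest form, for triangles rather than full 4-profiles. In this sketch each edge is kept independently with probability `keep_prob`; a triangle survives only if all three of its edges do, so dividing the sparsified count by `keep_prob ** 3` gives an unbiased estimate. This is the standard edge-sampling estimator, used here as an assumed stand-in; the paper's 4-profile estimator and its concentration bounds are more involved. Function names are hypothetical.

```python
import random


def count_triangles(edges):
    """Exact triangle count; edges must be unique pairs with no self-loops."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    # Each triangle is found once per edge it contains, so three times.
    total = sum(len(nbrs[u] & nbrs[v]) for u, v in edges)
    return total // 3


def sparsified_triangle_estimate(edges, keep_prob, seed=0):
    """Estimate the triangle count from a random edge-sampled subgraph."""
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < keep_prob]
    # Rescale by the probability that all three edges of a triangle survive.
    return count_triangles(kept) / keep_prob ** 3
```

Smaller values of `keep_prob` reduce the work on the sparsified graph at the cost of higher variance in the estimate.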
Generalized Perron--Frobenius Theorem for Nonsquare Matrices
The celebrated Perron--Frobenius (PF) theorem is stated for irreducible
nonnegative square matrices, and provides a simple characterization of their
eigenvectors and eigenvalues. The importance of this theorem stems from the
fact that eigenvalue problems on such matrices arise in many fields of science
and engineering, including dynamical systems theory, economics, statistics and
optimization. However, many real-life scenarios give rise to nonsquare
matrices. A natural question is whether the PF Theorem (along with its
applications) can be generalized to a nonsquare setting. Our paper provides a
generalization of the PF Theorem to nonsquare matrices. The extension can be
interpreted as representing client-server systems with additional degrees of
freedom, where each client may choose between multiple servers that can
cooperate in serving it (while potentially interfering with other clients).
This formulation is motivated by applications to power control in wireless
networks, economics and others, all of which extend known examples for the use
of the original PF Theorem.
We show that the option of cooperation between servers does not improve the
situation, in the sense that in the optimal solution no cooperation is needed,
and only one server needs to serve each client. Hence, the additional power of
having several potential servers per client translates into choosing the
best single server and not into sharing the load between the servers in
some way, as one might have expected.
The two main contributions of the paper are (i) a generalized PF Theorem that
characterizes the optimal solution for a non-convex nonsquare problem, and (ii)
an algorithm for finding the optimal solution in polynomial time.
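For background, the classical square-matrix PF theorem that this abstract generalizes can be illustrated with power iteration: for a matrix with strictly positive entries, repeated multiplication and normalization converges to the unique positive eigenvector and its dominant eigenvalue. The sketch below covers only that classical square case under the positivity assumption; it is not the paper's nonsquare generalization or its polynomial-time algorithm, and the function name is hypothetical.

```python
def perron_vector(matrix, iterations=200):
    """Power iteration for a square matrix with strictly positive entries.

    Returns (eigenvalue, eigenvector), with the eigenvector scaled so its
    entries sum to 1. Illustrative only; no convergence check is made.
    """
    n = len(matrix)
    x = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Because sum(x) == 1, sum(A x) approaches the dominant eigenvalue.
        eigenvalue = sum(y)
        x = [value / eigenvalue for value in y]
    return eigenvalue, x
```

For the symmetric matrix `[[2.0, 1.0], [1.0, 2.0]]` the dominant eigenvalue is 3 with eigenvector proportional to (1, 1), which the iteration reproduces from its uniform starting point.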
Efficient Joint Network-Source Coding for Multiple Terminals with Side Information
Consider the problem of source coding in networks with multiple receiving
terminals, each having access to some kind of side information. In this case,
standard coding techniques are either prohibitively complex to decode, or
require network-source coding separation, resulting in sub-optimal transmission
rates. To alleviate this problem, we offer a joint network-source coding scheme
based on matrix sparsification at the code design phase, which allows the
terminals to use an efficient decoding procedure (syndrome decoding using
LDPC), despite the network coding throughout the network. Via a novel relation
between matrix sparsification and rate-distortion theory, we give lower and
upper bounds on the best achievable sparsification performance. These bounds
allow us to analyze our scheme, and, in particular, show that in the limit
where all receivers have comparable side information (in terms of conditional
entropy), or, equivalently, have weak side information, a vanishing density can
be achieved. As a result, efficient decoding is possible at all terminals
simultaneously. Simulation results motivate the use of this scheme at
non-limiting rates as well.