Locally Optimal Load Balancing
This work studies distributed algorithms for locally optimal load-balancing:
We are given a graph of maximum degree $\Delta$, and each node has up to $L$
units of load. The task is to distribute the load more evenly so that the loads
of adjacent nodes differ by at most $1$.
If the graph is a path ($\Delta = 2$), it is easy to solve the fractional
version of the problem in $O(L)$ communication rounds, independently of the
number of nodes. We show that this is tight, and we show that it is possible to
solve also the discrete version of the problem in $O(L)$ rounds in paths.
For the general case ($\Delta > 2$), we show that fractional load balancing
can be solved in $\operatorname{poly}(L, \Delta)$ rounds and discrete load
balancing in $f(L, \Delta)$ rounds for some function $f$, independently of the
number of nodes.
Comment: 19 pages, 11 figures
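To make the notion of a locally optimal load distribution concrete, the following is a minimal sketch, assuming a path and a target gap of at most 1 between neighbours. It is a centralized, sequential simulation for the discrete setting and not the distributed algorithm of the paper; the function name `balance_path` is hypothetical, and no claim about round complexity is intended.

```python
def balance_path(loads):
    """Move single units of load downhill along a path until every pair of
    adjacent nodes differs by at most 1 (sequential illustration only)."""
    loads = list(loads)  # work on a copy; index i is the i-th node of the path
    changed = True
    while changed:
        changed = False
        for i in range(len(loads) - 1):
            if loads[i] - loads[i + 1] >= 2:
                # Left neighbour is heavier by at least 2: shift one unit right.
                loads[i] -= 1
                loads[i + 1] += 1
                changed = True
            elif loads[i + 1] - loads[i] >= 2:
                # Right neighbour is heavier by at least 2: shift one unit left.
                loads[i + 1] -= 1
                loads[i] += 1
                changed = True
    return loads
```

Each move lowers the sum of squared loads by at least 2, so the loop terminates, and it only stops after a full pass in which no adjacent pair differs by 2 or more.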
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high-performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library.
Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the
previous version v5)
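The bulk-synchronous, frontier-centric style described in the abstract can be illustrated with a breadth-first search. This is a plain CPU-side Python sketch, not Gunrock's API: the function name `bfs_frontier` and the "advance"/"filter" labels in the comments are assumptions made for illustration.

```python
def bfs_frontier(adj, source):
    """Level-synchronous BFS that processes one whole vertex frontier per step.

    adj maps each vertex to a list of its out-neighbours.
    Returns a dict mapping every reachable vertex to its BFS depth.
    """
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # "Advance": expand every edge leaving the current frontier.
        candidates = [v for u in frontier for v in adj[u]]
        # "Filter": keep each not-yet-visited vertex exactly once.
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        frontier = next_frontier
    return depth
```

On a GPU the two phases would run as data-parallel kernels over the frontier; here they are ordinary loops so the control flow stays visible.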
Improved Analysis of Deterministic Load-Balancing Schemes
We consider the problem of deterministic load balancing of tokens in the
discrete model. A set of $n$ processors is connected into a $d$-regular
undirected network. In every time step, each processor exchanges some of its
tokens with each of its neighbors in the network. The goal is to minimize the
discrepancy between the number of tokens on the most-loaded and the
least-loaded processor as quickly as possible.
Rabani et al. (1998) present a general technique for the analysis of a wide
class of discrete load balancing algorithms. Their approach is to characterize
the deviation between the actual loads of a discrete balancing algorithm with
the distribution generated by a related Markov chain. The Markov chain can also
be regarded as the underlying model of a continuous diffusion algorithm. Rabani
et al. showed that after time $T = O(\log(Kn)/\mu)$, any algorithm of their
class achieves a discrepancy of $O(d \log n/\mu)$, where $\mu$ is the spectral
gap of the transition matrix of the graph, and $K$ is the initial load
discrepancy in the system.
In this work we identify some natural additional conditions on deterministic
balancing algorithms, resulting in a class of algorithms reaching a smaller
discrepancy. This class contains well-known algorithms, e.g., the Rotor-Router.
Specifically, we introduce the notion of cumulatively fair load-balancing
algorithms where in any interval of consecutive time steps, the total number of
tokens sent out over an edge by a node is the same (up to constants) for all
adjacent edges. We prove that algorithms which are cumulatively fair and where
every node retains a sufficient part of its load in each step, achieve a
discrepancy of $O(\min\{d\sqrt{\log n/\mu}, d\sqrt{n}\})$ in time $O(T)$. We
also show that in general neither of these assumptions may be omitted without
increasing discrepancy. We then show by a combinatorial potential reduction
argument that any cumulatively fair scheme satisfying some additional
assumptions achieves a discrepancy of $O(d)$ almost as quickly as the
continuous diffusion process. This positive result applies to some of the
simplest and most natural discrete load balancing schemes.
Comment: minor corrections; updated literature overview
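As an illustration of the Rotor-Router rule mentioned in the abstract, here is a minimal simulation sketch. Several choices are assumptions made for this example and are not taken from the paper: the network is a cycle, each node has two self-loop ports so that it retains part of its load in every step, and the port order is arbitrary. All function names are hypothetical.

```python
def make_cycle_ports(n):
    """Ports of node u on a cycle: left neighbour, right neighbour, two self-loops."""
    return [[(u - 1) % n, (u + 1) % n, u, u] for u in range(n)]


def rotor_router_round(loads, rotors, ports, sent):
    """One synchronous round: each node deals out all its tokens round-robin."""
    new_loads = [0] * len(loads)
    for u, tokens in enumerate(loads):
        for _ in range(tokens):
            p = rotors[u]                    # port the rotor currently points at
            new_loads[ports[u][p]] += 1      # send one token through that port
            sent[u][p] += 1                  # cumulative per-port counter
            rotors[u] = (p + 1) % len(ports[u])  # advance the rotor
    return new_loads


def simulate(loads, rounds):
    """Run the rotor-router for a number of rounds; return final loads and counters."""
    n = len(loads)
    ports = make_cycle_ports(n)
    rotors = [0] * n
    sent = [[0] * 4 for _ in range(n)]
    for _ in range(rounds):
        loads = rotor_router_round(loads, rotors, ports, sent)
    return loads, sent
```

Because each rotor keeps cycling through its ports across rounds, the cumulative number of tokens a node has sent over any two of its ports differs by at most one, which is one simple way a scheme can be cumulatively fair.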
Almost spanning subgraphs of random graphs after adversarial edge removal
Let Delta>1 be a fixed integer. We show that the random graph G(n,p) with
p>>(log n/n)^{1/Delta} is robust with respect to the containment of almost
spanning bipartite graphs H with maximum degree Delta and sublinear bandwidth
in the following sense: asymptotically almost surely, if an adversary deletes
arbitrary edges in G(n,p) such that each vertex loses less than half of its
neighbours, then the resulting graph still contains a copy of all such H.
Comment: 46 pages, 6 figures
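The adversarial model in this abstract can be made concrete with a small sketch: sample G(n,p), then check whether a proposed set of deleted edges respects the constraint that every vertex loses less than half of its neighbours. The function names `sample_gnp` and `is_admissible_deletion` are hypothetical, and the sketch only checks the constraint; it does not search for embedded subgraphs.

```python
import random
from itertools import combinations


def sample_gnp(n, p, seed=None):
    """Sample G(n,p): each of the n*(n-1)/2 possible edges appears independently
    with probability p. Edges are returned as a set of frozensets."""
    rng = random.Random(seed)
    return {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}


def is_admissible_deletion(edges, deleted):
    """True if every deleted edge exists and every vertex loses strictly less
    than half of its original neighbours."""
    degree = {}
    for e in edges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1
    lost = {}
    for e in deleted:
        if e not in edges:
            return False
        for v in e:
            lost[v] = lost.get(v, 0) + 1
    return all(2 * lost[v] < degree[v] for v in lost)
```

For example, in the complete graph on four vertices, deleting a single edge is admissible (each endpoint loses 1 of 3 neighbours), while deleting two edges at the same vertex is not.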
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high-performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library.
Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU"
Approximate Hamilton decompositions of robustly expanding regular digraphs
We show that every sufficiently large r-regular digraph G which has linear
degree and is a robust outexpander has an approximate decomposition into
edge-disjoint Hamilton cycles, i.e. G contains a set of r-o(r) edge-disjoint
Hamilton cycles. Here G is a robust outexpander if for every set S which is not
too small and not too large, the 'robust' outneighbourhood of S is a little
larger than S. This generalises a result of Kühn, Osthus and Treglown on
approximate Hamilton decompositions of dense regular oriented graphs. It also
generalises a result of Frieze and Krivelevich on approximate Hamilton
decompositions of quasirandom (di)graphs. In turn, our result is used as a tool
by Kühn and Osthus to prove that any sufficiently large r-regular digraph G
which has linear degree and is a robust outexpander even has a Hamilton
decomposition.
Comment: Final version, published in SIAM Journal on Discrete Mathematics. 44
pages, 2 figures
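The abstract describes robust outexpansion only informally. For reference, the definition as it is usually stated in this line of work is sketched below; the parameters $\nu$ and $\tau$ (with $0 < \nu \le \tau < 1$) and the notation $RN^{+}_{\nu,G}$ do not appear in the abstract and are assumptions here. $G$ is a digraph on $n$ vertices and $N^{-}_G(x)$ is the inneighbourhood of $x$.

```latex
% nu-robust outneighbourhood of a vertex set S:
% all vertices that have at least nu*n inneighbours inside S.
\[
  RN^{+}_{\nu,G}(S) \;=\; \bigl\{\, x \in V(G) : |N^{-}_G(x) \cap S| \ge \nu n \,\bigr\}
\]
% G is a robust (nu, tau)-outexpander if S expands for every S of medium size.
\[
  |RN^{+}_{\nu,G}(S)| \;\ge\; |S| + \nu n
  \qquad \text{for all } S \subseteq V(G) \text{ with } \tau n \le |S| \le (1-\tau)\, n .
\]
```

The size condition on $S$ corresponds to "not too small and not too large", and the additive $\nu n$ term to "a little larger than S".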
Scalable Peer-to-Peer Indexing with Constant State
We present a distributed indexing scheme for peer-to-peer networks. Past work on distributed indexing traded off fast search times against non-constant degree topologies or network-unfriendly behavior such as flooding. In contrast, the scheme we present optimizes all three of these performance measures. That is, we provide logarithmic-round searches while maintaining connections to a fixed number of peers and avoiding network flooding. In comparison to the well-known scheme Chord, we provide competitive constant factors. Finally, we observe that arbitrary linear speedups are possible and discuss both a general brute-force approach and specific economical optimizations.
A Statistical Mechanical Load Balancer for the Web
The maximum entropy principle from statistical mechanics states that a closed
system attains an equilibrium distribution that maximizes its entropy. We first
show that for graphs with fixed number of edges one can define a stochastic
edge dynamic that can serve as an effective thermalization scheme, and hence,
the underlying graphs are expected to attain their maximum-entropy states,
which turn out to be Erdős-Rényi (ER) random graphs. We next show that (i) a
rate-equation based analysis of node degree distribution does indeed confirm
the maximum-entropy principle, and (ii) the edge dynamic can be effectively
implemented using short random walks on the underlying graphs, leading to a
local algorithm for the generation of ER random graphs. The resulting
statistical mechanical system can be adapted to provide a distributed and local
(i.e., without any centralized monitoring) mechanism for load balancing, which
can have a significant impact in increasing the efficiency and utilization of
both the Internet (e.g., efficient web mirroring), and large-scale computing
infrastructure (e.g., cluster and grid computing).
Comment: 11 pages, 5 PostScript figures; added references, expanded on
protocol discussion
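An edge dynamic that keeps the number of edges fixed can be sketched as follows. This is not the paper's random-walk-based local protocol: it is a simpler global rewiring rule, chosen here as an assumption because its stationary distribution is easy to reason about. The function names are hypothetical, and the graph must have fewer than n(n-1)/2 edges so that a vacant pair exists.

```python
import random


def rewire_step(edges, n, rng):
    """Move one uniformly random edge to a uniformly random vacant vertex pair.

    edges is a set of (u, v) tuples with u < v; it is modified in place.
    """
    old = rng.choice(sorted(edges))          # uniformly random existing edge
    while True:
        u, v = rng.sample(range(n), 2)       # two distinct vertices
        new = (min(u, v), max(u, v))
        if new not in edges:                 # rejection-sample a vacant pair
            break
    edges.remove(old)
    edges.add(new)


def thermalize(edges, n, steps, seed=None):
    """Apply the rewiring step repeatedly; the edge count never changes."""
    rng = random.Random(seed)
    edges = set(edges)
    for _ in range(steps):
        rewire_step(edges, n, rng)
    return edges
```

Each move and its reverse are proposed with the same probability, so the chain is symmetric and its stationary distribution is uniform over all simple graphs with the given number of vertices and edges, which is the fixed-edge-count variant of the ER model.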