47,552 research outputs found

    Locally Optimal Load Balancing

    This work studies distributed algorithms for locally optimal load-balancing: We are given a graph of maximum degree $\Delta$, and each node has up to $L$ units of load. The task is to distribute the load more evenly so that the loads of adjacent nodes differ by at most $1$. If the graph is a path ($\Delta = 2$), it is easy to solve the fractional version of the problem in $O(L)$ communication rounds, independently of the number of nodes. We show that this is tight, and we show that it is possible to solve also the discrete version of the problem in $O(L)$ rounds in paths. For the general case ($\Delta > 2$), we show that fractional load balancing can be solved in $\operatorname{poly}(L,\Delta)$ rounds and discrete load balancing in $f(L,\Delta)$ rounds for some function $f$, independently of the number of nodes. Comment: 19 pages, 11 figures
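The target condition in this abstract (loads of adjacent nodes differ by at most 1) can be checked mechanically. The following is a minimal sketch of such a check, not the paper's algorithm; the function name `is_locally_balanced`, the dictionary representation of loads, and the edge-list representation of the graph are assumptions made only for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def is_locally_balanced(
    load: Dict[Hashable, float],
    edges: Iterable[Edge],
    tolerance: float = 1.0,
) -> bool:
    """Return True if every pair of adjacent nodes differs in load by at most `tolerance`.

    `load` maps each node to its (fractional or integer) load; `edges` lists
    the undirected edges of the graph. Both representations are hypothetical.
    """
    return all(abs(load[u] - load[v]) <= tolerance for u, v in edges)


# A path on four nodes (maximum degree 2).
path_edges = [(0, 1), (1, 2), (2, 3)]

balanced = {0: 3, 1: 2, 2: 2, 3: 1}    # neighbours differ by at most 1
unbalanced = {0: 4, 1: 0, 2: 0, 3: 0}  # nodes 0 and 1 differ by 4
```

Note that the condition is local: the first assignment passes even though the two endpoints of the path differ by 2.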

    Gunrock: A High-Performance Graph Processing Library on the GPU

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library. Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5)
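To illustrate what a bulk-synchronous, frontier-centric formulation of a graph primitive can look like, here is a minimal sequential sketch of breadth-first search written as repeated operations on a vertex frontier. It is plain Python, not Gunrock's C++/CUDA API; the helper names `advance` and `filter_unvisited` and the adjacency-list dictionary are assumptions chosen for illustration.

```python
from typing import Dict, List


def advance(graph: Dict[int, List[int]], frontier: List[int]) -> List[int]:
    """Expand every frontier vertex to its neighbours (duplicates are kept)."""
    return [v for u in frontier for v in graph[u]]


def filter_unvisited(candidates: List[int], depth: Dict[int, int], level: int) -> List[int]:
    """Keep only vertices seen for the first time and record their BFS depth."""
    next_frontier = []
    for v in candidates:
        if v not in depth:
            depth[v] = level
            next_frontier.append(v)
    return next_frontier


def bfs(graph: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Level-synchronous BFS: one advance step plus one filter step per iteration."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        frontier = filter_unvisited(advance(graph, frontier), depth, level)
    return depth


# Hypothetical example graph as an adjacency list.
example_graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
```

On a GPU, each step would process all frontier elements in parallel; the loops here only show the data flow from one frontier to the next.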

    Improved Analysis of Deterministic Load-Balancing Schemes

    We consider the problem of deterministic load balancing of tokens in the discrete model. A set of $n$ processors is connected into a $d$-regular undirected network. In every time step, each processor exchanges some of its tokens with each of its neighbors in the network. The goal is to minimize the discrepancy between the number of tokens on the most-loaded and the least-loaded processor as quickly as possible. Rabani et al. (1998) present a general technique for the analysis of a wide class of discrete load balancing algorithms. Their approach is to characterize the deviation between the actual loads of a discrete balancing algorithm and the distribution generated by a related Markov chain. The Markov chain can also be regarded as the underlying model of a continuous diffusion algorithm. Rabani et al. showed that after time $T = O(\log(Kn)/\mu)$, any algorithm of their class achieves a discrepancy of $O(d\log n/\mu)$, where $\mu$ is the spectral gap of the transition matrix of the graph, and $K$ is the initial load discrepancy in the system. In this work we identify some natural additional conditions on deterministic balancing algorithms, resulting in a class of algorithms reaching a smaller discrepancy. This class contains well-known algorithms, e.g., the Rotor-Router. Specifically, we introduce the notion of cumulatively fair load-balancing algorithms where in any interval of consecutive time steps, the total number of tokens sent out over an edge by a node is the same (up to constants) for all adjacent edges. We prove that algorithms which are cumulatively fair and where every node retains a sufficient part of its load in each step, achieve a discrepancy of $O(\min\{d\sqrt{\log n/\mu}, d\sqrt{n}\})$ in time $O(T)$. We also show that in general neither of these assumptions may be omitted without increasing discrepancy.
    We then show by a combinatorial potential reduction argument that any cumulatively fair scheme satisfying some additional assumptions achieves a discrepancy of $O(d)$ almost as quickly as the continuous diffusion process. This positive result applies to some of the simplest and most natural discrete load balancing schemes. Comment: minor corrections; updated literature overview

    Almost spanning subgraphs of random graphs after adversarial edge removal

    Let $\Delta > 1$ be a fixed integer. We show that the random graph $G(n,p)$ with $p \gg (\log n/n)^{1/\Delta}$ is robust with respect to the containment of almost spanning bipartite graphs $H$ with maximum degree $\Delta$ and sublinear bandwidth in the following sense: asymptotically almost surely, if an adversary deletes arbitrary edges in $G(n,p)$ such that each vertex loses less than half of its neighbours, then the resulting graph still contains a copy of all such $H$. Comment: 46 pages, 6 figures

    Gunrock: GPU Graph Analytics

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library. Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU"

    Approximate Hamilton decompositions of robustly expanding regular digraphs

    We show that every sufficiently large r-regular digraph G which has linear degree and is a robust outexpander has an approximate decomposition into edge-disjoint Hamilton cycles, i.e. G contains a set of r-o(r) edge-disjoint Hamilton cycles. Here G is a robust outexpander if for every set S which is not too small and not too large, the 'robust' outneighbourhood of S is a little larger than S. This generalises a result of Kühn, Osthus and Treglown on approximate Hamilton decompositions of dense regular oriented graphs. It also generalises a result of Frieze and Krivelevich on approximate Hamilton decompositions of quasirandom (di)graphs. In turn, our result is used as a tool by Kühn and Osthus to prove that any sufficiently large r-regular digraph G which has linear degree and is a robust outexpander even has a Hamilton decomposition. Comment: Final version, published in SIAM Journal Discrete Mathematics. 44 pages, 2 figures

    Scalable Peer-to-Peer Indexing with Constant State

    We present a distributed indexing scheme for peer-to-peer networks. Past work on distributed indexing traded off fast search times with non-constant degree topologies or network-unfriendly behavior such as flooding. In contrast, the scheme we present optimizes all three of these performance measures. That is, we provide logarithmic round searches while maintaining connections to a fixed number of peers and avoiding network flooding. In comparison to the well-known scheme Chord, we provide competitive constant factors. Finally, we observe that arbitrary linear speedups are possible and discuss both a general brute-force approach and specific economical optimizations.

    A Statistical Mechanical Load Balancer for the Web

    The maximum entropy principle from statistical mechanics states that a closed system attains an equilibrium distribution that maximizes its entropy. We first show that for graphs with fixed number of edges one can define a stochastic edge dynamic that can serve as an effective thermalization scheme, and hence, the underlying graphs are expected to attain their maximum-entropy states, which turn out to be Erdős-Rényi (ER) random graphs. We next show that (i) a rate-equation based analysis of node degree distribution does indeed confirm the maximum-entropy principle, and (ii) the edge dynamic can be effectively implemented using short random walks on the underlying graphs, leading to a local algorithm for the generation of ER random graphs. The resulting statistical mechanical system can be adapted to provide a distributed and local (i.e., without any centralized monitoring) mechanism for load balancing, which can have a significant impact in increasing the efficiency and utilization of both the Internet (e.g., efficient web mirroring), and large-scale computing infrastructure (e.g., cluster and grid computing). Comment: 11 pages, 5 Postscript figures; added references, expanded on protocol discussion