
    When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Processing

    Carefully balancing load in distributed stream processing systems has a fundamental impact on execution latency and throughput. Load balancing is challenging because real-world workloads are skewed: some tuples in the stream are associated with keys that are significantly more frequent than others. Skew is remarkably more problematic in large deployments: more workers implies fewer keys per worker, so it becomes harder to "average out" the cost of hot keys with cold keys. We propose a novel load balancing technique that uses a heavy hitter algorithm to efficiently identify the hottest keys in the stream. These hot keys are assigned to $d \geq 2$ choices to ensure a balanced load, where $d$ is tuned automatically to minimize the memory and computation cost of operator replication. The technique works online and does not require the use of routing tables. Our extensive evaluation shows that our technique can balance real-world workloads on large deployments, and improve throughput and latency by 150% and 60% respectively over the previous state-of-the-art when deployed on Apache Storm.
    Comment: 12 pages, 14 figures; this paper is accepted and will be published at ICDE 201
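    As a rough illustration of the routing idea (a sketch under our own assumptions, not the paper's implementation), the snippet below spreads keys flagged as heavy hitters over $d$ candidate workers, picking the least loaded, while cold keys use plain hash partitioning. The class name HotKeyRouter, the exact Counter standing in for a space-bounded heavy hitter sketch (e.g., Space-Saving), and the fixed hot-key threshold and $d$ are all our inventions; the paper tunes $d$ automatically.

```python
from collections import Counter

class HotKeyRouter:
    """Illustrative sketch: hot keys are spread over d candidate workers
    (least loaded wins); cold keys use ordinary hash partitioning."""

    def __init__(self, num_workers, d=2, hot_fraction=0.01):
        self.num_workers = num_workers
        self.d = d                      # number of choices for hot keys (fixed here)
        self.hot_fraction = hot_fraction
        self.freq = Counter()           # exact counts; stand-in for a bounded sketch
        self.total = 0
        self.loads = [0] * num_workers

    def route(self, key):
        self.freq[key] += 1
        self.total += 1
        if self.freq[key] >= self.hot_fraction * self.total:
            # key currently looks like a heavy hitter: least loaded of d choices
            candidates = [hash((key, i)) % self.num_workers for i in range(self.d)]
            worker = min(candidates, key=self.loads.__getitem__)
        else:
            worker = hash(key) % self.num_workers
        self.loads[worker] += 1
        return worker
```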

    Parallel Load Balancing on Constrained Client-Server Topologies

    We study parallel Load Balancing protocols for a client-server distributed model defined as follows. There is a set $\mathcal{C}$ of $n$ clients and a set $\mathcal{S}$ of $n$ servers where each client has (at most) a constant number $d \geq 1$ of requests that must be assigned to some server. The client set and the server set are connected to each other via a fixed bipartite graph: the requests of client $v$ can only be sent to the servers in its neighborhood $N(v)$. The goal is to assign every client request so as to minimize the maximum load of the servers. In this setting, efficient parallel protocols are available only for dense topologies. In particular, a simple symmetric, non-adaptive protocol achieving constant maximum load has recently been introduced by Becchetti et al. [BCNPT18] for regular dense bipartite graphs. The parallel completion time is $O(\log n)$ and the overall work is $O(n)$, w.h.p. Motivated by proximity constraints arising in some client-server systems, we devise a simple variant of Becchetti et al.'s protocol [BCNPT18] and we analyse it over almost-regular bipartite graphs where nodes may have neighborhoods of small size. In detail, we prove that, w.h.p., this new version has a cost equivalent to that of Becchetti et al.'s protocol (in terms of maximum load, completion time, and work complexity, respectively) on every almost-regular bipartite graph with degree $\Omega(\log^2 n)$. Our analysis significantly departs from that in [BCNPT18] for the original protocol and requires coping with non-trivial stochastic-dependence issues on the random choices of the algorithmic process, which are due to the worst-case, sparse topology of the underlying graph.

    Parallel Load Balancing on constrained client-server topologies

    We study parallel Load Balancing protocols for the client-server distributed model defined as follows. There is a set of $n$ clients and a set of $n$ servers where each client has (at most) a constant number of requests that must be assigned to some server. The client set and the server set are connected to each other via a fixed bipartite graph: the requests of client $v$ can only be sent to the servers in its neighborhood. The goal is to assign every client request so as to minimize the maximum load of the servers. In this setting, efficient parallel protocols are available only for dense topologies. In particular, a simple protocol, named raes, has recently been introduced by Becchetti et al. [1] for regular dense bipartite graphs. They show that this symmetric, non-adaptive protocol achieves constant maximum load with $O(\log n)$ parallel completion time and $O(n)$ overall work, w.h.p. Motivated by proximity constraints arising in some client-server systems, we analyze raes over almost-regular bipartite graphs where nodes may have neighborhoods of small size. In detail, we prove that, w.h.p., the raes protocol keeps the same performance as above (in terms of maximum load, completion time, and work complexity, respectively) on any almost-regular bipartite graph with degree $\Omega(\log^2 n)$. Our analysis significantly departs from that in [1] since it requires coping with non-trivial stochastic-dependence issues on the random choices of the algorithmic process, which are due to the worst-case, sparse topology of the underlying graph.
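    For intuition, here is a toy simulation of a raes-style retry protocol under our reading of the abstracts: in each round, every still-unassigned request is sent to a uniformly random neighboring server, and a server keeps at most c requests in total, bouncing the surplus back for the next round. The capacity rule, parameter names, and termination handling are illustrative assumptions, not the exact protocol of Becchetti et al.

```python
import random

def raes_style(neighbors, d=1, c=4, max_rounds=10**4, seed=0):
    """Toy raes-like simulation. neighbors maps client -> list of server ids.
    Returns (assignment of (client, request) -> server, rounds used)."""
    rng = random.Random(seed)
    pending = [(v, j) for v, nbrs in neighbors.items() if nbrs for j in range(d)]
    assigned, load = {}, {}
    for rnd in range(max_rounds):
        if not pending:
            return assigned, rnd
        arrivals = {}
        for req in pending:                       # each request picks a random neighbor
            s = rng.choice(neighbors[req[0]])
            arrivals.setdefault(s, []).append(req)
        pending = []
        for s, reqs in arrivals.items():          # server keeps at most c in total
            free = max(c - load.get(s, 0), 0)
            for req in reqs[:free]:
                assigned[req] = s
            load[s] = load.get(s, 0) + min(free, len(reqs))
            pending.extend(reqs[free:])           # rejected requests retry next round
    return assigned, max_rounds

# Usage on a toy complete bipartite topology: 8 clients, 8 servers.
nbrs = {v: list(range(8)) for v in range(8)}
assignment, rounds = raes_style(nbrs, d=1, c=4)
```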

    Tight bounds for parallel randomized load balancing

    Given a distributed system of $n$ balls and $n$ bins, how evenly can we distribute the balls to the bins, minimizing communication? The fastest non-adaptive and symmetric algorithm achieving a constant maximum bin load requires $\Theta(\log\log n)$ rounds, and any such algorithm running for $r \in O(1)$ rounds incurs a bin load of $\Omega((\log n/\log\log n)^{1/r})$. In this work, we explore the fundamental limits of the general problem. We present a simple adaptive symmetric algorithm that achieves a bin load of 2 in $\log^* n + O(1)$ communication rounds using $O(n)$ messages in total. Our main result, however, is a matching lower bound of $(1-o(1))\log^* n$ on the time complexity of symmetric algorithms that guarantee small bin loads. The essential preconditions of the proof are (i) a limit of $O(n)$ on the total number of messages sent by the algorithm and (ii) anonymity of bins, i.e., the port numberings of balls need not be globally consistent. In order to show that our technique indeed yields tight bounds, we provide for each assumption an algorithm violating it, in turn achieving a constant maximum bin load in constant time.
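    To make the $\log^* n + O(1)$ round bound concrete: the iterated logarithm grows so slowly that it is at most 5 for any input of remotely practical size, as the small helper below (a standard definition, not code from the paper) demonstrates.

```python
import math

def log_star(n, base=2.0):
    """Iterated logarithm: how many times log must be applied before n <= 1."""
    count = 0
    while n > 1.0:
        n = math.log(n, base)
        count += 1
    return count

for k in (4, 16, 64, 65536):
    print(f"log*(2^{k}) = {log_star(2**k)}")   # prints 3, 4, 5, 5
```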

    The Power of Filling in Balanced Allocations

    It is well known that if $m$ balls (jobs) are placed sequentially into $n$ bins (servers) according to the One-Choice protocol (choose a single bin in each round and allocate one ball to it), then, for $m \gg n$, the gap between the maximum and average load diverges. Many refinements of the One-Choice protocol have been studied that achieve a gap that remains bounded by a function of $n$, for any $m$. However, most of these variations, such as Two-Choice, are less sample-efficient than One-Choice, in the sense that for each allocated ball more than one sample is needed (in expectation). We introduce a new class of processes which are primarily characterized by "filling" underloaded bins. A prototypical example is the Packing process: at each round we take only one bin sample; if the load is below the average load, then we place as many balls as needed until the average load is reached; otherwise, we place only one ball. We prove that for any process in this class the gap between the maximum and average load is $O(\log n)$ for any number of balls $m$. For the Packing process, we also prove a matching lower bound. We also prove that the Packing process is more sample-efficient than One-Choice, that is, it allocates on average more than one ball per sample. Finally, we demonstrate that the upper bound of $O(\log n)$ on the gap can be extended to the Caching process (a.k.a. the memory protocol) studied by Mitzenmacher, Prabhakar and Shah (2002).
    Comment: This paper refines and extends the content on filling processes in arXiv:2110.10759. It consists of 31 pages, 6 figures, 2 tables.
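    A minimal simulation of the Packing process as described above, with one assumption of ours: an underloaded bin is filled up to the ceiling of the current average (the paper's exact rounding rule may differ). The returned balls-per-sample ratio illustrates the sample-efficiency claim.

```python
import math
import random

def packing(n, m, seed=0):
    """Run the Packing process with n bins and (about) m balls.
    Returns (gap = max load - average load, balls allocated per sample)."""
    rng = random.Random(seed)
    loads = [0] * n
    balls = samples = 0
    while balls < m:
        samples += 1
        i = rng.randrange(n)
        avg = balls / n
        if loads[i] < avg:
            # underloaded: fill to the (ceiling of the) current average
            add = min(math.ceil(avg) - loads[i], m - balls)
        else:
            add = 1
        loads[i] += add
        balls += add
    return max(loads) - balls / n, balls / samples   # gap O(log n); ratio > 1
```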

    Scalable Multi-Party Private Set-Intersection

    In this work we study the problem of private set-intersection in the multi-party setting and design two protocols with the following improvements compared to prior work. First, our protocols are designed in the so-called star network topology, where a designated party communicates with everyone else, and take a new approach of leveraging the 2PC protocol of [FreedmanNP04]. This approach minimizes the usage of a broadcast channel: our semi-honest protocol does not make any use of such a channel and all communication is via point-to-point channels. In addition, the communication complexity of our protocols scales with the number of parties. More concretely, (1) our first, semi-honest secure protocol implies communication complexity that is linear in the input sizes, namely $O((\sum_{i=1}^n m_i) \cdot \kappa)$ bits of communication, where $\kappa$ is the security parameter and $m_i$ is the size of $P_i$'s input set, whereas the overall computational overhead is quadratic in the input sizes only for a designated party, and linear for the rest. We further reduce this overhead by employing two types of hashing schemes. (2) Our second protocol is proven secure in the malicious setting. This protocol induces communication complexity of $O((n^2 + n m_{\max} + n m_{\min} \log m_{\max}) \kappa)$ bits, where $m_{\min}$ (resp. $m_{\max}$) is the minimum (resp. maximum) over all input set sizes and $n$ is the number of parties.
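    The skeleton below shows only the star communication pattern: the designated party interacts with each other party over a point-to-point channel and no broadcast is used. It deliberately omits all cryptography; a real instantiation would replace each exchange with a secure two-party subprotocol (the paper leverages [FreedmanNP04]), whereas exchanging raw sets as done here provides no privacy.

```python
def star_intersection(sets):
    """Communication-pattern skeleton only, NOT a private protocol.
    sets[0] belongs to the designated party P1; each remaining entry is the
    input of one other party, reached over its own point-to-point channel."""
    result = set(sets[0])
    for party_set in sets[1:]:        # one pairwise interaction per party
        result &= set(party_set)      # stand-in for a secure 2PC subprotocol
    return result

# Usage: three parties; the designated party learns the intersection {3}.
print(star_intersection([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]))
```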

    Mean-Biased Processes for Balanced Allocations

    We introduce a new class of balanced allocation processes which bias towards underloaded bins (those with load below the mean load) either by skewing the probability by which a bin is chosen for an allocation (probability bias), or alternatively, by adding more balls to an underloaded bin (weight bias). A prototypical process satisfying the probability bias condition is Mean-Thinning: at each round, we sample one bin and, if it is underloaded, we allocate one ball; otherwise, we allocate one ball to a second bin sample. Versions of this process have been in use since at least 1986. An example of a process, introduced by us, which satisfies the weight bias condition is Twinning: at each round, we sample only one bin. If the bin is underloaded, then we allocate two balls; otherwise, we allocate only one ball. Our main result is that for any process with a probability or weight bias, with high probability the gap between maximum and minimum load is logarithmic in the number of bins. This result holds for any number of allocated balls (heavily loaded case), covers many natural processes that relax the Two-Choice process, and we also prove it is tight for many such processes, including Mean-Thinning and Twinning. Our analysis employs a delicate interplay between linear, quadratic and exponential potential functions. It also hinges on a phenomenon we call "mean quantile stabilization", which holds in greater generality than our framework and may be of independent interest.
    Comment: This paper refines and extends the content on non-filling processes in arXiv:2110.10759. It consists of 65 pages, 7 figures, 2 tables.
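    A small simulation of the two prototypical processes, following the rules stated in the abstract (parameter names and the stopping rule, which may overshoot m by one ball under Twinning, are our choices). Both returned gaps should be logarithmic in n for any m.

```python
import random

def mean_biased(n, m, process="twinning", seed=0):
    """Simulate Twinning or Mean-Thinning with n bins and (about) m balls.
    Returns (max load - mean load, mean load - min load)."""
    rng = random.Random(seed)
    loads = [0] * n
    balls = 0
    while balls < m:
        i = rng.randrange(n)
        mean = balls / n
        if process == "twinning":
            # weight bias: two balls if underloaded, else one
            target, k = i, 2 if loads[i] < mean else 1
        else:
            # mean-thinning: keep the sample if underloaded, else resample once
            target = i if loads[i] < mean else rng.randrange(n)
            k = 1
        loads[target] += k
        balls += k
    mean = balls / n
    return max(loads) - mean, mean - min(loads)
```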