39 research outputs found

    Power of d Choices with Simple Tabulation

    Get PDF

    Suppose that we are to place $m$ balls into $n$ bins sequentially using the $d$-choice paradigm: for each ball we are given a choice of $d$ bins, according to $d$ hash functions $h_1,\dots,h_d$, and we place the ball in the least loaded of these bins, breaking ties arbitrarily. Our interest is in the number of balls in the fullest bin after all $m$ balls have been placed. Azar et al. [STOC'94] proved that when $m=O(n)$ and the hash functions are fully random, the maximum load is at most $\frac{\lg \lg n}{\lg d}+O(1)$ whp (i.e. with probability $1-O(n^{-\gamma})$ for any choice of $\gamma$). In this paper we suppose that $h_1,\dots,h_d$ are simple tabulation hash functions. Generalising a result by Dahlgaard et al. [SODA'16], we show that for an arbitrary constant $d\geq 2$ the maximum load is $O(\lg \lg n)$ whp, and that the expected maximum load is at most $\frac{\lg \lg n}{\lg d}+O(1)$. We further show that by using a simple tie-breaking algorithm introduced by Vöcking [J.ACM'03], the expected maximum load drops to $\frac{\lg \lg n}{d\lg \varphi_d}+O(1)$, where $\varphi_d$ is the rate of growth of the $d$-ary Fibonacci numbers. Both of these expected bounds match those of the fully random setting. The analysis by Dahlgaard et al. relies on a proof by Pătrașcu and Thorup [J.ACM'11] concerning the use of simple tabulation for cuckoo hashing. We need here a generalisation to $d>2$ hash functions, but the original proof is an 8-page tour de force of ad-hoc arguments that do not appear to generalise. Our main technical contribution is a shorter, simpler and more accessible proof of the result by Pătrașcu and Thorup, where the relevant parts generalise nicely to the analysis of $d$ choices.

    Comment: Accepted at ICALP 201
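
    To make the $d$-choice placement process above concrete, here is a minimal Python sketch using toy simple tabulation hash functions; the function names and parameters are illustrative only and are not taken from the paper.

```python
import random

def make_simple_tabulation(key_bytes=4, seed=0):
    """A toy simple tabulation hash: view the key as key_bytes 8-bit
    characters and XOR together one random table lookup per character."""
    rng = random.Random(seed)
    tables = [[rng.getrandbits(32) for _ in range(256)] for _ in range(key_bytes)]
    def h(x):
        out = 0
        for i in range(key_bytes):
            out ^= tables[i][(x >> (8 * i)) & 0xFF]
        return out
    return h

def d_choice_max_load(balls, n_bins, hash_fns):
    """Place each ball into the least loaded of its d hashed bins
    (ties broken arbitrarily) and report the maximum load."""
    load = [0] * n_bins
    for x in balls:
        candidates = [h(x) % n_bins for h in hash_fns]
        load[min(candidates, key=load.__getitem__)] += 1
    return max(load)

# Example: m = n balls, d = 3 choices.
n = 1 << 14
hash_fns = [make_simple_tabulation(seed=s) for s in range(3)]
print(d_choice_max_load(range(n), n, hash_fns))
```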

    Classifying Convex Bodies by Their Contact and Intersection Graphs

    Get PDF

    Load Balancing with Dynamic Set of Balls and Bins

    Full text link
    In dynamic load balancing, we wish to distribute balls into bins in an environment where both balls and bins can be added and removed. We want to minimize the maximum load of any bin, but we also want to minimize the number of balls and bins affected when adding or removing a ball or a bin. We want a hashing-style solution where, given the ID of a ball, we can find its bin efficiently. We are given a balancing parameter $c=1+\epsilon$, where $\epsilon\in (0,1)$. With $n$ and $m$ the current numbers of balls and bins, we want no bin with load above $C=\lceil c n/m\rceil$, referred to as the capacity of the bins. We present a scheme where we can locate a ball checking $1+O(\log 1/\epsilon)$ bins in expectation. When inserting or deleting a ball, we expect to move $O(1/\epsilon)$ balls, and when inserting or deleting a bin, we expect to move $O(C/\epsilon)$ balls. Previous bounds were off by a factor $1/\epsilon$. These bounds are best possible when $C=O(1)$, but for larger $C$ we can do much better: let $f=\epsilon C$ if $C\leq \log 1/\epsilon$, $f=\epsilon\sqrt{C}\cdot \sqrt{\log(1/(\epsilon\sqrt{C}))}$ if $\log 1/\epsilon\leq C<\tfrac{1}{2\epsilon^2}$, and $f=1$ if $C\geq \tfrac{1}{2\epsilon^2}$. We show that we expect to move $O(1/f)$ balls when inserting or deleting a ball, and $O(C/f)$ balls when inserting or deleting a bin. For the bounds with larger $C$, we first have to resolve a much simpler probabilistic problem. Place $n$ balls in $m$ bins of capacity $C$, one ball at a time. Each ball picks a uniformly random non-full bin. We show that in expectation and with high probability, the fraction of non-full bins is $\Theta(f)$. Then the expected number of bins that a new ball would have to visit to find one that is not full is $\Theta(1/f)$. As it turns out, we obtain the same complexity in our more complicated scheme where both balls and bins can be added and removed.

    Comment: Accepted at STOC'2
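
    The simpler probabilistic problem from the abstract is easy to simulate. The following sketch (parameter values chosen only for illustration) estimates the fraction of non-full bins and the average number of random probes per ball.

```python
import math
import random

def fill_random_nonfull(n_balls, n_bins, capacity, seed=0):
    """Place n_balls one at a time; each ball repeatedly probes a uniformly
    random bin until it finds one that is not yet full (load < capacity)."""
    rng = random.Random(seed)
    load = [0] * n_bins
    probes = 0
    for _ in range(n_balls):
        while True:
            probes += 1
            b = rng.randrange(n_bins)
            if load[b] < capacity:
                load[b] += 1
                break
    frac_non_full = sum(l < capacity for l in load) / n_bins
    return frac_non_full, probes / n_balls

# Example: epsilon = 0.1 and capacity C = ceil(c*n/m) with c = 1 + epsilon.
m, C, eps = 100_000, 8, 0.1
n = math.floor(m * C / (1 + eps))   # chosen so that ceil(c*n/m) == C
frac, probes_per_ball = fill_random_nonfull(n, m, C)
print(frac, probes_per_ball)
```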

    One-Way Trail Orientations

    Get PDF
    Given a graph, does there exist an orientation of the edges such that the resulting directed graph is strongly connected? Robbins' theorem [Robbins, Am. Math. Monthly, 1939] asserts that such an orientation exists if and only if the graph is 2-edge connected. A natural extension of this problem is the following: Suppose that the edges of the graph are partitioned into trails. Can the trails be oriented consistently such that the resulting directed graph is strongly connected? We show that 2-edge connectivity is again a sufficient condition and we provide a linear time algorithm for finding such an orientation. The generalised Robbins' theorem [Boesch, Am. Math. Monthly, 1980] for mixed multigraphs asserts that the undirected edges of a mixed multigraph can be oriented to make the resulting directed graph strongly connected exactly when the mixed graph is strongly connected and the underlying graph is bridgeless. We consider the natural extension where the undirected edges of a mixed multigraph are partitioned into trails. It turns out that in this case the condition of the generalised Robbins' theorem is not sufficient. However, we show that as long as each cut either contains at least 2 undirected edges or directed edges in both directions, there exists an orientation of the trails such that the resulting directed graph is strongly connected. Moreover, if the condition is satisfied, we may start by orienting an arbitrary trail in an arbitrary direction. Using this result one obtains a very simple polynomial time algorithm for finding a strong trail orientation if it exists, both in the undirected and the mixed setting.
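
    For context, the classical constructive argument behind Robbins' theorem (single edges, not trails) can be sketched in a few lines: orient DFS tree edges away from the root and every remaining edge back towards the earlier-discovered endpoint. This is only an illustration of the classical result, not the trail-orientation algorithm of the paper.

```python
import sys

def robbins_orientation(n, edges):
    """DFS-based orientation: tree edges point away from the root; every other
    edge points from the later-discovered endpoint to the earlier-discovered
    one. If the graph is 2-edge-connected, the result is strongly connected."""
    sys.setrecursionlimit(max(1000, 2 * n + 100))
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc = [-1] * n
    oriented = [None] * len(edges)
    timer = [0]

    def dfs(u):
        disc[u] = timer[0]
        timer[0] += 1
        for v, i in adj[u]:
            if oriented[i] is not None:
                continue
            if disc[v] == -1:            # tree edge: orient away from the root
                oriented[i] = (u, v)
                dfs(v)
            elif disc[v] < disc[u]:      # back edge: orient towards the ancestor
                oriented[i] = (u, v)

    dfs(0)
    return oriented

# A 4-cycle is 2-edge-connected; the orientation below is a directed cycle.
print(robbins_orientation(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```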

    Improved Frequency Estimation Algorithms with and without Predictions

    Full text link
    Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches.

    Comment: NeurIPS 202
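
    As background for the approaches discussed above, here is a minimal CountMin sketch together with the learned heavy-hitter idea of Hsu et al. (2019), where items flagged by a (hypothetical) oracle are counted exactly. This is background only, not the new algorithm of the paper, and the class and parameter names are illustrative.

```python
import random

class CountMin:
    """Standard CountMin sketch: depth rows of width counters; the frequency
    estimate of an item is the minimum counter it hashes to (an overestimate)."""
    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, x):
        return [(r, hash((x, s)) % self.width) for r, s in enumerate(self.salts)]

    def add(self, x, c=1):
        for r, j in self._cells(x):
            self.rows[r][j] += c

    def estimate(self, x):
        return min(self.rows[r][j] for r, j in self._cells(x))

class LearnedCountMin(CountMin):
    """Learning-augmented variant in the spirit of Hsu et al.: items the oracle
    predicts to be heavy are counted exactly, the rest go into the sketch."""
    def __init__(self, width, depth, oracle, seed=0):
        super().__init__(width, depth, seed)
        self.oracle = oracle    # oracle(x) -> True if x is predicted to be heavy
        self.exact = {}

    def add(self, x, c=1):
        if self.oracle(x):
            self.exact[x] = self.exact.get(x, 0) + c
        else:
            super().add(x, c)

    def estimate(self, x):
        return self.exact[x] if x in self.exact else super().estimate(x)
```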