### Power of $d$ Choices with Simple Tabulation

Suppose that we are to place $m$ balls into $n$ bins sequentially using the
$d$-choice paradigm: For each ball we are given a choice of $d$ bins, according
to $d$ hash functions $h_1,\dots,h_d$ and we place the ball in the least loaded
of these bins breaking ties arbitrarily. Our interest is in the number of balls
in the fullest bin after all $m$ balls have been placed.
Azar et al. [STOC'94] proved that when $m=O(n)$ and the hash functions
are fully random, the maximum load is at most $\frac{\lg \lg n }{\lg d}+O(1)$
whp (i.e., with probability $1-O(n^{-\gamma})$ for any choice of $\gamma$).
In this paper we assume that $h_1,\dots,h_d$ are simple tabulation hash
functions. Generalising a result by Dahlgaard et al. [SODA'16], we show that for
an arbitrary constant $d\geq 2$ the maximum load is $O(\lg \lg n)$ whp, and
that the expected maximum load is at most $\frac{\lg \lg n}{\lg d}+O(1)$. We
further show that by using a simple tie-breaking algorithm introduced by
Vöcking [J.ACM'03], the expected maximum load drops to $\frac{\lg \lg n}{d\lg
\varphi_d}+O(1)$, where $\varphi_d$ is the rate of growth of the $d$-ary
Fibonacci numbers. Both of these expected bounds match those of the fully
random setting.
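
A minimal sketch of the process described above. Simple tabulation views a key as $c$ characters, looks each character up in its own table of random values, and XORs the results. This sketch makes several assumptions of its own: 32-bit keys split into four 8-bit characters, bin indices taken modulo $n$ (unbiased when $n$ is a power of two), and ties in the plain $d$-choice rule broken toward the first candidate. The `always_go_left` flag follows Vöcking's scheme as it is usually described. The bins are split into $d$ groups, the $i$-th choice is drawn from group $i$, and ties go to the leftmost group. All names are hypothetical, and the code only simulates the process. It does not reflect the paper's analysis.

```python
import random


class SimpleTabulation:
    """Simple tabulation hashing for 32-bit keys: view the key as 4 characters
    of 8 bits, look each character up in its own table of random 32-bit
    values, and XOR the looked-up values together."""

    def __init__(self, rng: random.Random, chars: int = 4, char_bits: int = 8):
        self.char_bits = char_bits
        self.mask = (1 << char_bits) - 1
        self.tables = [[rng.getrandbits(32) for _ in range(1 << char_bits)]
                       for _ in range(chars)]

    def __call__(self, key: int) -> int:
        h = 0
        for i, table in enumerate(self.tables):
            h ^= table[(key >> (i * self.char_bits)) & self.mask]
        return h


def d_choice_loads(keys, n, d, seed=0, always_go_left=False):
    """Place each key in the least loaded of its d candidate bins; return loads.

    Plain d-choice: candidate i is h_i(key) mod n.
    always_go_left (Voecking-style): the n bins form d groups of n // d
    consecutive bins, candidate i lies in group i, ties go to the leftmost group.
    """
    rng = random.Random(seed)
    hashes = [SimpleTabulation(rng) for _ in range(d)]
    loads = [0] * n
    group = n // d
    for x in keys:
        if always_go_left:
            cands = [i * group + h(x) % group for i, h in enumerate(hashes)]
        else:
            cands = [h(x) % n for h in hashes]
        # min() returns the first minimum, so ties go to the leftmost candidate
        best = min(cands, key=loads.__getitem__)
        loads[best] += 1
    return loads


if __name__ == "__main__":
    n = 1 << 16
    keys = random.Random(42).sample(range(1 << 32), n)  # m = n random 32-bit keys
    for d, left in [(1, False), (2, False), (3, False), (2, True)]:
        print(d, left, max(d_choice_loads(keys, n, d, always_go_left=left)))
```

The bounds in the abstract are asymptotic, so the maximum loads printed for a single moderate $n$ only hint at the $\lg \lg n$ behaviour.
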
The analysis by Dahlgaard et al. relies on a proof by Pătraşcu and
Thorup [J.ACM'11] concerning the use of simple tabulation for cuckoo hashing.
Here we need a generalisation to $d>2$ hash functions, but the original proof
is an 8-page tour de force of ad-hoc arguments that do not appear to
generalise. Our main technical contribution is a shorter, simpler and more
accessible proof of the result by Pătraşcu and Thorup, where the
relevant parts generalise nicely to the analysis of $d$ choices.

Comment: Accepted at ICALP 201

### Load Balancing with Dynamic Set of Balls and Bins

In dynamic load balancing, we wish to distribute balls into bins in an
environment where both balls and bins can be added and removed. We want to
minimize the maximum load of any bin but we also want to minimize the number of
balls and bins affected when adding or removing a ball or a bin. We want a
hashing-style solution where, given the ID of a ball, we can find its bin
efficiently.
We are given a balancing parameter $c=1+\epsilon$, where $\epsilon\in (0,1)$.
With $n$ and $m$ the current numbers of balls and bins, we want no bin with
load above $C=\lceil c n/m\rceil$, referred to as the capacity of the bins.
We present a scheme where we can locate a ball by checking $1+O(\log
1/\epsilon)$ bins in expectation. When inserting or deleting a ball, we expect
to move $O(1/\epsilon)$ balls, and when inserting or deleting a bin, we expect
to move $O(C/\epsilon)$ balls. Previous bounds were off by a factor of
$1/\epsilon$.
These bounds are best possible when $C=O(1)$, but for larger $C$ we can do
much better. Let $f=\epsilon C$ if $C\leq \log 1/\epsilon$,
$f=\epsilon\sqrt{C}\cdot \sqrt{\log(1/(\epsilon\sqrt{C}))}$ if $\log
1/\epsilon\leq C<\tfrac{1}{2\epsilon^2}$, and $f=1$ if $C\geq
\tfrac{1}{2\epsilon^2}$. We show that we expect to move $O(1/f)$ balls when
inserting or deleting a ball, and $O(C/f)$ balls when inserting or deleting a
bin.
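
The capacity $C$ and the piecewise quantity $f$ are easy to misread, so the small helper below computes both. It is a sketch under stated assumptions. It drops all constant factors, it uses natural logarithms because the abstract does not fix the base, and the function names are hypothetical. The capacity uses exact rational arithmetic so that the ceiling is not thrown off by float rounding.

```python
import math
from fractions import Fraction


def capacity(n_balls: int, m_bins: int, eps: Fraction) -> int:
    """C = ceil(c * n / m) with c = 1 + eps; eps is a Fraction so the ceiling
    is computed exactly rather than on a rounded float."""
    return math.ceil((1 + eps) * n_balls / m_bins)


def f_param(C: int, eps: float) -> float:
    """The quantity f from the abstract, ignoring constant factors.
    Assumes natural logarithms (the base is not specified)."""
    eps = float(eps)
    if C <= math.log(1 / eps):
        return eps * C
    if C < 1 / (2 * eps ** 2):
        x = eps * math.sqrt(C)  # here x < 1/sqrt(2), so log(1/x) > 0
        return x * math.sqrt(math.log(1 / x))
    return 1.0


if __name__ == "__main__":
    eps = Fraction(1, 10)
    for n, m in [(1_000, 1_000), (1_000, 100), (100_000, 100)]:
        C = capacity(n, m, eps)
        f = f_param(C, eps)
        # expected moves scale as O(1/f) per ball update and O(C/f) per bin update
        print(f"n={n} m={m} C={C} f~{f:.3f} 1/f~{1 / f:.1f} C/f~{C / f:.1f}")
```

The three sample inputs land in the three regimes of the definition ($C\leq\log 1/\epsilon$, the middle range, and $C\geq\tfrac{1}{2\epsilon^2}$).
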
For the bounds with larger $C$, we first have to resolve a much simpler
probabilistic problem. Place $n$ balls in $m$ bins of capacity $C$, one ball at
a time. Each ball picks a uniformly random non-full bin. We show that in
expectation and with high probability, the fraction of non-full bins is
$\Theta(f)$. Then the expected number of bins that a new ball would have to
visit to find one that is not full is $\Theta(1/f)$. As it turns out, we obtain
the same complexity in our more complicated scheme where both balls and bins
can be added and removed.

Comment: Accepted at STOC'2
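
The simpler probabilistic problem can be simulated directly. The sketch below keeps the non-full bins in a list so that a uniformly random non-full bin can be drawn in $O(1)$ time, and it removes a bin by swapping it with the last entry once it fills up. The names and the sample parameters are illustrative assumptions. The simulation only estimates the non-full fraction that the abstract proves is $\Theta(f)$.

```python
import random


def fraction_nonfull(n_balls: int, m_bins: int, C: int, seed: int = 0) -> float:
    """Throw n balls one at a time; each goes to a bin chosen uniformly at
    random among the bins with load < C. Return the final non-full fraction."""
    if n_balls > m_bins * C:
        raise ValueError("n exceeds the total capacity m * C")
    rng = random.Random(seed)
    loads = [0] * m_bins
    nonfull = list(range(m_bins))  # exactly the bins with load < C
    for _ in range(n_balls):
        i = rng.randrange(len(nonfull))
        b = nonfull[i]
        loads[b] += 1
        if loads[b] == C:  # b is now full: O(1) swap-remove from the list
            nonfull[i] = nonfull[-1]
            nonfull.pop()
    return len(nonfull) / m_bins


if __name__ == "__main__":
    eps, m, C = 0.1, 10_000, 20
    n = int(m * C / (1 + eps))  # chosen so that ceil((1 + eps) * n / m) == C
    p = fraction_nonfull(n, m, C)
    # probing uniformly random bins until a non-full one is hit takes 1/p probes on average
    print(f"non-full fraction {p:.4f}, expected probes {1 / p:.1f}")
```

If a fraction $p$ of bins is non-full, a ball that probes uniformly random bins needs $1/p$ probes in expectation, since the number of probes is geometric. This is the link between the $\Theta(f)$ fraction and the $\Theta(1/f)$ search cost.
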

### One-Way Trail Orientations

Given a graph, does there exist an orientation of the edges such that the resulting directed graph is strongly connected? Robbins' theorem [Robbins, Am. Math. Monthly, 1939] asserts that such an orientation exists if and only if the graph is 2-edge-connected. A natural extension of this problem is the following: suppose that the edges of the graph are partitioned into trails. Can the trails be oriented consistently such that the resulting directed graph is strongly connected?
We show that 2-edge-connectivity is again a sufficient condition, and we provide a linear-time algorithm for finding such an orientation.
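
For the plain undirected case without trails, Robbins' theorem has a well-known constructive proof by depth-first search. Tree edges are oriented away from the root, and every non-tree edge is oriented from descendant to ancestor. The result is strongly connected exactly when no tree edge is a bridge. The sketch below implements that classic construction in linear time, assuming vertices `0..n-1` with `n >= 1`. The function names are hypothetical. This is not the paper's trail-orientation algorithm, which the abstract does not spell out.

```python
def robbins_orientation(n, edges):
    """Orient an undirected multigraph on vertices 0..n-1 (list of (u, v) edges)
    to be strongly connected. Returns one arc per edge, in input order, or None
    if the graph is disconnected or has a bridge."""
    adj = [[] for _ in range(n)]
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    disc = [-1] * n  # DFS discovery time, -1 = unvisited
    low = [0] * n    # smallest discovery time reachable from u's subtree via one non-tree edge
    arcs = [None] * len(edges)
    disc[0] = low[0] = 0
    timer = 1
    # Iterative DFS frames: (vertex, id of tree edge to parent, neighbour iterator)
    stack = [(0, -1, iter(adj[0]))]
    while stack:
        u, parent_edge, it = stack[-1]
        for v, eid in it:
            if eid == parent_edge:
                continue
            if disc[v] == -1:
                arcs[eid] = (u, v)  # tree edge: away from the root
                disc[v] = low[v] = timer
                timer += 1
                stack.append((v, eid, iter(adj[v])))
                break
            if arcs[eid] is None:
                arcs[eid] = (u, v)  # non-tree edge, first seen from the descendant
                low[u] = min(low[u], disc[v])
        else:
            stack.pop()
            if stack:
                p = stack[-1][0]
                low[p] = min(low[p], low[u])
                if low[u] > disc[p]:  # tree edge p-u is a bridge
                    return None
    if -1 in disc:
        return None  # graph is disconnected
    return arcs


def is_strongly_connected(n, arcs):
    """Every vertex is reachable from 0, and 0 is reachable from every vertex."""
    def reaches_all(pairs):
        adj = [[] for _ in range(n)]
        for u, v in pairs:
            adj[u].append(v)
        seen, todo = {0}, [0]
        while todo:
            for w in adj[todo.pop()]:
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
        return len(seen) == n
    return reaches_all(arcs) and reaches_all([(v, u) for u, v in arcs])
```

Returning `None` on a bridge mirrors the "only if" direction of the theorem: a bridge can be crossed in one direction only, whichever way it is oriented.
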
The generalised Robbins' theorem [Boesch, Am. Math. Monthly, 1980] for mixed multigraphs asserts that the undirected edges of a mixed multigraph can be oriented to make the resulting directed graph strongly connected exactly when the mixed graph is strongly connected and the underlying graph is bridgeless.
We consider the natural extension where the undirected edges of a mixed multigraph are partitioned into trails. It turns out that in this case the condition of the generalised Robbins' theorem is not sufficient. However, we show that as long as each cut either contains at least 2 undirected edges or directed edges in both directions, there exists an orientation of the trails such that the resulting directed graph is strongly connected. Moreover, if the condition is satisfied, we may start by orienting an arbitrary trail in an arbitrary direction. Using this result, one obtains a very simple polynomial-time algorithm for finding a strong trail orientation if it exists, both in the undirected and the mixed setting.
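
The cut condition above can be checked by brute force on tiny examples. By the stated result, a `True` answer means that a strong trail orientation exists, whatever the trail partition is. The sketch below enumerates every cut, so it is exponential in the number of vertices. It only tests the condition and does not construct an orientation. It is not the polynomial-time algorithm mentioned in the abstract, and the function name is hypothetical.

```python
def cut_condition_holds(n, undirected, directed):
    """Brute-force check over every cut (S, V minus S) of a mixed multigraph on
    vertices 0..n-1: each cut must be crossed by at least 2 undirected edges,
    or by directed edges in both directions. Exponential in n."""
    # S and its complement give the same cut, so only enumerate S containing
    # vertex 0 (odd masks); the range bound excludes S = V (the all-ones mask).
    for mask in range(1, (1 << n) - 1, 2):
        in_s = [bool((mask >> x) & 1) for x in range(n)]
        crossing = sum(1 for u, v in undirected if in_s[u] != in_s[v])
        out_arc = any(in_s[u] and not in_s[v] for u, v in directed)
        in_arc = any(in_s[v] and not in_s[u] for u, v in directed)
        if crossing < 2 and not (out_arc and in_arc):
            return False
    return True
```

A path fails the check because its end edge forms a cut crossed by a single undirected edge. A triangle passes because every cut is crossed by two of its edges.
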

### Improved Frequency Estimation Algorithms with and without Predictions

Estimating frequencies of elements appearing in a data stream is a key task
in large-scale data analysis. Popular sketching approaches to this problem
(e.g., CountMin and CountSketch) come with worst-case guarantees that
probabilistically bound the error of the estimated frequencies for any possible
input. The work of Hsu et al. (2019) introduced the idea of using machine
learning to tailor sketching algorithms to the specific data distribution they
are being run on. In particular, their learning-augmented frequency estimation
algorithm uses a learned heavy-hitter oracle which predicts which elements will
appear many times in the stream. We give a novel algorithm which, in some
parameter regimes, already theoretically outperforms the learning-based
algorithm of Hsu et al. without the use of any predictions. Augmenting our
algorithm with heavy-hitter predictions further reduces the error and improves
upon the state of the art. Empirically, our algorithms achieve superior
performance in all experiments compared to prior approaches.

Comment: NeurIPS 202
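
As background for the baselines named above, here is a minimal textbook CountMin sketch. It assumes nonnegative integer keys below $2^{61}-1$ and per-row hashes of the form $((a x + b) \bmod p) \bmod w$. `LearnedCountMin` is a hypothetical wrapper loosely modeled on the idea in Hsu et al. of giving predicted heavy hitters their own exact counters. Its `is_heavy` argument stands in for the learned oracle. Neither class is the new algorithm of this paper, which the abstract does not describe.

```python
import random


class CountMin:
    """CountMin sketch: `depth` rows of `width` counters, one hash per row."""
    P = (1 << 61) - 1  # Mersenne prime for the ((a*x + b) mod P) mod width hashes

    def __init__(self, width: int, depth: int, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]
        self.params = [(rng.randrange(1, self.P), rng.randrange(self.P))
                       for _ in range(depth)]

    def _buckets(self, x: int):
        for row, (a, b) in zip(self.rows, self.params):
            yield row, ((a * x + b) % self.P) % self.width

    def add(self, x: int, count: int = 1) -> None:
        for row, i in self._buckets(x):
            row[i] += count

    def estimate(self, x: int) -> int:
        # with nonnegative counts every row overestimates, so take the minimum
        return min(row[i] for row, i in self._buckets(x))


class LearnedCountMin:
    """Hypothetical wrapper: items flagged by the oracle get exact counters,
    all other items share a CountMin sketch."""

    def __init__(self, is_heavy, width: int, depth: int, seed: int = 0):
        self.is_heavy = is_heavy
        self.exact = {}
        self.sketch = CountMin(width, depth, seed)

    def add(self, x: int, count: int = 1) -> None:
        if self.is_heavy(x):
            self.exact[x] = self.exact.get(x, 0) + count
        else:
            self.sketch.add(x, count)

    def estimate(self, x: int) -> int:
        if self.is_heavy(x):
            return self.exact.get(x, 0)
        return self.sketch.estimate(x)
```

Routing heavy hitters out of the shared table both makes their counts exact and removes their mass from the buckets that light items collide with. That is the intuition for why good predictions can reduce error.
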