
### Pairwise Independent Random Walks Can Be Slightly Unbounded

A family of problems that have been studied in the context of various streaming algorithms are generalizations of the fact that the expected maximum distance of a 4-wise independent random walk on a line over $n$ steps is $O(\sqrt{n})$. For small values of $k$, there exist $k$-wise independent random walks that can be stored in much less space than storing $n$ random bits, so these properties are often useful for lowering space bounds. In this paper, we show that for all of these examples, 4-wise independence is required by demonstrating a pairwise independent random walk with steps uniform in $\pm 1$ and expected maximum distance $\Omega(\sqrt{n} \lg n)$ from the origin. We also show that this bound is tight for the first and second moment, i.e. the expected maximum square distance of a 2-wise independent random walk is always $O(n \lg^2 n)$. Also, for any even $k \ge 4$, we show that the $k$th moment of the maximum distance of any $k$-wise independent random walk is $O(n^{k/2})$. The previous two results generalize to random walks tracking insertion-only streams, and provide higher moment bounds than currently known. We also prove a generalization of Kolmogorov's maximal inequality by showing an asymptotically equivalent statement that requires only 4-wise independent random variables with bounded second moments, which also generalizes a result of Blasiok
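To make the space-saving point concrete, here is a minimal sketch of one textbook pairwise independent $\pm 1$ walk, assuming the inner-product construction $X_i = (-1)^{\langle a, i \rangle}$ over GF(2) with a uniformly random $m$-bit seed $a$. This is not the walk constructed in the paper, and the function names are hypothetical. It yields $n = 2^m - 1$ steps from only $m$ random bits, and for small $m$ the pairwise independence and the expected maximum distance can be checked by enumerating every seed.

```python
from itertools import combinations


def walk_steps(seed: int, m: int) -> list[int]:
    """Steps X_1..X_{2^m - 1}, where X_i = (-1)^{<seed, i>} over GF(2)."""
    return [1 - 2 * (bin(seed & i).count("1") % 2) for i in range(1, 2 ** m)]


def max_distance(steps: list[int]) -> int:
    """Largest absolute position reached by the walk."""
    position, farthest = 0, 0
    for step in steps:
        position += step
        farthest = max(farthest, abs(position))
    return farthest


def is_pairwise_independent(m: int) -> bool:
    """Check, over all 2^m seeds, that every pair of steps is uniform on {-1, 1}^2."""
    walks = [walk_steps(seed, m) for seed in range(2 ** m)]
    for i, j in combinations(range(2 ** m - 1), 2):
        counts: dict[tuple[int, int], int] = {}
        for walk in walks:
            pair = (walk[i], walk[j])
            counts[pair] = counts.get(pair, 0) + 1
        if len(counts) != 4 or any(c != 2 ** m // 4 for c in counts.values()):
            return False
    return True


def expected_max_distance(m: int) -> float:
    """Exact expectation of the maximum distance, averaging over all seeds."""
    total = sum(max_distance(walk_steps(seed, m)) for seed in range(2 ** m))
    return total / 2 ** m
```

Exhaustive enumeration is only feasible for small $m$; it is meant as a sanity check, not as evidence for any asymptotic bound.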

### Better and Simpler Lower Bounds for Differentially Private Statistical Estimation

We provide improved lower bounds for two well-known high-dimensional private
estimation tasks. First, we prove that for estimating the covariance of a
Gaussian up to spectral error $\alpha$ with approximate differential privacy,
one needs $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} +
\frac{d}{\alpha^2}\right)$ samples for any $\alpha \le O(1)$, which is tight up
to logarithmic factors. This improves over previous work which established this
for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than
previous work. Next, we prove that for estimating the mean of a heavy-tailed
distribution with bounded $k$th moments with approximate differential privacy,
one needs $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} +
\frac{d}{\alpha^2}\right)$ samples. This matches known upper bounds and
improves over the best known lower bounds for this problem, which hold only for
pure differential privacy, or when $k = 2$. Our techniques follow the method of
fingerprinting and are generally quite simple. Our lower bound for heavy-tailed
estimation is based on a black-box reduction from privately estimating
identity-covariance Gaussians. Our lower bound for covariance estimation
utilizes a Bayesian approach to show that, under an Inverse Wishart prior
distribution for the covariance matrix, no private estimator can be accurate
even in expectation, without sufficiently many samples.

Comment: 23 pages

### Optimal Time-Backlog Tradeoffs for the Variable-Processor Cup Game

The \emph{$p$-processor cup game} is a classic and widely studied scheduling
problem that captures the setting in which a $p$-processor machine must assign
tasks to processors over time in order to ensure that no individual task ever
falls too far behind. The problem is formalized as a multi-round game in which
two players, a filler (who assigns work to tasks) and an emptier (who schedules
tasks) compete. The emptier's goal is to minimize backlog, which is the maximum
amount of outstanding work for any task.
Recently, Kuszmaul and Westover (ITCS, 2021) proposed the
\emph{variable-processor cup game}, which considers the same problem, except
that the amount of resources available to the players (i.e., the number $p$ of
processors) fluctuates between rounds of the game. They showed that this
seemingly small modification fundamentally changes the dynamics of the game:
whereas the optimal backlog in the fixed $p$-processor game is $\Theta(\log
n)$, independent of $p$, the optimal backlog in the variable-processor game is
$\Theta(n)$. The latter result was only known to apply to games with
\emph{exponentially many} rounds, however, and it has remained an open question
what the optimal tradeoff between time and backlog is for shorter games.
This paper establishes a tight trade-off curve between time and backlog in
the variable-processor cup game. Importantly, we prove that for a game
consisting of $t$ rounds, the optimal backlog is $\Theta(n)$ if and only if $t
\ge \Omega(n^3)$. Our techniques also allow us to resolve several other
open questions concerning how the variable-processor cup game behaves in
beyond-worst-case-analysis settings.

Comment: 40 pages, published in International Conference on Automata,
Languages, and Programming (ICALP), 2022. Abstract abridged for arXiv
submission: see paper for full abstract. Updated to acknowledge additional
funding
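The game dynamics can be sketched as a small simulator. The abstract does not spell out the round rules, so the following assumes the standard formulation: in each round the filler adds at most $p$ units of water in total, with at most 1 unit per cup, and the emptier then removes up to 1 unit from each of $p$ cups. The greedy emptier (drain the $p$ fullest cups) and all function names are illustrative assumptions, not the strategies analyzed in the paper. Letting $p$ change from round to round gives the variable-processor variant.

```python
from fractions import Fraction


def play_round(fills, additions, p):
    """One round: the filler adds water, then a greedy emptier drains the p fullest cups."""
    if any(a < 0 or a > 1 for a in additions) or sum(additions) > p:
        raise ValueError("illegal filler move")
    fills = [f + a for f, a in zip(fills, additions)]
    # Indices of the p fullest cups (ties broken by lower index).
    fullest = sorted(range(len(fills)), key=lambda i: fills[i], reverse=True)[:p]
    for i in fullest:
        fills[i] = max(Fraction(0), fills[i] - 1)
    return fills


def simulate(n, rounds):
    """Play a sequence of (p, additions) rounds on n empty cups.

    Returns the largest backlog (maximum fill) observed after any round.
    """
    fills = [Fraction(0)] * n
    worst = Fraction(0)
    for p, additions in rounds:
        fills = play_round(fills, additions, p)
        worst = max(worst, max(fills))
    return worst
```

For example, with `half = Fraction(1, 2)`, calling `simulate(4, [(2, [half] * 4), (1, [0, 0, half, half])])` plays one round with two processors followed by one round with a single processor.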

### A faster and simpler algorithm for learning shallow networks

We revisit the well-studied problem of learning a linear combination of $k$
ReLU activations given labeled examples drawn from the standard $d$-dimensional
Gaussian measure. Chen et al. [CDG+23] recently gave the first algorithm for
this problem to run in $\text{poly}(d,1/\varepsilon)$ time when $k = O(1)$,
where $\varepsilon$ is the target error. More precisely, their algorithm runs
in time $(d/\varepsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple
stages. Here we show that a much simpler one-stage version of their algorithm
suffices, and moreover its runtime is only $(d/\varepsilon)^{O(k^2)}$.

Comment: 14 pages

### Improved Diversity Maximization Algorithms for Matching and Pseudoforest

In this work we consider the diversity maximization problem, where given a
data set $X$ of $n$ elements, and a parameter $k$, the goal is to pick a subset
of $X$ of size $k$ maximizing a certain diversity measure. [CH01] defined a
variety of diversity measures based on pairwise distances between the points. A
constant factor approximation algorithm was known for all those diversity
measures except ``remote-matching'', where only an $O(\log k)$ approximation
was known. In this work we present an $O(1)$ approximation for this remaining
notion. Further, we consider these notions from the perspective of composable
coresets. [IMMM14] provided composable coresets with a constant factor
approximation for all but ``remote-pseudoforest'' and ``remote-matching'',
for which they again obtained only an $O(\log k)$ approximation. Here we also close
the gap up to constants and present a constant factor composable coreset
algorithm for these two notions. For remote-matching, our coreset has size only
$O(k)$, and for remote-pseudoforest, our coreset has size
$O(k^{1+\varepsilon})$ for any $\varepsilon > 0$, for an
$O(1/\varepsilon)$-approximate coreset.

Comment: 27 pages, 1 table. Accepted to APPROX, 202
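The abstract names the two diversity measures without defining them. As a rough illustration, assuming the usual definitions (remote-pseudoforest is the sum, over the chosen points, of the distance to the nearest other chosen point; remote-matching is the weight of a minimum-weight perfect matching on the chosen points), they can be evaluated by brute force on a small subset. The function names are hypothetical, and this only computes the objectives; it is not an approximation algorithm or a coreset construction.

```python
def remote_pseudoforest(points, dist):
    """Sum of each point's distance to its nearest other point (needs >= 2 points)."""
    return sum(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


def remote_matching(points, dist):
    """Weight of a minimum-weight perfect matching, by exhaustive search.

    Expects a list with an even number of points; exponential time,
    so only suitable for very small subsets.
    """
    if not points:
        return 0
    first, rest = points[0], points[1:]
    return min(
        dist(first, rest[j]) + remote_matching(rest[:j] + rest[j + 1:], dist)
        for j in range(len(rest))
    )
```

For points on a line with `dist = lambda a, b: abs(a - b)`, both functions can be called directly on a list such as `[0, 1, 3, 7]`.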

### Circular Trace Reconstruction

Trace reconstruction is the problem of learning an unknown string $x$ from
independent traces of $x$, where traces are generated by independently deleting
each bit of $x$ with some deletion probability $q$. In this paper, we initiate
the study of circular trace reconstruction, where the unknown string $x$ is
circular and traces are now rotated by a random cyclic shift. Trace
reconstruction is related to many computational biology problems studying DNA,
which is a primary motivation for this problem as well, as many types of DNA
are known to be circular.
Our main results are as follows. First, we prove that we can reconstruct
arbitrary circular strings of length $n$ using
$\exp\big(\tilde{O}(n^{1/3})\big)$ traces for any constant deletion probability
$q$, as long as $n$ is prime or the product of two primes. For $n$ of this
form, this nearly matches what was the best known bound of
$\exp\big(O(n^{1/3})\big)$ for standard trace reconstruction when this paper
was initially released. We note, however, that Chase very recently improved the
standard trace reconstruction bound to $\exp\big(\tilde{O}(n^{1/5})\big)$.
Next, we prove that we can reconstruct random circular strings with high
probability using $n^{O(1)}$ traces for any constant deletion probability $q$.
Finally, we prove a lower bound of $\tilde{\Omega}(n^3)$ traces for arbitrary
circular strings, which is greater than the best known lower bound of
$\tilde{\Omega}(n^{3/2})$ in standard trace reconstruction.

Comment: 25 pages, 1 figure. To appear in Innovations in Theoretical Computer
Science (ITCS), 202
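The trace model described above can be sketched in a few lines: rotate the string by a uniformly random cyclic shift, then delete each bit independently with probability $q$. The function name and the use of Python's `random.Random` as the randomness source are assumptions made for illustration; the sketch only generates traces and does not attempt reconstruction.

```python
import random


def circular_trace(x: str, q: float, rng: random.Random) -> str:
    """Return one trace of the non-empty circular string x with deletion probability q."""
    shift = rng.randrange(len(x))          # uniformly random cyclic shift
    rotated = x[shift:] + x[:shift]
    # Keep each bit independently with probability 1 - q.
    return "".join(bit for bit in rotated if rng.random() >= q)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(circular_trace("1101000110", 0.3, rng))
```

With `q = 0` every trace is simply a rotation of `x`, which is a convenient sanity check for the shift logic.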
